Skip to content

fix: Avoid signed overflow UB in BSI index and null-deref in bloom filter index#340

Open
lxy-9602 wants to merge 1 commit into
alibaba:mainfrom
lxy-9602:fix-bsi-ub
Open

fix: Avoid signed overflow UB in BSI index and null-deref in bloom filter index#340
lxy-9602 wants to merge 1 commit into
alibaba:mainfrom
lxy-9602:fix-bsi-ub

Conversation

@lxy-9602
Copy link
Copy Markdown
Collaborator

@lxy-9602 lxy-9602 commented Jun 5, 2026

Purpose

No Linked issue.

Fix two undefined-behavior bugs in file index readers:

  1. BSI reader signed overflow: negative branches computed -value, which is
    UB when value is INT64_MIN. Now use a SafeAbs() helper that clamps
    INT64_MIN to INT64_MIN.

  2. Bloom filter null deref: VisitEqual hashed the literal before checking
    IsNull(), dereferencing a null value. Now returns Remain()
    for null literals before hashing.

Cross-checked with Java BitSliceIndexBitmapFileIndex: Java's Math.abs wraps
Long.MIN_VALUE back to negative, so its writer rejects MIN_VALUE outright —
the value can never reach a BSI index. The C++ fix is defensive and matches Java
results for all valid values.

Tests

API and Format

Documentation

Generative AI tooling

Generated-by: Aone Copilot (Claude)

@lxy-9602 lxy-9602 changed the title fix: Avoid signed overflow UB in BSI index and null-deref in bloom fi… fix: Avoid signed overflow UB in BSI index and null-deref in bloom filter index Jun 5, 2026
@zjw1111 zjw1111 requested a review from Copilot June 5, 2026 07:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses two undefined-behavior issues in file index readers: signed overflow when negating INT64_MIN in the BSI reader, and hashing a null literal in the Bloom filter reader.

Changes:

  • Added a SafeAbs(int64_t) helper and replaced -value negation in BSI negative-branch comparisons to avoid signed-overflow UB.
  • Updated Bloom filter VisitEqual to return early for null literals before hashing, preventing null dereference.
  • Adjusted Bloom filter equality logic to remove the redundant post-hash null check.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/paimon/common/file_index/bsi/bit_slice_index_bitmap_file_index.cpp Adds SafeAbs and switches negative-branch BSI comparisons away from -value to avoid UB.
src/paimon/common/file_index/bloomfilter/bloom_filter_file_index.cpp Prevents hashing null literals by early-returning Remain() when literal.IsNull().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +37 to +44
// Safe absolute value for int64_t that avoids undefined behavior when value == INT64_MIN.
// This mirrors Java's Math.abs() wrapping semantics but produces the correct magnitude.
inline int64_t SafeAbs(int64_t value) {
if (value == INT64_MIN) {
return INT64_MIN;
}
return value < 0 ? -value : value;
}
Comment on lines 170 to 174
} else {
PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessThan(-value));
PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessThan(SafeAbs(value)));
RoaringBitmap32 b2 = reader->positive_->IsNotNull();
b1 |= b2;
return b1;
Comment on lines 187 to 192
} else {
PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessOrEqual(-value));
PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1,
reader->negative_->LessOrEqual(SafeAbs(value)));
RoaringBitmap32 b2 = reader->positive_->IsNotNull();
b1 |= b2;
return b1;
Comment on lines 202 to 205
PAIMON_ASSIGN_OR_RAISE(int64_t value, reader->value_mapper_(literal));
if (value < 0) {
return reader->negative_->GreaterThan(-value);
return reader->negative_->GreaterThan(SafeAbs(value));
} else {
Comment on lines 218 to 221
PAIMON_ASSIGN_OR_RAISE(int64_t value, reader->value_mapper_(literal));
if (value < 0) {
return reader->negative_->GreaterOrEqual(-value);
return reader->negative_->GreaterOrEqual(SafeAbs(value));
} else {
@lxy-9602
Copy link
Copy Markdown
Collaborator Author

lxy-9602 commented Jun 5, 2026

The SafeAbs(INT64_MIN) change avoids C++ signed-overflow UB, but it still passes a negative value into the negative-side BitSliceIndexRoaringBitmap. That bitmap represents absolute values and expects non-negative literals, so INT64_MIN needs explicit handling instead of being returned from SafeAbs.

This affects predicate pruning when the query literal is INT64_MIN:

  • x > INT64_MIN should keep all non-null rows, but the current code can drop negative rows.
  • x >= INT64_MIN should keep all non-null rows, but the current code can drop negative rows.
  • x < INT64_MIN should return an empty result for int64 values, but the current code can keep negative rows.
  • x <= INT64_MIN should also be empty if the index never contains INT64_MIN, but the current code can keep negative rows.

I checked Java Paimon as well. Java’s BSI writer cannot successfully write Long.MIN_VALUE, because Math.abs(Long.MIN_VALUE) remains negative and BitSliceIndexRoaringBitmap.Appender rejects negative min. However, Java’s reader also does not special-case a query literal of Long.MIN_VALUE, so this appears to be a Java-side boundary bug too rather than behavior that C++ should copy exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants