fix: Avoid signed overflow UB in BSI index and null-deref in bloom filter index #340
lxy-9602 wants to merge 1 commit into
Pull request overview
This PR addresses two undefined-behavior issues in file index readers: signed overflow when negating INT64_MIN in the BSI reader, and hashing a null literal in the Bloom filter reader.
Changes:
- Added a `SafeAbs(int64_t)` helper and replaced `-value` negation in BSI negative-branch comparisons to avoid signed-overflow UB.
- Updated Bloom filter `VisitEqual` to return early for null literals before hashing, preventing a null dereference.
- Adjusted Bloom filter equality logic to remove the redundant post-hash null check.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/paimon/common/file_index/bsi/bit_slice_index_bitmap_file_index.cpp | Adds `SafeAbs` and switches negative-branch BSI comparisons away from `-value` to avoid UB. |
| src/paimon/common/file_index/bloomfilter/bloom_filter_file_index.cpp | Prevents hashing null literals by early-returning `Remain()` when `literal.IsNull()`. |
```cpp
// Safe absolute value for int64_t that avoids undefined behavior when value == INT64_MIN.
// This mirrors Java's Math.abs() wrapping semantics: INT64_MIN is returned unchanged,
// because its magnitude (2^63) is not representable in int64_t.
inline int64_t SafeAbs(int64_t value) {
    if (value == INT64_MIN) {
        return INT64_MIN;
    }
    return value < 0 ? -value : value;
}
```
```diff
 } else {
-    PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessThan(-value));
+    PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessThan(SafeAbs(value)));
     RoaringBitmap32 b2 = reader->positive_->IsNotNull();
     b1 |= b2;
     return b1;
```
```diff
 } else {
-    PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1, reader->negative_->LessOrEqual(-value));
+    PAIMON_ASSIGN_OR_RAISE(RoaringBitmap32 b1,
+                           reader->negative_->LessOrEqual(SafeAbs(value)));
     RoaringBitmap32 b2 = reader->positive_->IsNotNull();
     b1 |= b2;
     return b1;
```
```diff
 PAIMON_ASSIGN_OR_RAISE(int64_t value, reader->value_mapper_(literal));
 if (value < 0) {
-    return reader->negative_->GreaterThan(-value);
+    return reader->negative_->GreaterThan(SafeAbs(value));
 } else {
```
```diff
 PAIMON_ASSIGN_OR_RAISE(int64_t value, reader->value_mapper_(literal));
 if (value < 0) {
-    return reader->negative_->GreaterOrEqual(-value);
+    return reader->negative_->GreaterOrEqual(SafeAbs(value));
 } else {
```
The This affects predicate pruning when the query literal is
I checked Java Paimon as well. Java’s BSI writer cannot successfully write
Purpose
No linked issue.
Fix two undefined-behavior bugs in file index readers:
- BSI reader signed overflow: negative branches computed `-value`, which is UB when `value` is `INT64_MIN`. They now use a `SafeAbs()` helper that returns `INT64_MIN` unchanged (matching Java's `Math.abs` wrapping) instead of negating it.
- Bloom filter null deref: `VisitEqual` hashed the literal before checking `IsNull()`, dereferencing a null value. It now returns `Remain()` for null literals before hashing.
Cross-checked with Java `BitSliceIndexBitmapFileIndex`: Java's `Math.abs` wraps `Long.MIN_VALUE` back to a negative value, so its writer rejects `MIN_VALUE` outright and the value can never reach a BSI index. The C++ fix is defensive and matches Java results for all valid values.
Tests
API and Format
Documentation
Generative AI tooling
Generated-by: Aone Copilot (Claude)