perf(index): fix O(N log N) warm-cache regression in BitmapIndex#7079
Merged
wjones127 merged 4 commits intoJun 3, 2026
Merged
Conversation
…tIndex Commit 4de5ce6 replaced the default trait impl (which stored the pre-built Arc<dyn ScalarIndex>) with a serializable BitmapIndexState path to support disk-backed caches. That removed the fast in-memory path entirely: every warm cache hit now reconstructs the full BTreeMap via parse_lookup_batch — O(N log N) in the number of unique values. For IS NULL the actual query work is O(1), so the rebuild is 100% overhead. Fix: write both entries in put_in_cache — the unsized Arc<dyn ScalarIndex> (fast path) and the serializable BitmapIndexState (disk-backed path). get_from_cache checks the unsized entry first; on a miss it reconstructs from the serialized state and back-fills the fast entry for subsequent hits. Applies the same fix to LabelListIndexPlugin. Adds test_bitmap_cache_fast_path to verify put_in_cache populates the unsized entry and that get_from_cache returns it without reconstructing the BTreeMap, including a correct IS NULL result. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…og N) warm rebuilds Supersedes the dual-entry approach in the previous commit with a cleaner single-entry design. Root cause: BitmapIndexState::into_bitmap_index called parse_lookup_batch on every warm cache hit, rebuilding the BTreeMap<OrderableScalarValue, usize> from the stored RecordBatch — O(N log N) in the number of unique values, with N heap allocations for ScalarValue keys. For IS NULL this was 100% overhead since the query itself is O(1). Fix: - Change BitmapIndex.index_map from BTreeMap to Arc<BTreeMap> so the map can be shared across BitmapIndex instances without cloning. - Add index_map: OnceLock<Arc<BTreeMap<...>>> to BitmapIndexState. Not serialized — from_index pre-populates it from the live index; deserialize leaves it empty for the disk-backed path. - into_bitmap_index now takes &self: on a warm hit it Arc::clones the cached map (O(1)); on a disk-backed miss it parses once, caches in the OnceLock, and subsequent calls are O(1). The get_from_cache clone of BitmapIndexState is no longer needed. - Revert the ScalarIndexCacheKey dual-entry approach from the previous commit in both bitmap.rs and label_list.rs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…of OnceLock The OnceLock was unnecessary complexity. deserialize already pays the disk I/O cost, so parsing the lookup_batch into an Arc<BTreeMap> eagerly there is free relative to that. from_index just Arc::clones the existing map. into_bitmap_index then Arc::clones on every call — always O(1), no lazy init, no interior mutability, no manual Clone impl needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Member
Author
|
The benchmark that detected this regression went from ~3s -> 1.7ms. |
- Rename BitmapIndexState::into_bitmap_index -> to_bitmap_index; clippy requires into_* methods to take self by value - Replace repeat(None).take(N) with repeat_n(None, N) in test to satisfy clippy::manual_repeat_n and fix the rustfmt line-length violation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Xuanwo
approved these changes
Jun 3, 2026
wjones127
approved these changes
Jun 3, 2026
wjones127
left a comment
Contributor
There was a problem hiding this comment.
Looks like a clean fix. Thank you!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Commit 4de5ce6 ("feat(index): serializable cache for Bitmap and LabelList scalar indices #6874") introduced a performance regression in
BitmapIndexPlugin::get_from_cache. Every warm-cache hit against a bitmap scalar index now pays O(N log N) cost where N is the number of unique values in the column, instead of O(1).The regression: the new implementation stored only the serializable
BitmapIndexState(an ArrowRecordBatch) in the cache and reconstructed the fullBTreeMap<OrderableScalarValue, usize>on every cache hit by callingparse_lookup_batch. For a column with 10M unique values this rebuilds the map on every query — includingIS NULL, whose actual bitmap lookup is(*self.null_map).clone()and is otherwise O(1).parse_lookup_batchis expensive because:ScalarValue::try_from_arrayfor every row — one heap allocation per unique value.BTreeMap— O(log N) comparisons per insert, O(N log N) total.Fix
BitmapIndex.index_map: Changed fromBTreeMap<OrderableScalarValue, usize>toArc<BTreeMap<OrderableScalarValue, usize>>. The map is immutable after construction, so sharing it behind anArcis safe, and cloning is O(1).BitmapIndexState: Added anindex_map: Arc<BTreeMap<...>>field that is not serialized — the wire format is unchanged. It is populated eagerly:from_index(called byput_in_cache):Arc::clones the map from the liveBitmapIndex— O(1).deserialize(disk-backed cache backends): callsparse_lookup_batchonce at deserialization time, which is already paying disk I/O cost.into_bitmap_index: Now takes&selfand simplyArc::clonesself.index_map— always O(1), no reconstruction.get_from_cache: The intermediate(*state).clone()is removed sinceinto_bitmap_indexno longer consumesself.LabelListIndexhad the same dual-entry patch applied in a prior iteration; that is also reverted to the original single-entry approach (itsBitmapIndexStatepath is unchanged by this PR).Test
Added
test_bitmap_cache_fast_pathtobitmap.rs:put_in_cache, thenget_from_cacheget_from_cachereturnsSomeIS NULLand asserts the correct 5 null rows are returnedTo measure the end-to-end impact, run the
bitmap / is_null / warmcase inpython/python/ci_benchmarks/benchmarks/test_count_rows.py— latency should be close tobtree / is_null / warm.