
perf(index): fix O(N log N) warm-cache regression in BitmapIndex#7079

Merged
wjones127 merged 4 commits into lance-format:main from westonpace:perf-bitmap-serialization-regression on Jun 3, 2026

@westonpace

Member

Problem

Commit 4de5ce6 ("feat(index): serializable cache for Bitmap and LabelList scalar indices #6874") introduced a performance regression in BitmapIndexPlugin::get_from_cache. Every warm-cache hit against a bitmap scalar index now pays O(N log N) cost where N is the number of unique values in the column, instead of O(1).

The regression: the new implementation stored only the serializable BitmapIndexState (an Arrow RecordBatch) in the cache and reconstructed the full BTreeMap<OrderableScalarValue, usize> on every cache hit by calling parse_lookup_batch. For a column with 10M unique values, this rebuilds the map on every query, including IS NULL, whose actual bitmap lookup, (*self.null_map).clone(), is otherwise O(1).

parse_lookup_batch is expensive because:

  1. It calls ScalarValue::try_from_array for every row — one heap allocation per unique value.
  2. It inserts into a BTreeMap — O(log N) comparisons per insert, O(N log N) total.
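To make the O(N log N) cost concrete, here is a minimal sketch that counts key comparisons while building a BTreeMap with one insert per unique value, which is the work the regressed path repeated on every warm hit. It is hypothetical throughout: Key, rebuild_map, and comparisons_for_rebuild are illustrative names, and i64 keys stand in for the OrderableScalarValue keys that the real code builds via ScalarValue::try_from_array.

```rust
use std::cell::Cell;
use std::cmp::Ordering;
use std::collections::BTreeMap;

thread_local! {
    // Number of key comparisons performed on this thread.
    static COMPARISONS: Cell<u64> = Cell::new(0);
}

/// Hypothetical key type: every Ord comparison bumps COMPARISONS.
#[derive(Debug, PartialEq, Eq)]
struct Key(i64);

impl PartialOrd for Key {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Key {
    fn cmp(&self, other: &Self) -> Ordering {
        COMPARISONS.with(|c| c.set(c.get() + 1));
        self.0.cmp(&other.0)
    }
}

/// Rebuilds the value -> bitmap-position map with one insert per unique
/// value, as the regressed warm-cache path did on every hit.
fn rebuild_map(values: &[i64]) -> BTreeMap<Key, usize> {
    let mut map = BTreeMap::new();
    for (pos, value) in values.iter().enumerate() {
        map.insert(Key(*value), pos); // O(log N) comparisons per insert
    }
    map
}

/// Returns how many key comparisons a single rebuild costs.
fn comparisons_for_rebuild(values: &[i64]) -> u64 {
    COMPARISONS.with(|c| c.set(0));
    let map = rebuild_map(values);
    assert_eq!(map.len(), values.len()); // values are assumed to be unique
    COMPARISONS.with(|c| c.get())
}
```

The per-value comparison count should grow with the height of the tree, so larger columns pay more per key on every rebuild. Cloning an Arc around an already-built map performs no key comparisons at all.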

Fix

BitmapIndex.index_map: Changed from BTreeMap<OrderableScalarValue, usize> to Arc<BTreeMap<OrderableScalarValue, usize>>. The map is immutable after construction, so sharing it behind an Arc is safe, and cloning the Arc is O(1).
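A minimal sketch of why the Arc makes sharing cheap, assuming a heavily simplified BitmapIndex with i64 keys in place of OrderableScalarValue (the struct and its new constructor here are hypothetical, not lance's actual definitions):

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical, simplified BitmapIndex; the real struct has more fields.
#[derive(Clone, Debug)]
pub struct BitmapIndex {
    // Immutable after construction, so sharing it behind an Arc is safe.
    pub index_map: Arc<BTreeMap<i64, usize>>,
}

impl BitmapIndex {
    /// Builds the map once; this is the expensive O(N log N) step.
    pub fn new(values: &[i64]) -> Self {
        let map = values
            .iter()
            .enumerate()
            .map(|(pos, value)| (*value, pos))
            .collect::<BTreeMap<i64, usize>>();
        Self { index_map: Arc::new(map) }
    }
}
```

Because index_map is an Arc, the derived Clone only bumps a reference count: two clones point at the same map, and no keys are copied or compared.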

BitmapIndexState: Added an index_map: Arc<BTreeMap<...>> field that is not serialized — the wire format is unchanged. It is populated eagerly:

  • from_index (called by put_in_cache): Arc::clones the map from the live BitmapIndex — O(1).
  • deserialize (disk-backed cache backends): calls parse_lookup_batch once, at deserialization time, when the cache is already paying disk I/O cost.
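The split between the serialized and non-serialized parts of the state can be sketched as below. Everything here is a hypothetical simplification: i64 keys stand in for OrderableScalarValue, a Vec<i64> with a little-endian byte encoding stands in for the Arrow lookup RecordBatch and its on-disk format, parse_lookup stands in for parse_lookup_batch, and from_parts plays the role of from_index.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for parse_lookup_batch.
fn parse_lookup(values: &[i64]) -> BTreeMap<i64, usize> {
    values.iter().enumerate().map(|(pos, v)| (*v, pos)).collect()
}

pub struct BitmapIndexState {
    // Serialized: stand-in for the lookup RecordBatch.
    lookup_values: Vec<i64>,
    // Not serialized: the parsed map, shared with live indices.
    index_map: Arc<BTreeMap<i64, usize>>,
}

impl BitmapIndexState {
    /// from_index-style constructor: the live index already holds the
    /// parsed map, so it is shared with an O(1) Arc::clone.
    pub fn from_parts(lookup_values: Vec<i64>, index_map: &Arc<BTreeMap<i64, usize>>) -> Self {
        Self { lookup_values, index_map: Arc::clone(index_map) }
    }

    /// Wire format: only the lookup values (8 little-endian bytes each).
    /// index_map is never written, so the format is unaffected by it.
    pub fn serialize(&self) -> Vec<u8> {
        self.lookup_values.iter().flat_map(|v| v.to_le_bytes()).collect()
    }

    /// Disk-backed path: decode the values, then parse the map exactly once.
    pub fn deserialize(bytes: &[u8]) -> Self {
        let lookup_values: Vec<i64> = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                i64::from_le_bytes(buf)
            })
            .collect();
        let index_map = Arc::new(parse_lookup(&lookup_values));
        Self { lookup_values, index_map }
    }

    pub fn index_map(&self) -> &Arc<BTreeMap<i64, usize>> {
        &self.index_map
    }
}
```

A state built from a live index shares its map, while a deserialized state owns a freshly parsed map with the same contents.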

into_bitmap_index (renamed to_bitmap_index in a later commit): Now takes &self and simply Arc::clones self.index_map, which is always O(1) with no reconstruction.

get_from_cache: The intermediate (*state).clone() is removed since into_bitmap_index no longer consumes self.
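A minimal sketch of the resulting warm-cache path, assuming a hypothetical in-memory cache that stores Arc<BitmapIndexState> entries. The method is named to_bitmap_index here, the name it ends up with after the clippy rename later in this PR; the Cache type, the null_rows field, and is_null are illustrative simplifications, not lance's API.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Hypothetical simplified index: i64 keys, null rows as plain row ids.
pub struct BitmapIndex {
    pub index_map: Arc<BTreeMap<i64, usize>>,
    pub null_rows: Arc<Vec<u64>>,
}

impl BitmapIndex {
    /// IS NULL never touches index_map.
    pub fn is_null(&self) -> &[u64] {
        &self.null_rows
    }
}

pub struct BitmapIndexState {
    index_map: Arc<BTreeMap<i64, usize>>,
    null_rows: Arc<Vec<u64>>,
}

impl BitmapIndexState {
    /// Borrows self, so a caller holding Arc<BitmapIndexState> needs no clone.
    pub fn to_bitmap_index(&self) -> BitmapIndex {
        BitmapIndex {
            index_map: Arc::clone(&self.index_map),
            null_rows: Arc::clone(&self.null_rows),
        }
    }
}

/// Hypothetical cache keyed by index name.
#[derive(Default)]
pub struct Cache {
    entries: HashMap<String, Arc<BitmapIndexState>>,
}

impl Cache {
    pub fn put_in_cache(&mut self, key: &str, index: &BitmapIndex) {
        let state = BitmapIndexState {
            index_map: Arc::clone(&index.index_map),
            null_rows: Arc::clone(&index.null_rows),
        };
        self.entries.insert(key.to_string(), Arc::new(state));
    }

    /// Warm hit: only Arc clones, no (*state).clone() and no map rebuild.
    pub fn get_from_cache(&self, key: &str) -> Option<BitmapIndex> {
        self.entries.get(key).map(|state| state.to_bitmap_index())
    }
}
```

The shape mirrors the PR's test: put an index with many unique values and a few nulls into the cache, get it back, and run IS NULL against the returned index.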

A prior iteration of this PR applied the same dual-entry patch to LabelListIndex; that change is also reverted, restoring the original single-entry approach (its BitmapIndexState path is unchanged by this PR).

Test

Added test_bitmap_cache_fast_path to bitmap.rs:

  • Creates a high-cardinality bitmap index (1,000 unique integers + 5 null rows)
  • Calls put_in_cache, then get_from_cache
  • Asserts get_from_cache returns Some
  • Runs IS NULL and asserts the correct 5 null rows are returned

To measure the end-to-end impact, run the bitmap / is_null / warm case in python/python/ci_benchmarks/benchmarks/test_count_rows.py; its latency should be close to that of the btree / is_null / warm case.

westonpace and others added 3 commits June 2, 2026 16:53
…tIndex

Commit 4de5ce6 replaced the default trait impl (which stored the
pre-built Arc<dyn ScalarIndex>) with a serializable BitmapIndexState
path to support disk-backed caches.  That removed the fast in-memory
path entirely: every warm cache hit now reconstructs the full BTreeMap
via parse_lookup_batch — O(N log N) in the number of unique values.
For IS NULL the actual query work is O(1), so the rebuild is 100% overhead.

Fix: write both entries in put_in_cache — the unsized Arc<dyn ScalarIndex>
(fast path) and the serializable BitmapIndexState (disk-backed path).
get_from_cache checks the unsized entry first; on a miss it reconstructs
from the serialized state and back-fills the fast entry for subsequent hits.

Applies the same fix to LabelListIndexPlugin.

Adds test_bitmap_cache_fast_path to verify put_in_cache populates the
unsized entry and that get_from_cache returns it without reconstructing
the BTreeMap, including a correct IS NULL result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
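For comparison with the final design, the dual-entry idea in this first commit can be sketched as follows. Everything here is hypothetical: ScalarIndex is a one-method stand-in for lance's trait, a Vec<i64> plays the role of the serialized BitmapIndexState, and evict_fast simulates a backend that only retains the serialized entry.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Hypothetical minimal stand-in for lance's ScalarIndex trait.
pub trait ScalarIndex {
    fn num_keys(&self) -> usize;
}

pub struct BitmapIndex {
    index_map: BTreeMap<i64, usize>,
}

impl ScalarIndex for BitmapIndex {
    fn num_keys(&self) -> usize {
        self.index_map.len()
    }
}

fn parse(values: &[i64]) -> BTreeMap<i64, usize> {
    values.iter().enumerate().map(|(pos, v)| (*v, pos)).collect()
}

#[derive(Default)]
pub struct DualEntryCache {
    fast: HashMap<String, Arc<dyn ScalarIndex>>, // live object (fast path)
    serialized: HashMap<String, Vec<i64>>,       // serializable state (disk-backed path)
    pub rebuilds: usize,                         // slow-path reconstructions so far
}

impl DualEntryCache {
    /// Writes both entries.
    pub fn put_in_cache(&mut self, key: &str, values: Vec<i64>) {
        let index = BitmapIndex { index_map: parse(&values) };
        self.fast.insert(key.to_string(), Arc::new(index));
        self.serialized.insert(key.to_string(), values);
    }

    /// Checks the fast entry first; on a miss, rebuilds from the
    /// serialized state and back-fills the fast entry.
    pub fn get_from_cache(&mut self, key: &str) -> Option<Arc<dyn ScalarIndex>> {
        if let Some(hit) = self.fast.get(key) {
            return Some(Arc::clone(hit));
        }
        let values = self.serialized.get(key)?;
        self.rebuilds += 1;
        let index: Arc<dyn ScalarIndex> = Arc::new(BitmapIndex { index_map: parse(values) });
        self.fast.insert(key.to_string(), Arc::clone(&index));
        Some(index)
    }

    /// Simulates losing the in-memory entry (e.g. only the serialized copy survives).
    pub fn evict_fast(&mut self, key: &str) {
        self.fast.remove(key);
    }
}
```

The later commits replace this with a single entry that carries the shared map itself, so each index is no longer stored under two cache keys.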
…og N) warm rebuilds

Supersedes the dual-entry approach in the previous commit with a cleaner
single-entry design.

Root cause: BitmapIndexState::into_bitmap_index called parse_lookup_batch
on every warm cache hit, rebuilding the BTreeMap<OrderableScalarValue, usize>
from the stored RecordBatch — O(N log N) in the number of unique values,
with N heap allocations for ScalarValue keys. For IS NULL this was 100%
overhead since the query itself is O(1).

Fix:
- Change BitmapIndex.index_map from BTreeMap to Arc<BTreeMap> so the map
  can be shared across BitmapIndex instances without cloning.
- Add index_map: OnceLock<Arc<BTreeMap<...>>> to BitmapIndexState. Not
  serialized — from_index pre-populates it from the live index; deserialize
  leaves it empty for the disk-backed path.
- into_bitmap_index now takes &self: on a warm hit it Arc::clones the cached
  map (O(1)); on a disk-backed miss it parses once, caches in the OnceLock,
  and subsequent calls are O(1). The get_from_cache clone of BitmapIndexState
  is no longer needed.
- Revert the ScalarIndexCacheKey dual-entry approach from the previous commit
  in both bitmap.rs and label_list.rs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
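The intermediate OnceLock design from this commit can be sketched like this, assuming Rust 1.70 or newer for std::sync::OnceLock. The types and the parse_lookup helper are hypothetical simplifications (i64 keys, a Vec<i64> for the lookup batch); the next commit drops the OnceLock in favor of parsing eagerly in deserialize.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for parse_lookup_batch.
fn parse_lookup(values: &[i64]) -> BTreeMap<i64, usize> {
    values.iter().enumerate().map(|(pos, v)| (*v, pos)).collect()
}

pub struct BitmapIndexState {
    lookup_values: Vec<i64>,                        // serialized
    index_map: OnceLock<Arc<BTreeMap<i64, usize>>>, // not serialized, filled at most once
}

impl BitmapIndexState {
    /// Warm path: pre-populate the cell with the live index's map.
    pub fn from_index(lookup_values: Vec<i64>, map: &Arc<BTreeMap<i64, usize>>) -> Self {
        let cell = OnceLock::new();
        let _ = cell.set(Arc::clone(map)); // cannot fail: the cell is fresh
        Self { lookup_values, index_map: cell }
    }

    /// Disk-backed path: leave the cell empty until first use.
    pub fn deserialize(lookup_values: Vec<i64>) -> Self {
        Self { lookup_values, index_map: OnceLock::new() }
    }

    /// First call on a deserialized state parses; later calls are O(1).
    pub fn index_map(&self) -> Arc<BTreeMap<i64, usize>> {
        Arc::clone(
            self.index_map
                .get_or_init(|| Arc::new(parse_lookup(&self.lookup_values))),
        )
    }
}
```

This works, but it adds lazy initialization and interior mutability to every state, which is the complexity the following commit removes.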
…of OnceLock

The OnceLock was unnecessary complexity. deserialize already pays the disk
I/O cost, so parsing the lookup_batch into an Arc<BTreeMap> eagerly there
is free relative to that. from_index just Arc::clones the existing map.
into_bitmap_index then Arc::clones on every call — always O(1), no lazy
init, no interior mutability, no manual Clone impl needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

github-actions bot added the A-index (Vector index, linalg, tokenizer) and performance labels on Jun 3, 2026
codecov bot commented Jun 3, 2026


Codecov Report

❌ Patch coverage is 90.00000% with 7 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance-index/src/scalar/bitmap.rs | 91.30% | 4 missing and 2 partials ⚠️
rust/lance-index/src/scalar/label_list.rs | 0.00% | 0 missing and 1 partial ⚠️


@westonpace

Member Author

The benchmark that detected this regression went from ~3s -> 1.7ms.

- Rename BitmapIndexState::into_bitmap_index -> to_bitmap_index; clippy
  requires into_* methods to take self by value
- Replace repeat(None).take(N) with repeat_n(None, N) in test to satisfy
  clippy::manual_repeat_n and fix the rustfmt line-length violation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
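Both clippy fixes in this commit can be illustrated with a small sketch. It assumes Rust 1.82 or newer, where std::iter::repeat_n is stable. CachedState, to_index_map, and column_with_nulls are hypothetical names chosen for illustration; the to_* convention shown is what clippy's wrong_self_convention lint expects for a method that borrows self.

```rust
use std::collections::BTreeMap;
use std::iter;
use std::sync::Arc;

/// Hypothetical cached state; only the naming convention matters here.
pub struct CachedState {
    index_map: Arc<BTreeMap<i64, usize>>,
}

impl CachedState {
    /// Takes &self, so a to_* name fits; an into_* name would signal
    /// that the method consumes self.
    pub fn to_index_map(&self) -> Arc<BTreeMap<i64, usize>> {
        Arc::clone(&self.index_map)
    }
}

/// Test column: `num_unique` distinct values followed by `num_nulls` nulls.
pub fn column_with_nulls(num_unique: i64, num_nulls: usize) -> Vec<Option<i64>> {
    (0..num_unique)
        .map(Some)
        // Writing iter::repeat(None).take(num_nulls) here is what
        // clippy::manual_repeat_n flags.
        .chain(iter::repeat_n(None, num_nulls))
        .collect()
}
```

For the PR's test shape, column_with_nulls(1000, 5) yields 1,005 rows, of which exactly 5 are null.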

wjones127 left a comment

Contributor


Looks like a clean fix. Thank you!

wjones127 merged commit 549ce37 into lance-format:main on Jun 3, 2026
30 checks passed