feat(index): serializable cache for Bitmap and LabelList scalar indices #6874
Merged
wjones127 merged 1 commit on May 20, 2026
Adds `CacheCodec` impls so Bitmap and LabelList index cache entries survive through a persistent cache backend, mirroring the BTree work in lance-format#6793.

- `CacheCodecImpl for RowAddrTreeMap` (delegates to existing `serialize_into`/`deserialize_from`), so per-value bitmap entries cached under `BitmapKey` are codec-backed.
- `BitmapIndexState` captures the value→offset map (Arrow IPC), the null bitmap, and the value type. `BitmapIndexPlugin` overrides `get_from_cache`/`put_in_cache` to store this sized state.
- `LabelListIndexState` wraps an inner `BitmapIndexState` plus `list_nulls` and gets the same plugin-level codec treatment.
- `open_scalar_index` skips the LabelList compatibility check on cache hits, so a fully-cached LabelList query no longer pays an extra `bitmap_page_lookup.lance` open per call.

Tests:

- Unit codec round-trip for `BitmapIndexState` (empty + populated).
- Integration tests `test_{bitmap,label_list}_prewarm_with_serializing_backend_serves_query_with_no_io` asserting zero IOPS after prewarm through a serializing cache backend.

Closes lance-format#6744
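For readers unfamiliar with the pattern, the codec round-trip described above can be sketched roughly as follows. This is a minimal illustration only: `SketchCodec`, `SketchBitmapState`, and the hand-rolled byte layout are hypothetical stand-ins, not the actual `CacheCodec` trait or `BitmapIndexState` from Lance (which, per the description, uses Arrow IPC for the value→offset map).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a cache codec trait. The real `CacheCodec`
/// in Lance may have a different shape and signatures.
pub trait SketchCodec: Sized {
    fn serialize(&self) -> Vec<u8>;
    fn deserialize(bytes: &[u8]) -> Option<Self>;
}

/// Simplified analogue of a bitmap index state: a value -> offset map plus
/// the rows that are null (plain row ids here instead of a real bitmap).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchBitmapState {
    pub value_to_offset: BTreeMap<i64, u64>,
    pub null_rows: Vec<u64>,
}

/// Read the next 8 bytes, advancing `pos`; `None` if the input is too short.
fn read8(bytes: &[u8], pos: &mut usize) -> Option<[u8; 8]> {
    let end = pos.checked_add(8)?;
    let chunk = bytes.get(*pos..end)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(chunk);
    *pos = end;
    Some(buf)
}

impl SketchCodec for SketchBitmapState {
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        // Length-prefixed (value, offset) pairs, little-endian.
        out.extend_from_slice(&(self.value_to_offset.len() as u64).to_le_bytes());
        for (value, offset) in &self.value_to_offset {
            out.extend_from_slice(&value.to_le_bytes());
            out.extend_from_slice(&offset.to_le_bytes());
        }
        // Length-prefixed null row ids.
        out.extend_from_slice(&(self.null_rows.len() as u64).to_le_bytes());
        for row in &self.null_rows {
            out.extend_from_slice(&row.to_le_bytes());
        }
        out
    }

    fn deserialize(bytes: &[u8]) -> Option<Self> {
        let mut pos = 0;
        let n_values = u64::from_le_bytes(read8(bytes, &mut pos)?);
        let mut value_to_offset = BTreeMap::new();
        for _ in 0..n_values {
            let value = i64::from_le_bytes(read8(bytes, &mut pos)?);
            let offset = u64::from_le_bytes(read8(bytes, &mut pos)?);
            value_to_offset.insert(value, offset);
        }
        let n_nulls = u64::from_le_bytes(read8(bytes, &mut pos)?);
        let mut null_rows = Vec::new();
        for _ in 0..n_nulls {
            null_rows.push(u64::from_le_bytes(read8(bytes, &mut pos)?));
        }
        // Reject trailing bytes so a corrupted entry is not silently accepted.
        if pos != bytes.len() {
            return None;
        }
        Some(Self { value_to_offset, null_rows })
    }
}
```

A round-trip check on an empty and a populated state, in the spirit of the unit tests listed above, would serialize each state, deserialize the bytes, and compare the result with the original.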
LuQQiu approved these changes on May 20, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on May 21, 2026

Mainline added serializable scalar-index caching (lance-format#6793, lance-format#6874) and moved the TRACE_IO_EVENTS / record_index_load instrumentation from the outer call site into `scalar::open_scalar_index`. The relocated trace references a `uuid_str` local that no longer exists after the branch dropped the `&str` form, and the inner `index` binding is shadowed by the loaded plugin index. Capture `index.uuid` (a `Uuid`) before the shadowing and format it via Display.

Also re-add the `UnsizedCacheKey` import in `rust/lance/src/index.rs`; the new `ScalarIndexCacheKey` introduced by this branch implements it, but the import was lost when the auto-merge pruned the outer scalar-cache code that mainline migrated into the plugin layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
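The shadowing fix described in this commit message can be illustrated with a small sketch. All names here (`SketchUuid`, `IndexMetadata`, `LoadedIndex`, `load_plugin_index`, `open_and_trace`) are hypothetical and only mimic the shape of the problem; they are not the actual Lance types or the real `open_scalar_index` body.

```rust
use std::fmt;

/// Hypothetical stand-in for a `Uuid`-like type that implements `Display`.
#[derive(Clone, Copy)]
pub struct SketchUuid(pub u128);

impl fmt::Display for SketchUuid {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Zero-padded lowercase hex, 32 digits wide.
        write!(f, "{:032x}", self.0)
    }
}

/// Hypothetical index metadata that carries the identifier.
pub struct IndexMetadata {
    pub uuid: SketchUuid,
}

/// Hypothetical loaded index; note that it has no `uuid` field.
pub struct LoadedIndex {
    pub num_values: usize,
}

fn load_plugin_index(_meta: &IndexMetadata) -> LoadedIndex {
    LoadedIndex { num_values: 3 }
}

pub fn open_and_trace(index: &IndexMetadata) -> String {
    // Capture the identifier while `index` still refers to the metadata.
    let uuid = index.uuid;
    // From here on, `index` is shadowed by the loaded plugin index,
    // so `index.uuid` would no longer compile.
    let index = load_plugin_index(index);
    // Format the captured value via its `Display` impl.
    format!("index_load uuid={} num_values={}", uuid, index.num_values)
}
```

Capturing the value before the rebinding avoids keeping a separate pre-formatted string around just for tracing.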
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on May 24, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on May 25, 2026
wjones127 pushed a commit that referenced this pull request on Jun 3, 2026

## Problem

Commit 4de5ce6 ("feat(index): serializable cache for Bitmap and LabelList scalar indices #6874") introduced a performance regression in `BitmapIndexPlugin::get_from_cache`. Every warm-cache hit against a bitmap scalar index now pays O(N log N) cost, where N is the number of unique values in the column, instead of O(1).

The regression: the new implementation stored only the serializable `BitmapIndexState` (an Arrow `RecordBatch`) in the cache and reconstructed the full `BTreeMap<OrderableScalarValue, usize>` on every cache hit by calling `parse_lookup_batch`. For a column with 10M unique values this rebuilds the map on every query, including `IS NULL`, whose actual bitmap lookup is `(*self.null_map).clone()` and is otherwise O(1).

`parse_lookup_batch` is expensive because:

1. It calls `ScalarValue::try_from_array` for every row: one heap allocation per unique value.
2. It inserts into a `BTreeMap`: O(log N) comparisons per insert, O(N log N) total.

## Fix

**`BitmapIndex.index_map`**: Changed from `BTreeMap<OrderableScalarValue, usize>` to `Arc<BTreeMap<OrderableScalarValue, usize>>`. The map is immutable after construction, so sharing it behind an `Arc` is safe, and cloning is O(1).

**`BitmapIndexState`**: Added an `index_map: Arc<BTreeMap<...>>` field that is **not serialized**; the wire format is unchanged. It is populated eagerly:

- `from_index` (called by `put_in_cache`): `Arc::clone`s the map from the live `BitmapIndex`, which is O(1).
- `deserialize` (disk-backed cache backends): calls `parse_lookup_batch` once at deserialization time, which is already paying disk I/O cost.

**`into_bitmap_index`**: Now takes `&self` and simply `Arc::clone`s `self.index_map`; always O(1), no reconstruction.

**`get_from_cache`**: The intermediate `(*state).clone()` is removed since `into_bitmap_index` no longer consumes `self`.

`LabelListIndex` had the same dual-entry patch applied in a prior iteration; that is also reverted to the original single-entry approach (its `BitmapIndexState` path is unchanged by this PR).

## Test

Added `test_bitmap_cache_fast_path` to `bitmap.rs`:

- Creates a high-cardinality bitmap index (1,000 unique integers + 5 null rows)
- Calls `put_in_cache`, then `get_from_cache`
- Asserts `get_from_cache` returns `Some`
- Runs `IS NULL` and asserts the correct 5 null rows are returned

To measure the end-to-end impact, run the `bitmap / is_null / warm` case in `python/python/ci_benchmarks/benchmarks/test_count_rows.py`; latency should be close to `btree / is_null / warm`.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
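The sharing scheme described in the Fix section can be sketched as below. This is a simplified illustration under stated assumptions: `SketchIndex`, `SketchState`, and `parse_lookup_rows` are hypothetical stand-ins for `BitmapIndex`, `BitmapIndexState`, and `parse_lookup_batch`, with `i64` keys and a plain `Vec` in place of `OrderableScalarValue` and the Arrow `RecordBatch`.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical, simplified live index. Both fields are immutable after
/// construction, so they can be shared behind `Arc`.
pub struct SketchIndex {
    pub lookup_rows: Arc<Vec<(i64, usize)>>,
    pub index_map: Arc<BTreeMap<i64, usize>>,
}

/// Simplified cached state. `lookup_rows` stands in for the serialized
/// lookup batch; `index_map` is derived from it and never serialized.
pub struct SketchState {
    pub lookup_rows: Arc<Vec<(i64, usize)>>,
    pub index_map: Arc<BTreeMap<i64, usize>>,
}

/// Stand-in for the expensive parse step: O(N log N) map construction.
fn parse_lookup_rows(rows: &[(i64, usize)]) -> BTreeMap<i64, usize> {
    rows.iter().copied().collect()
}

impl SketchState {
    /// Put-in-cache path: share the live index's data, O(1).
    pub fn from_index(index: &SketchIndex) -> Self {
        Self {
            lookup_rows: Arc::clone(&index.lookup_rows),
            index_map: Arc::clone(&index.index_map),
        }
    }

    /// Disk-backed path: pay the parse cost once, at deserialization time.
    pub fn deserialize(lookup_rows: Vec<(i64, usize)>) -> Self {
        let index_map = Arc::new(parse_lookup_rows(&lookup_rows));
        Self { lookup_rows: Arc::new(lookup_rows), index_map }
    }

    /// Cache-hit path: takes `&self` and only bumps reference counts,
    /// so no map is rebuilt per query.
    pub fn into_index(&self) -> SketchIndex {
        SketchIndex {
            lookup_rows: Arc::clone(&self.lookup_rows),
            index_map: Arc::clone(&self.index_map),
        }
    }
}
```

In this sketch, two indices produced from the same cached state point at the same map allocation, which `Arc::ptr_eq` can confirm.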
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 4, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 4, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 4, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 5, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 5, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 7, 2026
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request on Jun 9, 2026