
perf: use roaring's range iter to speedup mask_to_offset_ranges #6871

Merged: westonpace merged 2 commits into lance-format:main from westonpace:perf-mask-next-range on May 21, 2026



@westonpace (Member)

The function mask_to_offset_ranges is used at scan-planning time to determine which rows to read from the file. It was a bottleneck when the mask came from a zone map index search, because the old implementation materialized every offset only to convert them back into ranges.

Luckily, roaring recently added a range-based iterator. Using it, we can skip the materialization step. On my zonemap benchmark this doubles the speed of the search and, perhaps more importantly, removes a penalty I observed when the index is used for queries that are not highly selective.
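To make the difference concrete, here is a minimal std-only sketch, not the Lance implementation. It assumes the mask's selected offsets can be modeled as sorted inclusive `u32` runs standing in for roaring's containers; `ranges_via_materialization` and `ranges_via_runs` are hypothetical helpers. The first expands every offset and coalesces it back into ranges, as the old path did; the second maps each run directly to a half-open range, so its work scales with the number of runs rather than the number of rows.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical helper mirroring the old shape of the work: expand every
/// selected offset, then coalesce consecutive offsets back into ranges.
/// Work is proportional to the number of selected rows.
fn ranges_via_materialization(runs: &[RangeInclusive<u32>]) -> Vec<Range<u64>> {
    // Materialize every offset (the step the range iterator avoids).
    let offsets: Vec<u32> = runs.iter().flat_map(|r| r.clone()).collect();

    let mut out: Vec<Range<u64>> = Vec::new();
    for off in offsets {
        let off = off as u64;
        match out.last_mut() {
            // Extend the current range when the offset is contiguous with it.
            Some(last) if last.end == off => last.end = off + 1,
            // Otherwise start a new range.
            _ => out.push(off..off + 1),
        }
    }
    out
}

/// Hypothetical helper mirroring the new shape of the work: walk the runs
/// directly. Work is proportional to the number of runs.
fn ranges_via_runs(runs: &[RangeInclusive<u32>]) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = Vec::new();
    for r in runs {
        let (start, end) = (*r.start() as u64, *r.end() as u64 + 1);
        match out.last_mut() {
            // Merge runs that touch (e.g. if a run were split at a boundary).
            Some(last) if last.end == start => last.end = end,
            _ => out.push(start..end),
        }
    }
    out
}

fn main() {
    let runs: Vec<RangeInclusive<u32>> = vec![0..=9, 10..=19, 100..=1_000_000];
    let slow = ranges_via_materialization(&runs);
    let fast = ranges_via_runs(&runs);
    assert_eq!(slow, fast);
    println!("{:?}", fast); // prints [0..20, 100..1000001]
}
```

Both helpers return the same ranges; only the amount of work differs.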

Generated with the assistance of Claude Code.

westonpace and others added 2 commits May 16, 2026 18:51
Add a criterion benchmark suite targeting RowAddrMask / RowAddrTreeMap
that quantifies the cost of operations whose work is fundamentally
range-shaped but currently goes through per-row Partial(RoaringBitmap)
representation. Six groups:

  insert_range_single_run        - producer cost: insert one range
  into_addr_iter_single_run      - consumer cost: walk every row addr
  next_range_iter_single_run     - achievable cost via Iter::next_range
  intersect_two_runs             - set op on two range-shaped masks
  mask_to_offset_ranges_inner_loop - end-to-end slow path observed in
                                     IS NULL trace (495 ms / 889 ms)
  insert_runs_constant_cardinality - many small runs vs one big run

Each group varies the dataset size N while holding the number of ranges
fixed at 1, so linear scaling in N reveals where row count dominates the cost.

Headline finding (10M-row inputs):
  into_addr_iter:      19.4 ms   per-bit walk
  next_range iter:     1.72 µs   per-run walk (~11,000x faster)

The gap between next_range and into_addr_iter is the speedup a
range-aware iterator could expose to callers. The roaring crate
already stores the data in run-encoded containers; the RowAddrMask
public API does not expose them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `RowAddrTreeMap::iter_runs()` — a range-shaped consumer that walks
roaring's run-encoded containers via `Iter::next_range` instead of
yielding individual bits. Rewrites the `U64Segment::Range` arm of
`mask_to_offset_ranges` to use it, eliminating the per-bit walk that
dominated the IS NULL hot path documented in 1b9d7c0.
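The `U64Segment::Range` case is naturally range-shaped. Assuming a segment stores the contiguous row ids `start..end`, the row at position `i` has id `start + i`, so each run of selected ids maps to a run of offsets by clipping it to the segment and subtracting `start`. The sketch below illustrates that arithmetic under those assumptions; `range_segment_to_offset_ranges` is a hypothetical free function over plain `std` ranges, not the code in `rowids.rs`, and it takes the mask's selection as sorted, non-overlapping inclusive `u64` runs like those a run iterator might yield.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical helper: given a segment holding the contiguous row ids
/// `segment` (half-open) and the selected ids as sorted, non-overlapping
/// inclusive runs, return the selected positions within the segment as
/// half-open offset ranges. Work is proportional to the number of runs.
/// Assumes no run ends at u64::MAX (the `+ 1` below would overflow).
fn range_segment_to_offset_ranges(
    segment: Range<u64>,
    runs: impl IntoIterator<Item = RangeInclusive<u64>>,
) -> Vec<Range<u64>> {
    let mut out = Vec::new();
    for run in runs {
        // Convert the inclusive run to half-open form.
        let (run_start, run_end) = (*run.start(), *run.end() + 1);
        // Runs are sorted, so once a run starts past the segment we can stop.
        if run_start >= segment.end {
            break;
        }
        // Clip the run to the segment; runs ending before it produce nothing.
        let lo = run_start.max(segment.start);
        let hi = run_end.min(segment.end);
        if lo < hi {
            // Row id `segment.start + i` lives at offset `i`.
            out.push(lo - segment.start..hi - segment.start);
        }
    }
    out
}

fn main() {
    // The segment covers row ids 1000..2000; the mask selects three runs.
    let runs: Vec<RangeInclusive<u64>> = vec![0..=1099, 1500..=1599, 1990..=5000];
    let offsets = range_segment_to_offset_ranges(1000..2000, runs);
    println!("{:?}", offsets); // prints [0..100, 500..600, 990..1000]
}
```

Because the loop body runs once per run, a single contiguous selection over millions of rows costs one iteration here, which is the property the per-bit walk lacked.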

Benchmark deltas at 10M rows (single contiguous run, vs the bench
commit's `into_addr_iter` baseline):

  Consumer iteration            into_addr_iter   iter_runs    speedup
       N = 10K                       19.4 µs      17.6 ns     1,100x
       N = 100K                       191 µs      28.4 ns     6,800x
       N = 1M                        1.92 ms       181 ns    10,400x
       N = 10M                       19.5 ms      1.68 µs    11,600x

  mask_to_offset_ranges_inner_loop (end-to-end hot path):
       N = 10K                       19.7 µs       132 ns       150x
       N = 100K                       194 µs       262 ns       775x
       N = 1M                        1.93 ms      1.92 µs     1,000x
       N = 10M                       19.3 ms      20.1 µs       960x

The new path is within ~3x of a dedicated Vec<RangeInclusive>-backed
representation at 10M rows, but both run in microseconds while the
original took milliseconds, so the remaining gap is negligible for a
query that takes hundreds of ms.

The new method is ~70 lines (method + 2 tests + bench wiring) vs the
~700-line Runs-variant alternative, and adds no new public enum
variant or representation switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude (bot) left a comment:

Claude Code Review

This pull request is from a fork, so automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@codecov (bot) commented on May 20, 2026:

Codecov Report

❌ Patch coverage is 95.91837% with 2 lines in your changes missing coverage. Please review.

  File                                 Patch %   Lines
  rust/lance-core/src/utils/mask.rs    96.96%    1 missing ⚠️
  rust/lance-table/src/rowids.rs       93.75%    0 missing and 1 partial ⚠️


@wjones127 (Contributor) left a comment:

Nice optimization!

@Xuanwo (Collaborator) left a comment:

Nice change!

westonpace merged commit 3d85fc7 into lance-format:main on May 21, 2026 (28 checks passed).
