perf(scan): intra-file decode parallelism — sub-split large chunk spans#8400
Conversation
SplitBy::Layout now sub-divides any span between adjacent chunk boundaries wider than IDEAL_SPLIT_SIZE (100k rows) into evenly sized row-range splits, so files with few large chunks decode across multiple cores. Subdivision only inserts points strictly between existing adjacent boundaries: the half-open ranges consumers derive remain a contiguous, non-overlapping, exact partition of the same rows. The arithmetic saturates at u64::MAX. Tests: unit/property/overflow coverage for the subdivision helper, an end-to-end test that a 250k-row single flat chunk scans correctly across sub-divided splits with bounded split sizes, an rstest-parameterized end-to-end test for fixed-size SplitBy::RowCount scans (previously only covered at the boundary-math level), and an end-to-end test that ~120-byte string columns written with the default strategy keep their natural ~8k-row chunk splits untouched by the cap. Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Introduce a test-local MAX_SPLIT_ROWS mirroring the private IDEAL_SPLIT_SIZE instead of repeating 100_000, and bound the string-chunk test relative to the cap (< MAX_SPLIT_ROWS / 4) rather than pinning the current ~8k repartition default. Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 273.6 ns | -10.66% |
| ⚡ | Simulation | encode_varbin[(1000, 4)] |
157 µs | 139.8 µs | +12.34% |
| ⚡ | Simulation | encode_varbin[(1000, 32)] |
162.5 µs | 144.8 µs | +12.22% |
| ⚡ | Simulation | encode_varbin[(1000, 8)] |
157.3 µs | 140.4 µs | +12.05% |
| ⚡ | Simulation | encode_varbin[(1000, 2)] |
156.1 µs | 140.7 µs | +10.98% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing spiceai:lukim/scan-split-large-chunks-develop (ca2aa8b) with develop (e2478aa)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
robert3005
left a comment
There was a problem hiding this comment.
Sorry for the delay reviewing it. I think this is good to go. You will need to rebase/merge develop
…-large-chunks-develop
…ub.com> I, Luke Kim <80174+lukekim@users.noreply.github.com>, hereby add my Signed-off-by to this commit: f6534fc Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Summary
SplitBy::Layoutnow sub-divides any span between adjacent chunk boundaries wider thanIDEAL_SPLIT_SIZE(100k rows) into evenly sized row-range splits, so a file with few large chunks (e.g. a single flat layout, or byte-targeted int columns that coalesce to ~262k rows/chunk) decodes across multiple cores instead of one.Correctness
Subdivision only inserts points strictly between existing adjacent boundaries — it never moves or removes one — so the half-open ranges consumers derive (
tuple_windows) remain a contiguous, non-overlapping, exact partition of the same rows. Spans at or below the cap pass through untouched (fast-path no-op). All boundary consumers (RepeatedScan,VortexFile::splits(), the DataFusion repartitioner) operate on arbitrary ranges; sub-chunk ranges were already exercised bySplitBy::RowCount. The arithmetic saturates atu64::MAX.API/Observable behavior
For files whose merged (projected-column) chunk boundaries leave spans > 100k rows,
VortexFile::splits()(incl. Python bindings) returns more, smaller ranges; scans emit smaller batches; DataFusion gets real repartitioning where a single-chunk file previously collapsed to one partition. Fine-grained files (e.g. ~8k-row string chunks from the default 1 MiB block target) are untouched.Testing
subdivide_large_spans(no-op, large single chunk, mixed gaps, exact-coverage property,u64::MAXboundary).SplitBy::RowCountscans (unaligned 33,333 and exceeds-file 300,000 cases).cargo nextest run -p vortex-layout -p vortex-file— 174 passed on this branch.