perf(scan): intra-file decode parallelism — sub-split large chunk spans by lukekim · Pull Request #8400 · vortex-data/vortex

lukekim · 2026-06-12T21:20:52Z

Summary

SplitBy::Layout now sub-divides any span between adjacent chunk boundaries wider than IDEAL_SPLIT_SIZE (100k rows) into evenly sized row-range splits, so a file with few large chunks (e.g. a single flat layout, or byte-targeted int columns that coalesce to ~262k rows/chunk) decodes across multiple cores instead of one.
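
To make the subdivision concrete, here is a minimal sketch of how such a helper could work. The name `subdivide_large_spans` and the 100k cap come from this PR, but the signature, the slice-based input, and the remainder-spreading strategy are assumptions for illustration, not the actual Vortex implementation. It also assumes the input boundaries are sorted ascending.

```rust
/// Cap taken from the PR description (100k rows); the real constant is private.
const IDEAL_SPLIT_SIZE: u64 = 100_000;

/// Hypothetical sketch: insert evenly spaced points into every gap wider
/// than `cap`, keeping all existing boundaries. Assumes `boundaries` is
/// sorted ascending.
fn subdivide_large_spans(boundaries: &[u64], cap: u64) -> Vec<u64> {
    assert!(cap > 0, "cap must be positive");
    let mut out = Vec::with_capacity(boundaries.len());
    let Some(&first) = boundaries.first() else {
        return out;
    };
    out.push(first);
    for pair in boundaries.windows(2) {
        let (lo, hi) = (pair[0], pair[1]);
        let span = hi - lo;
        if span > cap {
            // Smallest number of pieces such that each piece is <= cap.
            let pieces = span.div_ceil(cap);
            let base = span / pieces;
            let rem = span % pieces;
            let mut point = lo;
            // Emit pieces - 1 interior points; the first `rem` pieces get
            // one extra row so the sizes differ by at most one.
            for i in 0..pieces - 1 {
                point += base + u64::from(i < rem);
                out.push(point);
            }
        }
        // The original boundary is always kept, never moved.
        out.push(hi);
    }
    out
}
```

With this sketch, a single 250k-row span `[0, 250_000]` becomes `[0, 83_334, 166_667, 250_000]`, while spans at or below the cap are returned unchanged. Because every interior point stays below `hi`, the additions cannot overflow; the PR itself describes saturating arithmetic, which this sketch does not attempt to reproduce.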

Correctness

Subdivision only inserts points strictly between existing adjacent boundaries — it never moves or removes one — so the half-open ranges consumers derive (tuple_windows) remain a contiguous, non-overlapping, exact partition of the same rows. Spans at or below the cap pass through untouched (fast-path no-op). All boundary consumers (RepeatedScan, VortexFile::splits(), the DataFusion repartitioner) operate on arbitrary ranges; sub-chunk ranges were already exercised by SplitBy::RowCount. The arithmetic saturates at u64::MAX.
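
The partition argument can be checked mechanically. The sketch below derives half-open ranges from adjacent boundaries and verifies that they tile a row interval exactly. It uses the standard library's `windows(2)` as a stand-in for itertools' `tuple_windows`, and all function names are hypothetical helpers, not Vortex APIs.

```rust
use std::ops::Range;

/// Derive half-open row ranges from adjacent boundaries.
fn ranges_from_boundaries(boundaries: &[u64]) -> Vec<Range<u64>> {
    boundaries.windows(2).map(|w| w[0]..w[1]).collect()
}

/// True if `ranges` cover `start..end` with no gaps and no overlaps.
fn is_exact_partition(ranges: &[Range<u64>], start: u64, end: u64) -> bool {
    let mut cursor = start;
    for r in ranges {
        // Each range must begin exactly where the previous one ended.
        if r.start != cursor || r.end < r.start {
            return false;
        }
        cursor = r.end;
    }
    cursor == end
}

/// True if every original boundary is still present after refinement.
/// Assumes both slices are sorted ascending.
fn keeps_all_boundaries(original: &[u64], refined: &[u64]) -> bool {
    original.iter().all(|b| refined.binary_search(b).is_ok())
}
```

Inserting points strictly between two adjacent boundaries keeps both checks true: the refined list still contains every original boundary, and the derived ranges still chain from the first boundary to the last.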

API/Observable behavior

For files whose merged (projected-column) chunk boundaries leave spans > 100k rows, VortexFile::splits() (incl. Python bindings) returns more, smaller ranges; scans emit smaller batches; DataFusion gets real repartitioning where a single-chunk file previously collapsed to one partition. Fine-grained files (e.g. ~8k-row string chunks from the default 1 MiB block target) are untouched.
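
Whether a scan is affected depends on the merged boundaries of the projected columns. A rough sketch of that dependency follows, with hypothetical helper names and illustrative chunk sizes (262,144 rows for an int column, 8,192 rows for a string column) standing in for the approximate figures above.

```rust
use std::collections::BTreeSet;

/// Merge per-column chunk boundaries into one sorted, de-duplicated list.
fn merge_boundaries(columns: &[Vec<u64>]) -> Vec<u64> {
    let merged: BTreeSet<u64> = columns.iter().flatten().copied().collect();
    merged.into_iter().collect()
}

/// Widest gap between adjacent boundaries (0 if fewer than two boundaries).
/// Assumes `boundaries` is sorted ascending.
fn widest_span(boundaries: &[u64]) -> u64 {
    boundaries.windows(2).map(|w| w[1] - w[0]).max().unwrap_or(0)
}
```

Projecting only the int column leaves spans of 262,144 rows, which exceed the 100k cap and would be sub-divided. Projecting the string column as well yields merged spans of 8,192 rows, so nothing changes.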

Testing

Unit/property/overflow tests for subdivide_large_spans (no-op, large single chunk, mixed gaps, exact-coverage property, u64::MAX boundary).
E2E: 250k-row single flat chunk → splits all ≤ the cap, contiguous, exact endpoints; full + filtered scans match the unsplit data.
E2E (rstest): fixed-size SplitBy::RowCount scans (unaligned 33,333 and exceeds-file 300,000 cases).
E2E: ~120-byte string column via the default write strategy keeps its natural fine-grained chunk splits (bounded relative to the cap, not to writer defaults).
`cargo nextest run -p vortex-layout -p vortex-file`: 174 passed on this branch.
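
For the fixed-size `SplitBy::RowCount` cases, the expected ranges can be worked out by hand. The helper below is a hypothetical sketch of fixed-size splitting, not the Vortex implementation, and the 250k-row file size used in the note afterwards is an assumption borrowed from the flat-chunk test.

```rust
use std::ops::Range;

/// Hypothetical sketch: cut `0..total_rows` into ranges of at most
/// `rows_per_split` rows; only the last range may be shorter.
fn fixed_size_splits(total_rows: u64, rows_per_split: u64) -> Vec<Range<u64>> {
    assert!(rows_per_split > 0, "split size must be positive");
    let mut out = Vec::new();
    let mut start = 0u64;
    while start < total_rows {
        // Clamp the final range to the end of the file.
        let end = start.saturating_add(rows_per_split).min(total_rows);
        out.push(start..end);
        start = end;
    }
    out
}
```

On an assumed 250,000-row file, a split size of 33,333 gives eight ranges, the last being `233_331..250_000`, and a split size of 300,000 gives the single range `0..250_000`.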

SplitBy::Layout now sub-divides any span between adjacent chunk boundaries wider than IDEAL_SPLIT_SIZE (100k rows) into evenly sized row-range splits, so files with few large chunks decode across multiple cores. Subdivision only inserts points strictly between existing adjacent boundaries: the half-open ranges consumers derive remain a contiguous, non-overlapping, exact partition of the same rows. The arithmetic saturates at u64::MAX.

Tests: unit/property/overflow coverage for the subdivision helper, an end-to-end test that a 250k-row single flat chunk scans correctly across sub-divided splits with bounded split sizes, an rstest-parameterized end-to-end test for fixed-size SplitBy::RowCount scans (previously only covered at the boundary-math level), and an end-to-end test that ~120-byte string columns written with the default strategy keep their natural ~8k-row chunk splits untouched by the cap.

Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Introduce a test-local MAX_SPLIT_ROWS mirroring the private IDEAL_SPLIT_SIZE instead of repeating 100_000, and bound the string-chunk test relative to the cap (< MAX_SPLIT_ROWS / 4) rather than pinning the current ~8k repartition default.

Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
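
A minimal sketch of what such a cap-relative bound could look like in a test. `MAX_SPLIT_ROWS` and the `< MAX_SPLIT_ROWS / 4` bound are from the commit message above; the helper name and the use of `Range<u64>` for splits are assumptions.

```rust
use std::ops::Range;

/// Test-local mirror of the private cap (value from the PR description).
const MAX_SPLIT_ROWS: u64 = 100_000;

/// Hypothetical check: every split is well under the cap, without pinning
/// the writer's current chunk size.
fn all_splits_fine_grained(splits: &[Range<u64>]) -> bool {
    splits.iter().all(|r| r.end - r.start < MAX_SPLIT_ROWS / 4)
}
```

Bounding against the cap keeps the test valid if the writer's default chunk size changes, as long as the chunks stay well below the point where subdivision would kick in.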

codspeed-hq · 2026-06-12T21:30:21Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 4 improved benchmarks
❌ 1 regressed benchmark
✅ 1584 untouched benchmarks
⏩ 4 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `bitwise_not_vortex_buffer_mut[128]` | 244.4 ns | 273.6 ns | -10.66% |
| ⚡ | Simulation | `encode_varbin[(1000, 4)]` | 157 µs | 139.8 µs | +12.34% |
| ⚡ | Simulation | `encode_varbin[(1000, 32)]` | 162.5 µs | 144.8 µs | +12.22% |
| ⚡ | Simulation | `encode_varbin[(1000, 8)]` | 157.3 µs | 140.4 µs | +12.05% |
| ⚡ | Simulation | `encode_varbin[(1000, 2)]` | 156.1 µs | 140.7 µs | +10.98% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing spiceai:lukim/scan-split-large-chunks-develop (ca2aa8b) with develop (e2478aa)

¹ 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

robert3005

Sorry for the delay in reviewing this. I think this is good to go. You will need to rebase/merge develop.

lukekim added 3 commits June 12, 2026 14:01

Merge branch 'develop' into lukim/scan-split-large-chunks-develop

e8047a4

lukekim requested a review from a team June 12, 2026 21:20

gatesn added the action/benchmark label (Trigger full benchmarks to run on this PR) Jun 12, 2026

lukekim mentioned this pull request Jun 12, 2026

Merge upstream Vortex 0.75.0 into spiceai-54 (DataFusion 53 → 54) spiceai/vortex#65

Merged

robert3005 added the action/benchmark (Trigger full benchmarks to run on this PR) and changelog/performance (A performance improvement) labels and removed the action/benchmark label Jun 23, 2026

robert3005 approved these changes Jun 23, 2026

lukekim added 4 commits June 24, 2026 08:35

Merge remote-tracking branch 'upstream/develop' into lukim/scan-split-large-chunks-develop

6d6ff96

test: add execution context to assert_arrays_eq in large chunk scan tests

f6534fc

DCO Remediation Commit for Luke Kim <80174+lukekim@users.noreply.github.com>

I, Luke Kim <80174+lukekim@users.noreply.github.com>, hereby add my Signed-off-by to this commit: f6534fc

Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

2ef67a8

Merge branch 'develop' into lukim/scan-split-large-chunks-develop

ca2aa8b

robert3005 merged commit 9d3aafb into vortex-data:develop Jun 24, 2026
69 of 70 checks passed

AdamGS mentioned this pull request Jun 24, 2026

perf: Subsplitting large chunks causes some regression for vortex-compact for some benchmarks #8587

Open

lukekim mentioned this pull request Jun 25, 2026

Fix flat reader subrange decode reuse #8596

Open

robert3005 merged 7 commits into vortex-data:develop from spiceai:lukim/scan-split-large-chunks-develop
