fix[file]: read the one-row pruning result in can_prune instead of re… by tomsanbear · Pull Request #8369 · vortex-data/vortex

tomsanbear · 2026-06-11T15:39:01Z

Summary

Closes: #8366

VortexFile::can_prune stopped pruning for every and/or and eq predicate after #7575 removed bottom-up constant-folding. It evaluates a predicate's stat-falsification expression over the one-row file-stats array, but only read a folded Columnar::Constant result and mapped Columnar::Canonical(_) to false, discarding the computed boolean. Post-#7575, composite falsifications (boolean and/or trees, and the or(min > lit, lit > max) that eq expands to) execute to a one-row Canonical instead of folding, so pruning silently stopped for them. Bare gt/lt comparisons still fold through the compare kernel's one-level constant fast path, so only composite and eq predicates regressed.

It went unnoticed upstream because can_prune has no in-repo callers, the scan path prunes through FileStatsLayoutReader::evaluate_file_stats, which already reads the one-row result out correctly. This change makes can_prune mirror that read-out (execute to Canonical, take the row-0 scalar) and adds an end-to-end regression test covering bare, and/or, and eq predicates with non-falsifiable controls.

API Changes

No signature changes. This restores the runtime behavior of the public VortexFile::can_prune: it now returns true (prunable) for and/or/eq predicates that file statistics prove cannot match, where it previously returned false. No engine integration is affected. The scan path already pruned correctly via evaluate_file_stats; only direct callers of can_prune saw the regression. No docs changes needed.

Testing

New test_can_prune_composite_predicates (vortex-file/src/tests.rs): writes a file with stat-bounded age/price columns and asserts pruning for bare, and, or, and eq falsifiable predicates, plus non-falsifiable controls that must not prune.
Verified non-vacuous: the test fails against the pre-fix Columnar::Canonical(_) => false arm and passes after.
cargo nextest run -p vortex-file — 62/62 pass (existing suite + new test).
cargo +nightly fmt -p vortex-file and cargo clippy -p vortex-file --all-targets clean.

AI assistance

Developed with assistance from Claude Code (agentic AI) for root-cause analysis, the fix, and the regression test. All changes were reviewed and verified by me against the Vortex source and a downstream test suite.

…quiring a constant vortex-data#7575 removed bottom-up constant-folding, so composite falsifications — and/or trees, and the or(min > lit, lit > max) that eq expands to — now execute to a one-row Canonical instead of folding to a Columnar::Constant. can_prune only read the folded case and mapped Canonical to false, so file-level pruning silently stopped for every and/or and eq predicate; bare gt/lt still fold, so they were unaffected. Nothing caught it because can_prune has no in-repo callers — the scan path prunes through FileStatsLayoutReader::evaluate_file_stats. Mirror that reader's read-out (execute to Canonical, take row 0) and add a regression test for bare, and/or, and eq predicates. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>

robert3005 · 2026-06-11T15:50:24Z

-        })
+        let result = applied
+            .execute::<Canonical>(&mut ctx)?
+            .into_bool()


you can skip this into_bool

joseph-isaacs · 2026-06-11T15:56:09Z

Great find thanks!

Can we keep the execute::<Columnar> and just check if the canonical first value is a boolean

codspeed-hq · 2026-06-11T15:56:31Z

Merging this PR will degrade performance by 12.89%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 11 regressed benchmarks
✅ 1517 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_bool_canonical_into[(1000, 10)]`	20.4 µs	35.5 µs	-42.59%
❌	Simulation	`take_10k_random`	196.7 µs	254.5 µs	-22.73%
❌	Simulation	`patched_take_10k_contiguous_patches`	230.3 µs	289.2 µs	-20.36%
❌	Simulation	`patched_take_10k_random`	242.3 µs	301.2 µs	-19.55%
❌	Simulation	`decompress_rd[f64, (10000, 0.0)]`	111.3 µs	138.3 µs	-19.51%
❌	Simulation	`decompress_rd[f64, (10000, 0.1)]`	111.3 µs	138.1 µs	-19.4%
❌	Simulation	`decompress_rd[f64, (10000, 0.01)]`	111 µs	137.7 µs	-19.4%
❌	Simulation	`take_10k_contiguous`	252.3 µs	309.9 µs	-18.58%
❌	Simulation	`chunked_varbinview_canonical_into[(1000, 10)]`	161.6 µs	197.9 µs	-18.33%
❌	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	176.9 µs	213.1 µs	-16.99%
❌	Simulation	`decompress_rd[f32, (10000, 0.1)]`	80.7 µs	89.8 µs	-10.14%
⚡	WallTime	`cuda/bitpacked_u8/unpack/3bw[100M]`	351.3 µs	298.9 µs	+17.53%
⚡	Simulation	`decompress_rd[f64, (100000, 0.0)]`	980.4 µs	845.5 µs	+15.96%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	244.4 ns	215.3 ns	+13.55%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	304.7 ns	275.6 ns	+10.58%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing tomsanbear:fix/can-prune-one-row-result (4662318) with develop (0dd63f0)}

tomsanbear · 2026-06-11T15:56:44Z

Great find thanks!

Can we keep the execute::<Columnar> and just check if the canonical first value is a boolean

for sure, let me make a quick patch to address this and robert's comment

…nonical arm Address review from vortex-data#8369: retain the Columnar::Constant fast path and read the row-0 scalar directly in the Canonical arm, dropping the redundant into_bool round-trip. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>

tomsanbear · 2026-06-11T16:10:58Z

@joseph-isaacs @robert3005 updated with the change to maintain the previous match arm

tomsanbear requested a review from a team June 11, 2026 15:39

Merge branch 'develop' into fix/can-prune-one-row-result

c28c458

robert3005 reviewed Jun 11, 2026

View reviewed changes

joseph-isaacs reviewed Jun 11, 2026

View reviewed changes

Comment thread vortex-file/src/file.rs

robert3005 mentioned this pull request Jun 11, 2026

File based pruning handles single row canonical arrays instead of just constant arrays #8370

Closed

joseph-isaacs approved these changes Jun 11, 2026

View reviewed changes

joseph-isaacs added the changelog/fix A bug fix label Jun 11, 2026

joseph-isaacs merged commit d021152 into vortex-data:develop Jun 11, 2026
73 of 78 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix[file]: read the one-row pruning result in can_prune instead of re…#8369

fix[file]: read the one-row pruning result in can_prune instead of re…#8369
joseph-isaacs merged 3 commits into
vortex-data:developfrom
tomsanbear:fix/can-prune-one-row-result

tomsanbear commented Jun 11, 2026

Uh oh!

robert3005 Jun 11, 2026

Uh oh!

Uh oh!

joseph-isaacs commented Jun 11, 2026

Uh oh!

codspeed-hq Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

tomsanbear commented Jun 11, 2026

Uh oh!

tomsanbear commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tomsanbear commented Jun 11, 2026

Summary

API Changes

Testing

AI assistance

Uh oh!

robert3005 Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

joseph-isaacs commented Jun 11, 2026

Uh oh!

codspeed-hq Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 12.89%

Performance Changes

Uh oh!

tomsanbear commented Jun 11, 2026

Uh oh!

tomsanbear commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codspeed-hq Bot commented Jun 11, 2026 •

edited

Loading