Skip to content

fix[file]: read the one-row pruning result in can_prune instead of re…#8369

Merged
joseph-isaacs merged 3 commits into
vortex-data:developfrom
tomsanbear:fix/can-prune-one-row-result
Jun 11, 2026
Merged

fix[file]: read the one-row pruning result in can_prune instead of re…#8369
joseph-isaacs merged 3 commits into
vortex-data:developfrom
tomsanbear:fix/can-prune-one-row-result

Conversation

@tomsanbear

Copy link
Copy Markdown
Contributor

Summary

Closes: #8366

VortexFile::can_prune stopped pruning for every and/or and eq predicate after #7575 removed bottom-up constant-folding. It evaluates a predicate's stat-falsification expression over the one-row file-stats array, but only read a folded Columnar::Constant result and mapped Columnar::Canonical(_) to false, discarding the computed boolean. Post-#7575, composite falsifications (boolean and/or trees, and the or(min > lit, lit > max) that eq expands to) execute to a one-row Canonical instead of folding, so pruning silently stopped for them. Bare gt/lt comparisons still fold through the compare kernel's one-level constant fast path, so only composite and eq predicates regressed.

It went unnoticed upstream because can_prune has no in-repo callers, the scan path prunes through FileStatsLayoutReader::evaluate_file_stats, which already reads the one-row result out correctly. This change makes can_prune mirror that read-out (execute to Canonical, take the row-0 scalar) and adds an end-to-end regression test covering bare, and/or, and eq predicates with non-falsifiable controls.

API Changes

No signature changes. This restores the runtime behavior of the public VortexFile::can_prune: it now returns true (prunable) for and/or/eq predicates that file statistics prove cannot match, where it previously returned false. No engine integration is affected. The scan path already pruned correctly via evaluate_file_stats; only direct callers of can_prune saw the regression. No docs changes needed.

Testing

  • New test_can_prune_composite_predicates (vortex-file/src/tests.rs): writes a file with stat-bounded age/price columns and asserts pruning for bare, and, or, and eq falsifiable predicates, plus non-falsifiable controls that must not prune.
  • Verified non-vacuous: the test fails against the pre-fix Columnar::Canonical(_) => false arm and passes after.
  • cargo nextest run -p vortex-file — 62/62 pass (existing suite + new test).
  • cargo +nightly fmt -p vortex-file and cargo clippy -p vortex-file --all-targets clean.

AI assistance

Developed with assistance from Claude Code (agentic AI) for root-cause analysis, the fix, and the regression test. All changes were reviewed and verified by me against the Vortex source and a downstream test suite.

…quiring a constant

vortex-data#7575 removed bottom-up constant-folding, so composite falsifications — and/or trees, and the or(min > lit, lit > max) that eq expands to — now execute to a one-row Canonical instead of folding to a Columnar::Constant. can_prune only read the folded case and mapped Canonical to false, so file-level pruning silently stopped for every and/or and eq predicate; bare gt/lt still fold, so they were unaffected. Nothing caught it because can_prune has no in-repo callers — the scan path prunes through FileStatsLayoutReader::evaluate_file_stats.

Mirror that reader's read-out (execute to Canonical, take row 0) and add a regression test for bare, and/or, and eq predicates.

Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
@tomsanbear tomsanbear requested a review from a team June 11, 2026 15:39
Comment thread vortex-file/src/file.rs Outdated
})
let result = applied
.execute::<Canonical>(&mut ctx)?
.into_bool()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can skip this into_bool

Comment thread vortex-file/src/file.rs
@joseph-isaacs

Copy link
Copy Markdown
Contributor

Great find thanks!

Can we keep the execute::<Columnar> and just check if the canonical first value is a boolean

@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merging this PR will degrade performance by 12.89%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 11 regressed benchmarks
✅ 1517 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_canonical_into[(1000, 10)] 20.4 µs 35.5 µs -42.59%
Simulation take_10k_random 196.7 µs 254.5 µs -22.73%
Simulation patched_take_10k_contiguous_patches 230.3 µs 289.2 µs -20.36%
Simulation patched_take_10k_random 242.3 µs 301.2 µs -19.55%
Simulation decompress_rd[f64, (10000, 0.0)] 111.3 µs 138.3 µs -19.51%
Simulation decompress_rd[f64, (10000, 0.1)] 111.3 µs 138.1 µs -19.4%
Simulation decompress_rd[f64, (10000, 0.01)] 111 µs 137.7 µs -19.4%
Simulation take_10k_contiguous 252.3 µs 309.9 µs -18.58%
Simulation chunked_varbinview_canonical_into[(1000, 10)] 161.6 µs 197.9 µs -18.33%
Simulation chunked_varbinview_into_canonical[(1000, 10)] 176.9 µs 213.1 µs -16.99%
Simulation decompress_rd[f32, (10000, 0.1)] 80.7 µs 89.8 µs -10.14%
WallTime cuda/bitpacked_u8/unpack/3bw[100M] 351.3 µs 298.9 µs +17.53%
Simulation decompress_rd[f64, (100000, 0.0)] 980.4 µs 845.5 µs +15.96%
Simulation bitwise_not_vortex_buffer_mut[128] 244.4 ns 215.3 ns +13.55%
Simulation bitwise_not_vortex_buffer_mut[1024] 304.7 ns 275.6 ns +10.58%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing tomsanbear:fix/can-prune-one-row-result (4662318) with develop (0dd63f0)

Open in CodSpeed

@tomsanbear

Copy link
Copy Markdown
Contributor Author

Great find thanks!

Can we keep the execute::<Columnar> and just check if the canonical first value is a boolean

for sure, let me make a quick patch to address this and robert's comment

…nonical arm

Address review from vortex-data#8369: retain the Columnar::Constant fast path and read the row-0 scalar directly in the Canonical arm, dropping the redundant into_bool round-trip.

Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
@tomsanbear

Copy link
Copy Markdown
Contributor Author

@joseph-isaacs @robert3005 updated with the change to maintain the previous match arm

@joseph-isaacs joseph-isaacs added the changelog/fix A bug fix label Jun 11, 2026
@joseph-isaacs joseph-isaacs merged commit d021152 into vortex-data:develop Jun 11, 2026
73 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: can_prune stops pruning and/or/eq predicates

3 participants