fix[file]: read the one-row pruning result in can_prune instead of re…#8369
Conversation
…quiring a constant vortex-data#7575 removed bottom-up constant-folding, so composite falsifications — and/or trees, and the or(min > lit, lit > max) that eq expands to — now execute to a one-row Canonical instead of folding to a Columnar::Constant. can_prune only read the folded case and mapped Canonical to false, so file-level pruning silently stopped for every and/or and eq predicate; bare gt/lt still fold, so they were unaffected. Nothing caught it because can_prune has no in-repo callers — the scan path prunes through FileStatsLayoutReader::evaluate_file_stats. Mirror that reader's read-out (execute to Canonical, take row 0) and add a regression test for bare, and/or, and eq predicates. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
| }) | ||
| let result = applied | ||
| .execute::<Canonical>(&mut ctx)? | ||
| .into_bool() |
There was a problem hiding this comment.
you can skip this into_bool
|
Great find thanks! Can we keep the |
Merging this PR will degrade performance by 12.89%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
20.4 µs | 35.5 µs | -42.59% |
| ❌ | Simulation | take_10k_random |
196.7 µs | 254.5 µs | -22.73% |
| ❌ | Simulation | patched_take_10k_contiguous_patches |
230.3 µs | 289.2 µs | -20.36% |
| ❌ | Simulation | patched_take_10k_random |
242.3 µs | 301.2 µs | -19.55% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] |
111.3 µs | 138.3 µs | -19.51% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] |
111.3 µs | 138.1 µs | -19.4% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] |
111 µs | 137.7 µs | -19.4% |
| ❌ | Simulation | take_10k_contiguous |
252.3 µs | 309.9 µs | -18.58% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] |
161.6 µs | 197.9 µs | -18.33% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
176.9 µs | 213.1 µs | -16.99% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] |
80.7 µs | 89.8 µs | -10.14% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] |
351.3 µs | 298.9 µs | +17.53% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.0)] |
980.4 µs | 845.5 µs | +15.96% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 275.6 ns | +10.58% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing tomsanbear:fix/can-prune-one-row-result (4662318) with develop (0dd63f0)
for sure, let me make a quick patch to address this and robert's comment |
…nonical arm Address review from vortex-data#8369: retain the Columnar::Constant fast path and read the row-0 scalar directly in the Canonical arm, dropping the redundant into_bool round-trip. Signed-off-by: Thomas Santerre <thomas@santerre.xyz>
|
@joseph-isaacs @robert3005 updated with the change to maintain the previous match arm |
Summary
Closes: #8366
VortexFile::can_prunestopped pruning for everyand/orandeqpredicate after #7575 removed bottom-up constant-folding. It evaluates a predicate's stat-falsification expression over the one-row file-stats array, but only read a foldedColumnar::Constantresult and mappedColumnar::Canonical(_)tofalse, discarding the computed boolean. Post-#7575, composite falsifications (booleanand/ortrees, and theor(min > lit, lit > max)thateqexpands to) execute to a one-rowCanonicalinstead of folding, so pruning silently stopped for them. Baregt/ltcomparisons still fold through the compare kernel's one-level constant fast path, so only composite andeqpredicates regressed.It went unnoticed upstream because
can_prunehas no in-repo callers, the scan path prunes throughFileStatsLayoutReader::evaluate_file_stats, which already reads the one-row result out correctly. This change makescan_prunemirror that read-out (execute toCanonical, take the row-0 scalar) and adds an end-to-end regression test covering bare,and/or, andeqpredicates with non-falsifiable controls.API Changes
No signature changes. This restores the runtime behavior of the public
VortexFile::can_prune: it now returnstrue(prunable) forand/or/eqpredicates that file statistics prove cannot match, where it previously returnedfalse. No engine integration is affected. The scan path already pruned correctly viaevaluate_file_stats; only direct callers ofcan_prunesaw the regression. No docs changes needed.Testing
test_can_prune_composite_predicates(vortex-file/src/tests.rs): writes a file with stat-boundedage/pricecolumns and asserts pruning for bare,and,or, andeqfalsifiable predicates, plus non-falsifiable controls that must not prune.Columnar::Canonical(_) => falsearm and passes after.cargo nextest run -p vortex-file— 62/62 pass (existing suite + new test).cargo +nightly fmt -p vortex-fileandcargo clippy -p vortex-file --all-targetsclean.AI assistance
Developed with assistance from Claude Code (agentic AI) for root-cause analysis, the fix, and the regression test. All changes were reviewed and verified by me against the Vortex source and a downstream test suite.