
Regression: can_prune stops pruning and/or/eq predicates #8366

Description

@tomsanbear

What happened?

We'd been on Vortex 0.69 for a while and, upgrading to 0.74, hit a regression caused by the removal of ScalarFnConstantRule in #7575.

can_prune evaluates a predicate's stat-falsification expression over the one-row file-stats array, but it only accepts a constant-folded result (Columnar::Constant) and treats a materialized one-row boolean (Columnar::Canonical) as "cannot prove".

After #7575 removed bottom-up constant folding, composite falsifications no longer fold to a constant: boolean trees (and/or) and eq (whose falsification is internally or(min > lit, lit > max)) now execute to a one-row Canonical, so can_prune discards the result and stops pruning. Bare gt/lt comparisons still fold through the compare kernel's one-level constant fast path, so only composite and eq predicates regressed.
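The distinction described above can be illustrated with a small self-contained model. This is only a sketch: the `Columnar` enum here is a hypothetical stand-in that borrows the variant names from this report, not the real Vortex type, and both functions are invented for illustration. The second function shows one possible relaxation, not necessarily the fix Vortex should adopt.

```rust
/// Hypothetical stand-in for the result of executing a falsification
/// expression. Variant names follow the issue text; this is NOT the
/// real Vortex `Columnar` type.
#[derive(Debug, Clone, PartialEq)]
enum Columnar {
    /// The expression folded down to a single constant value.
    Constant(bool),
    /// The expression was executed and materialized as an array of rows.
    Canonical(Vec<bool>),
}

/// Models the behaviour reported here: only a folded constant `true`
/// counts as proof; any materialized result means "cannot prove".
fn can_prune_constant_only(result: &Columnar) -> bool {
    matches!(result, Columnar::Constant(true))
}

/// Models one possible relaxation (an assumption, not the actual fix):
/// also accept a materialized result that has exactly one row and
/// whose single value is `true`.
fn can_prune_accepting_one_row(result: &Columnar) -> bool {
    match result {
        Columnar::Constant(value) => *value,
        Columnar::Canonical(rows) => rows.len() == 1 && rows[0],
    }
}
```

In this model, `Columnar::Canonical(vec![true])` is rejected by `can_prune_constant_only` but accepted by `can_prune_accepting_one_row`, which mirrors how a correct falsification result can be discarded purely because of the shape it arrives in.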

Steps to reproduce

  1. Write a Vortex file containing a struct column whose statistics bound the values — e.g. an age column with values [15, 18, 22, 25] (min 15, max 25) and a price column [120, 130, 140, 150].
  2. Open the file (SESSION.open_options().open_buffer(buf)?).
  3. Call file.can_prune(&eq(col("age"), lit(5)))? (5 is outside the [15, 25] min/max range, so the file provably contains no match). Expected Ok(true) (prunable); actual Ok(false).
  4. Call file.can_prune(&and(gt(col("age"), lit(30)), lt(col("price"), lit(100))))? (both branches are falsified by the stats). Expected Ok(true); actual Ok(false).
  5. Call file.can_prune(&or(gt(col("age"), lit(30)), lt(col("age"), lit(10))))? (both branches are falsified by the stats). Expected Ok(true); actual Ok(false).
  6. As a control, call file.can_prune(&gt(col("age"), lit(30)))? (a bare comparison, which still folds through the compare kernel's fast path). It correctly returns Ok(true), showing that only composite and eq predicates regressed.
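To make the expected results in the steps above easy to check by hand, here is a minimal, self-contained model of min/max falsification. It does not use the Vortex API: `MinMax`, `Pred`, `falsified`, and `example_stats` are hypothetical names introduced for illustration. The eq rule follows the or(min > lit, lit > max) form quoted earlier; the other rules are the usual interval reasoning and are assumed here rather than taken from the Vortex source.

```rust
use std::collections::HashMap;

/// Min/max statistics for one integer column (hypothetical model type).
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

/// A tiny predicate language covering the operators used in the steps above.
#[derive(Debug, Clone)]
enum Pred {
    Eq(&'static str, i64),
    Gt(&'static str, i64),
    Lt(&'static str, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Returns true when the statistics prove that no row can satisfy `pred`.
/// A column without statistics is never treated as falsified.
fn falsified(pred: &Pred, stats: &HashMap<&'static str, MinMax>) -> bool {
    match pred {
        // eq is impossible when the literal lies outside [min, max].
        Pred::Eq(col, lit) => stats
            .get(col)
            .map_or(false, |s| s.min > *lit || *lit > s.max),
        // col > lit is impossible when even the maximum is <= lit.
        Pred::Gt(col, lit) => stats.get(col).map_or(false, |s| s.max <= *lit),
        // col < lit is impossible when even the minimum is >= lit.
        Pred::Lt(col, lit) => stats.get(col).map_or(false, |s| s.min >= *lit),
        // A conjunction is impossible as soon as either side is impossible.
        Pred::And(a, b) => falsified(a, stats) || falsified(b, stats),
        // A disjunction is impossible only when both sides are impossible.
        Pred::Or(a, b) => falsified(a, stats) && falsified(b, stats),
    }
}

/// Statistics matching the example file: age in [15, 25], price in [120, 150].
fn example_stats() -> HashMap<&'static str, MinMax> {
    HashMap::from([
        ("age", MinMax { min: 15, max: 25 }),
        ("price", MinMax { min: 120, max: 150 }),
    ])
}
```

With `example_stats()`, the predicates from steps 3 to 6 (for example `Pred::Eq("age", 5)`) all evaluate to falsified in this model, which is why Ok(true) is the expected answer in every case. A predicate such as `Pred::Eq("age", 18)` is not falsified, since 18 lies inside the age range.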

