
feat: add OR pre-selection short-circuit #22979

Open
kumarUjjawal wants to merge 1 commit into apache:main from kumarUjjawal:feature/or-preselection-short-circuit


@kumarUjjawal

Contributor

Which issue does this PR close?

Rationale for this change

BinaryExpr already uses pre-selection for AND when only a small fraction of LHS rows are true: only those rows need the RHS, because `false AND x` is always false. This PR adds the matching optimization for OR, which applies when most LHS rows are already true.
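The choice between full evaluation and pre-selection can be sketched as a small decision function. This is an illustration under stated assumptions, not DataFusion's code: the names `choose_strategy`, `Strategy`, `Op`, and `PRE_SELECTION_RATIO` are hypothetical, the 0.2 cutoff is purely illustrative, and the LHS is assumed to have no nulls.

```rust
/// Hypothetical cutoff: pre-select only when at most this fraction of rows
/// still needs the RHS. The real threshold in DataFusion may differ.
const PRE_SELECTION_RATIO: f64 = 0.2;

#[derive(Debug, PartialEq)]
enum Op {
    And,
    Or,
}

#[derive(Debug, PartialEq)]
enum Strategy {
    /// No row needs the RHS; the LHS already is the answer.
    ReturnLhs,
    /// Evaluate the RHS only on the rows that can still change the result.
    PreSelection,
    /// Too many rows need the RHS; evaluate it on the whole batch.
    EvaluateBoth,
}

/// Decide how to evaluate `lhs <op> rhs`, assuming `lhs` has no nulls.
fn choose_strategy(lhs: &[bool], op: Op) -> Strategy {
    if lhs.is_empty() {
        return Strategy::ReturnLhs;
    }
    // Rows whose result still depends on the RHS:
    // AND needs it where the LHS is true, OR where the LHS is false.
    let undecided = match op {
        Op::And => lhs.iter().filter(|&&v| v).count(),
        Op::Or => lhs.iter().filter(|&&v| !v).count(),
    };
    if undecided == 0 {
        Strategy::ReturnLhs
    } else if (undecided as f64) / (lhs.len() as f64) <= PRE_SELECTION_RATIO {
        Strategy::PreSelection
    } else {
        Strategy::EvaluateBoth
    }
}

fn main() {
    // One false row out of ten: OR needs the RHS for that single row only.
    let lhs = [true, true, true, false, true, true, true, true, true, true];
    println!("{:?}", choose_strategy(&lhs, Op::Or)); // PreSelection
    println!("{:?}", choose_strategy(&lhs, Op::And)); // EvaluateBoth
}
```

The same counting serves both operators; only the polarity of the "undecided" rows flips between AND and OR.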

What changes are included in this PR?

This PR extends pre-selection short-circuiting to OR.

For OR, the RHS is evaluated only for rows where the LHS is false. Rows where the LHS is true are filled directly with true, since `true OR x` is always true. The existing AND path is kept, and the scatter logic is shared between the two operators.
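A minimal, row-oriented sketch of that OR path on plain Rust slices instead of Arrow arrays. It assumes the LHS has no nulls and models the RHS as a per-row closure (the real code evaluates the RHS expression once on the filtered batch); `or_with_pre_selection` is a hypothetical name.

```rust
/// Evaluate `lhs OR rhs` with pre-selection (hypothetical helper).
/// Assumes `lhs` has no nulls; `rhs(i)` stands in for evaluating the RHS
/// expression on row `i` and returns `None` for NULL.
fn or_with_pre_selection<F>(lhs: &[bool], mut rhs: F) -> Vec<Option<bool>>
where
    F: FnMut(usize) -> Option<bool>,
{
    // Selection: only LHS-false rows can still change the result.
    let selected: Vec<usize> = (0..lhs.len()).filter(|&i| !lhs[i]).collect();

    // Evaluate the RHS on the selected rows only, producing a compacted array.
    let rhs_selected: Vec<Option<bool>> = selected.iter().map(|&i| rhs(i)).collect();

    // Scatter back: LHS-true rows become true; LHS-false rows take the next
    // compacted RHS value, because `false OR x` is x (NULL stays NULL).
    let mut next_rhs = rhs_selected.into_iter();
    lhs.iter()
        .map(|&l| {
            if l {
                Some(true)
            } else {
                next_rhs.next().expect("one RHS value per selected row")
            }
        })
        .collect()
}

fn main() {
    let lhs = [true, false, true, true];
    let rhs_values = [Some(false), None, Some(true), Some(false)];
    let mut calls = 0;
    let out = or_with_pre_selection(&lhs, |i| {
        calls += 1;
        rhs_values[i]
    });
    // Only row 1 needs the RHS; it is NULL, and `false OR NULL` is NULL.
    println!("{:?}, RHS calls: {}", out, calls);
    // [Some(true), None, Some(true), Some(true)], RHS calls: 1
}
```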

Are these changes tested?

Yes

Are there any user-facing changes?

No. There are no public API changes.

github-actions bot added the physical-expr label (Changes to the physical-expr crates) on Jun 16, 2026
@kumarUjjawal

Contributor Author

@neilconway do you have time to review this PR?

@neilconway

Contributor

@kumarUjjawal I can take a look; have you run benchmarks / evaluated what the performance is like?

@kumarUjjawal

Contributor Author
   case                        pr branch          main    speedup (main / pr branch)
  ━━━━━━━━━━━━━━━━━━━━━━━━  ━━━━━━━━━━━━  ━━━━━━━━━━━━  ━━━━━━━━━━━━━━━━━━━━━━━━━━
   all_true                    31.812 ns     32.251 ns     1.01x
   one_false_first            178.400 us    474.120 us     2.66x
   one_false_last             198.490 us    474.650 us     2.39x
   one_false_middle           197.900 us    474.040 us     2.40x
   one_false_middle_left      194.520 us    473.940 us     2.44x
   one_false_middle_right     198.680 us    474.080 us     2.39x
   all_false_in_or            474.700 us    473.610 us     1.00x
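The last column is the ratio of the main-branch time to the PR-branch time, so values above 1.00x mean the PR branch is faster. Worked out for one_false_first and all_false_in_or:

```latex
\frac{t_{\text{main}}}{t_{\text{pr}}}
  = \frac{474.120\ \mu\text{s}}{178.400\ \mu\text{s}} \approx 2.66,
\qquad
\frac{473.610\ \mu\text{s}}{474.700\ \mu\text{s}} \approx 0.998 \approx 1.00
```

So the all-false case, in which every row still needs the RHS, shows no measurable change.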

@kosiew left a comment

Contributor


@kumarUjjawal
Thanks for the fix. This looks good to me overall. I left a couple of small cleanup suggestions that might make the code a bit easier to read and maintain.

// If the RHS is uniform on the selected rows, the whole
// expression collapses and no scatter is needed.
if boolean_array.null_count() == 0 {
let rhs_value = if !boolean_array.has_false() {

Contributor


Small readability thought: the uniform RHS detection could be a bit flatter by encoding the two valid uniform cases directly.

let rhs_value = match (boolean_array.has_true(), boolean_array.has_false()) {
    (true, false) => Some(true),
    (false, true) => Some(false),
    _ => None,
};

I think this makes the mixed case easier to spot at a glance and avoids the nested if / else if.
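The suggested `match` and the collapse it enables can be exercised on plain slices. This sketch assumes the RHS values for the selected rows contain no nulls (the `null_count() == 0` guard above); `uniform_value` stands in for the `has_true()` / `has_false()` checks on a `BooleanArray`, and `collapse` shows why no scatter is needed. Both helpers and the `Op` enum are hypothetical.

```rust
#[derive(Clone, Copy)]
enum Op {
    And,
    Or,
}

/// Classify the RHS values computed on the selected rows, mirroring the
/// reviewer's suggested match. Assumes the slice has no nulls.
fn uniform_value(rhs_selected: &[bool]) -> Option<bool> {
    let has_true = rhs_selected.iter().any(|&v| v);
    let has_false = rhs_selected.iter().any(|&v| !v);
    match (has_true, has_false) {
        (true, false) => Some(true),
        (false, true) => Some(false),
        _ => None, // mixed values (or no selected rows): a scatter is needed
    }
}

/// With a uniform RHS the full result is either the LHS itself or a
/// constant column, so nothing has to be scattered.
fn collapse(op: Op, lhs: &[bool], rhs_value: bool) -> Vec<bool> {
    match (op, rhs_value) {
        // AND selects LHS-true rows, and `x AND true == x`.
        (Op::And, true) => lhs.to_vec(),
        // Every row has a false operand on one side or the other.
        (Op::And, false) => vec![false; lhs.len()],
        // OR selects LHS-false rows, and they all become true.
        (Op::Or, true) => vec![true; lhs.len()],
        // `x OR false == x`.
        (Op::Or, false) => lhs.to_vec(),
    }
}

fn main() {
    let lhs = [true, false, true];
    // OR evaluated the RHS only on row 1 (the LHS-false row).
    let rhs_on_selected = [true];
    if let Some(v) = uniform_value(&rhs_on_selected) {
        println!("{:?}", collapse(Op::Or, &lhs, v)); // [true, true, true]
    }
}
```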

let mut result_array_builder = BooleanArray::builder(result_len);

// keep track of current position we have in right boolean array
let mut right_array_pos = 0;

Contributor


pre_selection_scatter now has two very similar SlicesIterator loops, with the main difference being how selected rows are filled. It might be worth folding this into one loop and choosing the selected-row behavior inside it.

That would keep the invariant in one place: gaps get fill_value, while selected rows get RHS values or nulls. It may also help avoid future drift between the AND and OR paths.
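A sketch of the single-loop shape described here, on plain Rust slices. `true_runs` is a hypothetical stand-in for arrow's `SlicesIterator` (it yields half-open runs of selected rows), and `scatter` takes the gap value as a `fill_value` parameter: false for AND, true for OR. It illustrates the invariant only and is not the PR's actual `pre_selection_scatter`.

```rust
/// Half-open `(start, end)` runs of `true` in `selection` (hypothetical
/// helper, similar in spirit to arrow's `SlicesIterator`).
fn true_runs(selection: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < selection.len() {
        if selection[i] {
            let start = i;
            while i < selection.len() && selection[i] {
                i += 1;
            }
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}

/// One loop for both operators: gaps get `fill_value`, selected rows take
/// the next values of the compacted RHS array, nulls included.
fn scatter(selection: &[bool], rhs_selected: &[Option<bool>], fill_value: bool) -> Vec<Option<bool>> {
    let mut out = Vec::with_capacity(selection.len());
    let mut rhs_pos = 0; // current position in the compacted RHS array
    let mut last_end = 0;
    for (start, end) in true_runs(selection) {
        // Gap before this run: rows the LHS already decided.
        out.extend(std::iter::repeat(Some(fill_value)).take(start - last_end));
        // Selected run: copy RHS values (including nulls) in order.
        let len = end - start;
        out.extend_from_slice(&rhs_selected[rhs_pos..rhs_pos + len]);
        rhs_pos += len;
        last_end = end;
    }
    // Trailing gap after the last run.
    out.extend(std::iter::repeat(Some(fill_value)).take(selection.len() - last_end));
    debug_assert_eq!(rhs_pos, rhs_selected.len());
    out
}

fn main() {
    // OR: the selection marks LHS-false rows; gaps (LHS-true rows) become true.
    let selection = [false, true, true, false, true];
    let rhs_selected = [Some(false), None, Some(true)];
    println!("{:?}", scatter(&selection, &rhs_selected, true));
    // [Some(true), Some(false), None, Some(true), Some(true)]
}
```

With the fill value as a parameter, the AND and OR paths differ in a single argument, which is the kind of drift the comment is trying to prevent.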



Development

Successfully merging this pull request may close these issues.

Consider PreSelection short-circuit strategy for OR

3 participants