feat: add OR pre-selection short-circuit#22979
Conversation
|
@neilconway do you have time to review this PR? |
|
@kumarUjjawal I can take a look; have you run benchmarks / evaluated what the performance is like? |
|
There was a problem hiding this comment.
@kumarUjjawal
Thanks for the fix. This looks good to me overall. I left a couple of small cleanup suggestions that might make the code a bit easier to read and maintain.
| // If the RHS is uniform on the selected rows, the whole | ||
| // expression collapses and no scatter is needed. | ||
| if boolean_array.null_count() == 0 { | ||
| let rhs_value = if !boolean_array.has_false() { |
There was a problem hiding this comment.
Small readability thought: the uniform RHS detection could be a bit flatter by encoding the two valid uniform cases directly.
let rhs_value = match (boolean_array.has_true(), boolean_array.has_false()) {
(true, false) => Some(true),
(false, true) => Some(false),
_ => None,
};I think this makes the mixed case easier to spot at a glance and avoids the nested if / else if.
| let mut result_array_builder = BooleanArray::builder(result_len); | ||
|
|
||
| // keep track of current position we have in right boolean array | ||
| let mut right_array_pos = 0; |
There was a problem hiding this comment.
pre_selection_scatter now has two very similar SlicesIterator loops, with the main difference being how selected rows are filled. It might be worth folding this into one loop and choosing the selected-row behavior inside it.
That would keep the invariant in one place: gaps get fill_value, while selected rows get RHS values or nulls. It may also help avoid future drift between the AND and OR paths.
Which issue does this PR close?
PreSelectionshort-circuit strategy forOR#22342.Rationale for this change
BinaryExpralready uses pre-selection forANDwhen only a small set of LHS rows can affect the final result. This adds the matching optimization forORwhen most LHS rows are already true.What changes are included in this PR?
This PR extends pre-selection short-circuiting to
OR.For
OR, the RHS is evaluated only for rows where the LHS is false. Rows where the LHS is true are filled directly as true. The existingANDpath is kept and the scatter logic is shared.Are these changes tested?
Yes
Are there any user-facing changes?
No Public API Change