Is your feature request related to a problem or challenge?
equal_rows_arr compares pairs of arrays at the given indices for equality, and it shows up in profiles.
Currently this is done in the following way:
- Using take to get the values at the indices for the first pair
- Comparing the arrays using eq or not_distinct
- Doing the same for the next pairs and AND-ing the results
- Filtering the indices based on the resulting boolean array
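The steps above can be sketched roughly as follows. This is a simplified model using only the standard library, not the actual DataFusion code: `Column`, `take`, `compare`, `filter_indices` and `equal_rows_current` are hypothetical stand-ins for Arrow arrays and kernels, and null comparisons are simplified to `false` unless `null_equals_null` is set.

```rust
/// Hypothetical stand-in for an Arrow array: a column of nullable values.
type Column = Vec<Option<i64>>;

/// Copies the values at `indices` into a new column (the role `take` plays).
fn take(values: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| values[i]).collect()
}

/// Element-wise comparison. With `null_equals_null` two nulls compare equal
/// (not_distinct-like); otherwise any null makes the row not match.
fn compare(left: &Column, right: &Column, null_equals_null: bool) -> Vec<bool> {
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) => a == b,
            (None, None) => null_equals_null,
            _ => false,
        })
        .collect()
}

/// Keeps only the indices whose mask entry is true (the final filter step).
fn filter_indices(indices: &[usize], mask: &[bool]) -> Vec<usize> {
    indices
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(idx, _)| *idx)
        .collect()
}

/// Model of the current approach: take, compare, AND, then filter.
fn equal_rows_current(
    left_indices: &[usize],
    right_indices: &[usize],
    left_columns: &[Column],
    right_columns: &[Column],
    null_equals_null: bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut mask = vec![true; left_indices.len()];
    for (left, right) in left_columns.iter().zip(right_columns) {
        // Two copies and one fresh boolean vector per column pair.
        let left_taken = take(left, left_indices);
        let right_taken = take(right, right_indices);
        let equal = compare(&left_taken, &right_taken, null_equals_null);
        for (m, e) in mask.iter_mut().zip(&equal) {
            *m &= *e;
        }
    }
    (
        filter_indices(left_indices, &mask),
        filter_indices(right_indices, &mask),
    )
}
```

Even in this toy model, each column pair costs two copies plus a temporary boolean vector, which is the overhead the issue is about.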
Describe the solution you'd like
We could optimize this in some ways:
- Writing a kernel that doesn't use take (i.e. copy the array) but compares arrays based on the indices.
- Targeting a single RecordBatch using BatchCoalescer rather than creating multiple small batches and concatenating afterwards
- Writing results to a single BooleanBuffer rather than creating a new one every time
- Removing non-matching indices from the list (e.g. using Vec::retain) rather than creating a boolean array for a filter
- Reusing allocations as much as possible between batches
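A minimal sketch of the index-based comparison combined with Vec::retain, again using only the standard library. `Column`, `values_equal` and `retain_equal_rows` are hypothetical names, and storing the left and right indices as one vector of pairs is an assumption made for illustration, not how DataFusion represents them.

```rust
/// Hypothetical stand-in for an Arrow array: a column of nullable values.
type Column = Vec<Option<i64>>;

/// Compares two nullable values. With `null_equals_null` two nulls compare
/// equal; otherwise any null makes the row not match.
fn values_equal(l: Option<i64>, r: Option<i64>, null_equals_null: bool) -> bool {
    match (l, r) {
        (Some(a), Some(b)) => a == b,
        (None, None) => null_equals_null,
        _ => false,
    }
}

/// Drops every (left, right) index pair whose rows differ in any column.
/// Values are read in place through the indices, so nothing is copied and
/// no intermediate boolean array is built.
fn retain_equal_rows(
    pairs: &mut Vec<(usize, usize)>,
    left_columns: &[Column],
    right_columns: &[Column],
    null_equals_null: bool,
) {
    for (left, right) in left_columns.iter().zip(right_columns) {
        pairs.retain(|&(l, r)| values_equal(left[l], right[r], null_equals_null));
        if pairs.is_empty() {
            // Nothing left to compare for the remaining columns.
            break;
        }
    }
}
```

Because `retain` shrinks the vector in place, later columns only look at pairs that survived earlier ones, and the caller can clear and refill the same `pairs` vector for the next batch to reuse its allocation.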
Describe alternatives you've considered
No response
Additional context
No response