
Fuse operations in equal_rows_arr #12131

Description

@Dandandan

Is your feature request related to a problem or challenge?

equal_rows_arr compares pairs of arrays for equality at the given indices, and it shows up prominently in profiles.

Currently this is done as follows:

  • take the values at the indices for the first pair of arrays
  • compare the taken arrays using eq or not_distinct
  • repeat this for the remaining pairs and AND the results together
  • filter the indices based on the resulting boolean array
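A simplified, dependency-free model of the four steps above, to make the per-pair copies and allocations visible. Column, take, compare and equal_rows_unfused are hypothetical stand-ins for the Arrow arrays and kernels; this is not DataFusion's actual code, and the AND step is modeled as a plain (non-Kleene) null-propagating AND.

```rust
/// Hypothetical stand-in for a nullable Arrow column.
type Column = Vec<Option<i64>>;

/// Models arrow's `take`: copies the values at `indices` into a new column.
fn take(col: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| col[i]).collect()
}

/// Element-wise comparison producing a nullable boolean mask.
/// `null_equals_null == false` models `eq` (any null input gives a null result);
/// `null_equals_null == true` models `not_distinct` (null == null is true).
fn compare(l: &Column, r: &Column, null_equals_null: bool) -> Vec<Option<bool>> {
    l.iter()
        .zip(r)
        .map(|(a, b)| match (a, b) {
            (Some(x), Some(y)) => Some(x == y),
            (None, None) if null_equals_null => Some(true),
            _ if null_equals_null => Some(false),
            _ => None,
        })
        .collect()
}

/// Hypothetical model of the current, unfused approach.
fn equal_rows_unfused(
    left: &[Column],
    right: &[Column],
    left_idx: &[usize],
    right_idx: &[usize],
    null_equals_null: bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut mask: Option<Vec<Option<bool>>> = None;
    for (l, r) in left.iter().zip(right) {
        // Step 1: `take` copies the values of both columns for every candidate pair.
        let lt = take(l, left_idx);
        let rt = take(r, right_idx);
        // Step 2: a fresh boolean array is allocated for every column pair.
        let eq = compare(&lt, &rt, null_equals_null);
        // Step 3: AND with the previous result, allocating yet another array.
        mask = Some(match mask {
            None => eq,
            Some(prev) => prev
                .iter()
                .zip(&eq)
                .map(|(a, b)| match (a, b) {
                    (Some(a), Some(b)) => Some(*a && *b),
                    _ => None,
                })
                .collect(),
        });
    }
    // Step 4: filter both index lists; null counts as "not equal".
    let mask = mask.unwrap_or_else(|| vec![Some(true); left_idx.len()]);
    let keep = |idx: &[usize]| {
        idx.iter()
            .zip(&mask)
            .filter(|(_, m)| **m == Some(true))
            .map(|(i, _)| *i)
            .collect::<Vec<usize>>()
    };
    (keep(left_idx), keep(right_idx))
}
```

With two join-key columns, every column pair costs two copies plus at least one new boolean array before the final filter.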

Describe the solution you'd like

We could optimize this in several ways:

  • Write a kernel that compares the arrays directly at the given indices instead of using take (which copies the values).
  • Build a single RecordBatch using BatchCoalescer rather than creating multiple small batches and concatenating them afterwards.
  • Write results into a single BooleanBuffer rather than creating a new one for every pair.
  • Remove non-matching indices from the list (e.g. using Vec::retain) rather than creating a boolean array to filter with.
  • Reuse allocations between batches as much as possible.
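A minimal sketch of how the first and fourth ideas might combine: compare at the indices without take, and drop non-matching candidates in place with Vec::retain instead of building a boolean filter. It reuses the hypothetical Column model and additionally assumes the candidate pairs are stored as a single Vec<(usize, usize)>, whereas DataFusion keeps build and probe indices in separate arrays. eq_at and equal_rows_fused are illustrative names, not existing APIs.

```rust
/// Hypothetical stand-in for a nullable Arrow column.
type Column = Vec<Option<i64>>;

/// Compares `l[li]` with `r[ri]` directly, without copying either column.
/// A null under `eq` semantics would be filtered out, so it is treated as `false`.
fn eq_at(l: &Column, li: usize, r: &Column, ri: usize, null_equals_null: bool) -> bool {
    match (l[li], r[ri]) {
        (Some(x), Some(y)) => x == y,
        (None, None) => null_equals_null,
        _ => false,
    }
}

/// Narrows `pairs` in place, one column pair at a time.
/// No intermediate arrays are allocated; `pairs` keeps its capacity for reuse.
fn equal_rows_fused(
    left: &[Column],
    right: &[Column],
    pairs: &mut Vec<(usize, usize)>,
    null_equals_null: bool,
) {
    for (l, r) in left.iter().zip(right) {
        pairs.retain(|&(li, ri)| eq_at(l, li, r, ri, null_equals_null));
        if pairs.is_empty() {
            // Nothing left to compare for the remaining columns.
            break;
        }
    }
}
```

Retaining column by column means each later column only compares the pairs that survived the earlier ones. Whether this row-at-a-time loop actually beats a vectorized kernel over real Arrow arrays is something that would have to be measured.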

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement (New feature or request), performance (Make DataFusion faster)