Optimize ArrowBytesViewMap #19961

@Dandandan

Description

Is your feature request related to a problem or challenge?

There seem to be some opportunities to optimize ArrowBytesViewMap with a bit more cleverness.

For example, in ClickBench query 5, more than 50% of CPU time is spent in intern:

Image

Much of that time goes to fetching and comparing the bytes from the buffers (append_value, get_value, memcmp, makeview, etc.).

Image

Describe the solution you'd like

We should be able to avoid (re)creating views every time and comparing against slices by storing and comparing the views directly, which also avoids the overhead of the GenericByteViewBuilder methods.

To do so, I think we need:

  • Don't use values.iter(); read the view buffer directly and get the buffer index from the view
  • Compare against the original view (and against the buffer at that index if needed)
  • Update the existing view with the new buffer index (don't create it again).
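
The three points above can be sketched roughly as follows. This is a standalone model of the Arrow view layout using only the standard library, not the actual ArrowBytesViewMap or arrow-rs code; `make_view_sketch`, `long_bytes`, `views_equal`, and `retarget_view` are hypothetical names introduced for illustration, and the sketch assumes the usual 16-byte view encoding (4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index, and offset).

```rust
// Standalone model of the Arrow byte-view layout (assumed, not arrow-rs code).
// A view is a little-endian u128:
//   bits 0..32  : length in bytes
//   len <= 12   : bits 32..128 hold the value inline, zero padded
//   len > 12    : bits 32..64 prefix, bits 64..96 buffer index, bits 96..128 offset
const MAX_INLINE: u32 = 12;

/// Hypothetical helper: build a view for `data`, appending long values
/// to `buffers[buffer_index]`.
fn make_view_sketch(data: &[u8], buffers: &mut [Vec<u8>], buffer_index: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= MAX_INLINE {
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        let buf = &mut buffers[buffer_index as usize];
        let offset = buf.len() as u32;
        buf.extend_from_slice(data);
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Resolve the bytes of a long (non-inline) view through its buffer index and offset.
fn long_bytes(view: u128, buffers: &[Vec<u8>]) -> &[u8] {
    let len = view as u32 as usize;
    let index = (view >> 64) as u32 as usize;
    let offset = (view >> 96) as u32 as usize;
    &buffers[index][offset..offset + len]
}

/// Compare two views directly; touch the data buffers only when the
/// length and the first four bytes already match and the value is long.
fn views_equal(a: u128, a_bufs: &[Vec<u8>], b: u128, b_bufs: &[Vec<u8>]) -> bool {
    // The low 64 bits hold the length plus the first four bytes (inline or prefix).
    if a as u64 != b as u64 {
        return false;
    }
    if (a as u32) <= MAX_INLINE {
        // Inline values: the whole view is the value, one integer compare suffices.
        return a == b;
    }
    long_bytes(a, a_bufs) == long_bytes(b, b_bufs)
}

/// Reuse an existing long view, rewriting only the buffer index and offset
/// instead of rebuilding the view from the bytes.
fn retarget_view(view: u128, new_index: u32, new_offset: u32) -> u128 {
    debug_assert!((view as u32) > MAX_INLINE);
    (view & u128::from(u64::MAX))
        | (u128::from(new_index) << 64)
        | (u128::from(new_offset) << 96)
}
```

In this model, short values never touch the data buffers at all, and long values with a different length or prefix are rejected with a single 64-bit compare. Whether that translates into a measurable win for the real map would need to be confirmed by profiling.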

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Assignees

Labels

enhancement (New feature or request), performance (Make DataFusion faster)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests