Add append_non_nulls to StructBuilder #9429

@Fokko

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While doing some performance optimization, I noticed that we have a loop that appends one value to the null mask at a time. Instead, I'd suggest adding append_non_nulls so that all n values are appended in a single call.

append_non_nulls(n) vs append(true) in a loop (with bitmap allocated)

┌───────────┬───────────────────┬─────────────────────┬─────────┐ 
│     n     │ append(true) loop │ append_non_nulls(n) │ speedup │ 
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 100       │ 251 ns            │ 73 ns               │ ~3x     │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 1,000     │ 2.0 µs            │ 94 ns               │ ~21x    │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 10,000    │ 19.3 µs           │ 119 ns              │ ~162x   │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 100,000   │ 191 µs            │ 348 ns              │ ~549x   │
├───────────┼───────────────────┼─────────────────────┼─────────┤
│ 1,000,000 │ 1.90 ms           │ 3.5 µs              │ ~543x   │
└───────────┴───────────────────┴─────────────────────┴─────────┘

Describe the solution you'd like

There is already append_nulls, which is the counterpart for adding a range of false to the null mask. So I'd suggest adding the same for non-null values.
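To illustrate why the bulk variant can be so much cheaper, here is a minimal sketch on a hypothetical `NullMask` type. It is not the arrow-rs implementation; the type, its fields, and both method bodies are assumptions made up for this example. The idea is that `append_non_nulls(n)` can set whole bytes at once, while `append(true)` in a loop touches one bit per call.

```rust
/// Hypothetical, simplified validity bitmap (NOT the arrow-rs type).
/// Bits are packed LSB-first; a set bit means "valid / non-null".
struct NullMask {
    bits: Vec<u8>,
    len: usize, // number of bits appended so far
}

impl NullMask {
    fn new() -> Self {
        Self { bits: Vec::new(), len: 0 }
    }

    /// Append a single validity bit (the per-element path).
    fn append(&mut self, valid: bool) {
        if self.len % 8 == 0 {
            // Start a new byte whenever the previous one is full.
            self.bits.push(0);
        }
        if valid {
            let last = self.bits.len() - 1;
            self.bits[last] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Append `n` valid bits in one call (the bulk path).
    fn append_non_nulls(&mut self, n: usize) {
        let mut remaining = n;

        // 1. Top up the partially filled trailing byte, if there is one.
        let used = self.len % 8;
        if used != 0 && remaining > 0 {
            let take = remaining.min(8 - used);
            // `take` ones shifted past the bits already in use.
            let mask = (((1u16 << take) - 1) << used) as u8;
            let last = self.bits.len() - 1;
            self.bits[last] |= mask;
            remaining -= take;
        }

        // 2. Write all full bytes in a single resize.
        let full_bytes = remaining / 8;
        self.bits.resize(self.bits.len() + full_bytes, 0xFF);
        remaining -= full_bytes * 8;

        // 3. Write the leftover bits (fewer than 8) into a fresh byte.
        if remaining > 0 {
            self.bits.push((1u8 << remaining) - 1);
        }

        self.len += n;
    }
}
```

Both paths should produce the same bitmap, so callers that currently loop over `append(true)` could switch to the bulk call without changing the result.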

Describe alternatives you've considered

Calling append in a loop, which keeps the CPU toasty warm.

Additional context
