chore: Prefer append_value_n over append_value #1868

Merged
Fokko merged 3 commits into delta-io:main from Fokko:fd-optimize-insert-value
Feb 18, 2026

@Fokko Fokko commented Feb 17, 2026

What changes are proposed in this pull request?

Instead of appending the value once per row in a loop, we push the operation down to Arrow's bulk append (append_value_n), which is much faster.
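
To make the loop-versus-bulk distinction concrete, here is a minimal std-only sketch. `ToyInt32Builder`, `fill_with_loop`, and `fill_in_bulk` are hypothetical names invented for illustration; they only mimic the shape of a builder with `append_value` and `append_value_n` and are not the arrow-rs types or the code changed in this PR.

```rust
/// Toy stand-in for a primitive array builder (hypothetical, not the arrow-rs type).
#[derive(Default)]
pub struct ToyInt32Builder {
    values: Vec<i32>,
}

impl ToyInt32Builder {
    /// Append a single value; the per-row approach calls this once per row.
    pub fn append_value(&mut self, v: i32) {
        self.values.push(v);
    }

    /// Append `n` copies of `v` in one call: one length computation, one fill.
    pub fn append_value_n(&mut self, v: i32, n: usize) {
        let new_len = self.values.len() + n;
        self.values.resize(new_len, v);
    }

    /// Consume the builder and return the accumulated values.
    pub fn finish(self) -> Vec<i32> {
        self.values
    }
}

/// Per-row approach: the repetition lives in the caller's loop.
pub fn fill_with_loop(v: i32, num_rows: usize) -> Vec<i32> {
    let mut b = ToyInt32Builder::default();
    for _ in 0..num_rows {
        b.append_value(v);
    }
    b.finish()
}

/// Bulk approach: the repetition is pushed down into the builder.
pub fn fill_in_bulk(v: i32, num_rows: usize) -> Vec<i32> {
    let mut b = ToyInt32Builder::default();
    b.append_value_n(v, num_rows);
    b.finish()
}
```

Both functions produce the same column; the bulk version simply gives the builder the chance to grow and fill its buffer once instead of checking capacity on every row.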

How was this change tested?

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.89%. Comparing base (af71c28) to head (9ca0eb2).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
kernel/src/engine/arrow_expression/mod.rs 60.00% 10 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1868   +/-   ##
=======================================
  Coverage   85.89%   85.89%           
=======================================
  Files         137      137           
  Lines       40012    40012           
  Branches    40012    40012           
=======================================
  Hits        34368    34368           
  Misses       4155     4155           
  Partials     1489     1489           

for (builder, value) in field_builders.zip(data.values()) {
    value.append_to(builder, 1)?;
}
builder.append(true);

There isn't a corresponding bulk operation for this?

Sadly no... I only see one for bulk-appending nulls:
https://docs.rs/arrow/latest/arrow/array/struct.StructBuilder.html#method.append_nulls

Under the hood, both append and append_null are delegating to the underlying NullBufferBuilder, and even the bulk-null is extending from an internally created vec![false; n], which doesn't seem especially fantastic either -- NullBufferBuilder::append_n_nulls would almost certainly be more efficient, and there's a matching NullBufferBuilder::append_n_non_nulls?
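
The point about bulk validity appends can be illustrated with a toy bitmap. This is a sketch under stated assumptions: `ToyValidityBuilder` is a hypothetical type that only borrows the method names mentioned above (`append_n_nulls`, `append_n_non_nulls`); it is not arrow-rs's `NullBufferBuilder` and says nothing about how that type is implemented internally.

```rust
/// Toy validity bitmap (hypothetical; not arrow-rs's NullBufferBuilder).
/// Bit i is 1 when row i is valid (non-null), least-significant bit first.
#[derive(Default)]
pub struct ToyValidityBuilder {
    bytes: Vec<u8>,
    len: usize, // number of bits appended so far
}

impl ToyValidityBuilder {
    /// Per-row append: one call and one bit update per row.
    pub fn append(&mut self, valid: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if valid {
            let last = self.bytes.len() - 1;
            self.bytes[last] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Bulk append of `n` valid rows.
    pub fn append_n_non_nulls(&mut self, n: usize) {
        self.append_n(n, true);
    }

    /// Bulk append of `n` null rows.
    pub fn append_n_nulls(&mut self, n: usize) {
        self.append_n(n, false);
    }

    fn append_n(&mut self, mut n: usize, valid: bool) {
        // 1. Finish the partially filled trailing byte bit by bit.
        while n > 0 && self.len % 8 != 0 {
            self.append(valid);
            n -= 1;
        }
        // 2. Fill whole bytes at once instead of setting eight bits each.
        let whole = n / 8;
        let fill = if valid { 0xFF } else { 0x00 };
        self.bytes.resize(self.bytes.len() + whole, fill);
        self.len += whole * 8;
        // 3. Append the remaining (fewer than 8) bits.
        for _ in 0..(n % 8) {
            self.append(valid);
        }
    }

    /// Return the packed bytes and the number of rows they describe.
    pub fn finish(self) -> (Vec<u8>, usize) {
        (self.bytes, self.len)
    }
}
```

In this model a bulk append touches each whole byte once, whereas a per-row loop (or extending from a temporary `vec![false; n]`) does work proportional to the number of rows.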

OK, maybe let's add a small TODO and consider a contribution upstream?

// StructBuilder and ListBuilder both provide it.
let builder =
    builder_as!(array::MapBuilder<Box<dyn ArrayBuilder>, Box<dyn ArrayBuilder>>);
for _ in 0..num_rows {

I think for the null case we should be able to optimize this as well?

I don't see any bulk-null appender nor any way to access the underlying null buffer builder?

Same comment on TODO.

Or we can also try this with REE arrays, if it still isn't fast enough
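
For readers unfamiliar with the term, REE refers to Arrow's run-end encoded layout, where a column is stored as a list of values plus the cumulative end index of each run. The sketch below is a toy model of that idea only: `ToyRunEndEncoded` and its methods are hypothetical, it uses `usize` run ends for simplicity, and merging adjacent equal runs is a choice made here rather than something the layout requires.

```rust
/// Toy run-end encoded column (hypothetical type, not an arrow-rs array).
pub struct ToyRunEndEncoded<T> {
    run_ends: Vec<usize>, // exclusive, cumulative end index of each run
    values: Vec<T>,       // one value per run
}

impl<T: PartialEq> ToyRunEndEncoded<T> {
    pub fn new() -> Self {
        Self { run_ends: Vec::new(), values: Vec::new() }
    }

    /// Logical number of rows.
    pub fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Number of physical runs actually stored.
    pub fn num_runs(&self) -> usize {
        self.values.len()
    }

    /// Append `n` copies of `v`; the cost does not depend on `n`.
    pub fn append_n(&mut self, v: T, n: usize) {
        if n == 0 {
            return;
        }
        let end = self.len() + n;
        let extends_last_run = self.values.last().map_or(false, |last| *last == v);
        if extends_last_run {
            // Same value as the previous run: just move its end forward.
            if let Some(last_end) = self.run_ends.last_mut() {
                *last_end = end;
            }
        } else {
            self.run_ends.push(end);
            self.values.push(v);
        }
    }

    /// Look up logical row `i` by binary search over the run ends.
    pub fn get(&self, i: usize) -> Option<&T> {
        if i >= self.len() {
            return None;
        }
        let run = self.run_ends.partition_point(|&end| end <= i);
        self.values.get(run)
    }
}
```

Under this model a constant column of any length is a single run, which is why the encoding is attractive when the same value is repeated for every row.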

@Fokko Fokko Feb 18, 2026

Raised a PR here:

apache/arrow-rs#9432

@emkornfield You mean RLE?

@emkornfield emkornfield left a comment

A few small questions, otherwise generally seems reasonable. It might be good to have a small microbenchmark to demonstrate the speed difference.

@scovich scovich left a comment

LGTM, given the limitations of arrow-rs API.

}};
}

// Use append_value in a loop for builders without batch append (String, Binary)

I've created an upstream PR to add this to the GenericByteArray: apache/arrow-rs#9426

It will both harmonize the APIs between the builders and throw in a 10-20% speed improvement 🥳

@Fokko Fokko force-pushed the fd-optimize-insert-value branch from 8f0c9f2 to c7d6416 Compare February 18, 2026 10:15
@github-actions github-actions bot added the breaking-change Change that require a major version bump label Feb 18, 2026

Fokko commented Feb 18, 2026

Added TODOs and moving this forward since there is consensus. Let's pick up the remaining things in separate PRs. Thanks @scovich and @emkornfield for the prompt review 🚀

@Fokko Fokko merged commit 6363e88 into delta-io:main Feb 18, 2026
21 of 22 checks passed
@Fokko Fokko deleted the fd-optimize-insert-value branch February 18, 2026 12:39