Skip to content

ARROW-11414: [Rust] Reduce copies in Schema::try_merge#9347

Closed
alamb wants to merge 1 commit into
apache:masterfrom
alamb:alamb/less-copy-in-merge
Closed

ARROW-11414: [Rust] Reduce copies in Schema::try_merge#9347
alamb wants to merge 1 commit into
apache:masterfrom
alamb:alamb/less-copy-in-merge

Conversation

@alamb

@alamb alamb commented Jan 28, 2021

Copy link
Copy Markdown
Contributor

I was looking at this code yesterday while using it in IOx -- https://github.com/influxdata/influxdb_iox/pull/703

Rationale:

Even though Schema::try_merge requires a slice of Schemas (not schema refs) ownership of its inputs, it copies all of its fields. This is inefficient ideal in the common case where most of the fields in the merged Schema will be the same

Changes:

This PR proposes to change the implementation so that try_merge takes something (like a Vec) that can iterate over the Schemas passed in and consume them, avoiding at least one copy per unique field. I intend no algorithmic changes, only performance improvement.

@github-actions

Copy link
Copy Markdown

/// use arrow::datatypes::*;
///
/// let merged = Schema::try_merge(&vec![
/// let merged = Schema::try_merge(vec![

@alamb alamb Jan 28, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in this example shows a pretty good example of how I think the usability (as well as performance) of this API is improved by this PR

@jorgecarleitao jorgecarleitao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houqp / @nevi-me -- any concerns about this PR?

(I think it was introduced here 494e7a9)

@houqp houqp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. In general, i think this is the pattern we would want to apply to the rest of the code base whenever applicable. i.e. if a function takes reference but does internal clones, then it's better to let caller manage the ownership instead.

@alamb

alamb commented Feb 2, 2021

Copy link
Copy Markdown
Contributor Author

i.e. if a function takes reference but does internal clones, then it's better to let caller manage the ownership instead.

Yes I like this approach a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants