Push gather down to Parquet Encoder #2109
Closed
tustvold wants to merge 3 commits into
tustvold
commented
Jul 19, 2022
    let array = arrow::compute::cast(column, &ArrowDataType::Date32)?;
    arrow::compute::cast(&array, &ArrowDataType::Int32)?
} else {
    arrow::compute::cast(column, &ArrowDataType::Int32)?
Contributor
Author
This `if` statement was somewhat redundant; I suspect it dates from a refactor at some point.
tustvold
commented
Jul 19, 2022
fn write(&mut self, values: &Self::Values, offset: usize, len: usize) -> Result<()>;

/// Write the corresponding values to this [`ColumnValueEncoder`]
fn write_gather(&mut self, values: &Self::Values, indices: &[usize]) -> Result<()>;
Contributor
Author
I'm not totally sold on this name; suggestions welcome.
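For readers unfamiliar with the distinction, here is a minimal sketch of how a contiguous `write` and an index-driven `write_gather` might sit side by side. The `ValueEncoder` trait and `PlainI32Encoder` type are hypothetical, simplified stand-ins invented for illustration (with `String` errors and a byte buffer in place of the crate's own types); this is not the parquet crate's actual trait or implementation.

```rust
/// Hypothetical, simplified stand-in for an encoder trait offering both a
/// contiguous write and a gather write. Not the parquet crate's real trait.
trait ValueEncoder {
    type Values: ?Sized;

    /// Write `len` values starting at `offset`.
    fn write(&mut self, values: &Self::Values, offset: usize, len: usize) -> Result<(), String>;

    /// Write only the values found at `indices`, in the order given.
    fn write_gather(&mut self, values: &Self::Values, indices: &[usize]) -> Result<(), String>;
}

/// Toy encoder that appends each value as 4 little-endian bytes.
#[derive(Default)]
struct PlainI32Encoder {
    buffer: Vec<u8>,
}

impl ValueEncoder for PlainI32Encoder {
    type Values = [i32];

    fn write(&mut self, values: &[i32], offset: usize, len: usize) -> Result<(), String> {
        let end = offset.checked_add(len).ok_or("offset + len overflows")?;
        let slice = values.get(offset..end).ok_or("range out of bounds")?;
        for v in slice {
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
        Ok(())
    }

    fn write_gather(&mut self, values: &[i32], indices: &[usize]) -> Result<(), String> {
        self.buffer.reserve(indices.len() * 4);
        for &i in indices {
            // Read straight from the source slice; no intermediate
            // compacted array is built. Values encoded before a bad
            // index is hit remain in the buffer in this toy version.
            let v = values
                .get(i)
                .ok_or_else(|| format!("index {} out of bounds", i))?;
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
        Ok(())
    }
}
```

In this sketch, calling `write_gather(&values, &[0, 2, 3])` produces the same bytes as first copying elements 0, 2 and 3 into a new array and then calling `write` on it, which is the property the proposed method relies on.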
Contributor
Author
I intend to run the benchmarks shortly and report back.
Codecov Report
@@            Coverage Diff             @@
##           master    #2109      +/-   ##
==========================================
- Coverage   83.73%   83.70%   -0.03%
==========================================
  Files         225      225
  Lines       59412    59442      +30
==========================================
+ Hits        49748    49758      +10
- Misses       9664     9684      +20
Contributor
Author
Interestingly, the performance gain is extremely minor, which might suggest the bottleneck is elsewhere 🤔 Edit: See #2123
Contributor
Author
Marking as a draft whilst I think a bit more on this.
Contributor
Author
Going to roll this into the optimized byte array PR.
Which issue does this PR close?
Part of #1764
Rationale for this change
Data is unnecessarily copied before it is written out.
What changes are included in this PR?
Previously, the `take` operation necessary to handle nulls, lists, etc. was performed prior to writing the data. This PR pushes it down into the encoder, avoiding an unnecessary copy.
Are there any user-facing changes?
No
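To make the change described under "What changes are included in this PR?" concrete, the following is a rough sketch of the difference between taking first and gathering inside the encoder. Everything here is an assumption made for illustration: the function names are hypothetical, validity is modelled as a `&[bool]` rather than Arrow's packed bitmap, and the "encoding" is just little-endian bytes. It is not the code from this PR.

```rust
/// Collect the positions of valid (non-null) slots from a validity mask.
/// Assumption: a plain bool slice stands in for a packed null bitmap.
fn non_null_indices(validity: &[bool]) -> Vec<usize> {
    validity
        .iter()
        .enumerate()
        .filter(|(_, valid)| **valid)
        .map(|(i, _)| i)
        .collect()
}

/// "Before": materialise a compacted copy (a take), then encode the copy.
fn encode_with_take(values: &[i64], indices: &[usize]) -> Vec<u8> {
    // This intermediate Vec is the extra copy the PR aims to avoid.
    let taken: Vec<i64> = indices.iter().map(|&i| values[i]).collect();
    let mut out = Vec::with_capacity(taken.len() * 8);
    for v in &taken {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// "After": the encoder reads through the indices directly.
fn encode_with_gather(values: &[i64], indices: &[usize]) -> Vec<u8> {
    let mut out = Vec::with_capacity(indices.len() * 8);
    for &i in indices {
        out.extend_from_slice(&values[i].to_le_bytes());
    }
    out
}
```

Both functions yield identical bytes for the same inputs; the gather variant simply skips allocating and filling the intermediate buffer. Whether that saving is measurable depends on where the real bottleneck lies, as the benchmark comment in this thread notes.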