
make sure that only concat preallocates buffers #382

Merged
alamb merged 7 commits into apache:master from ritchie46:concat_mem
Jun 8, 2021

Conversation

@ritchie46

Contributor

Which issue does this PR close?

Partial fix for #347. In #348 @jorgecarleitao pointed out that the memory savings don't work for the filter and the zip kernel.
This PR restores the implementation for those two kernels, and keeps the new preallocation for the concat kernel.

This is achieved by creating a builder pattern for MutableArrayData. This way we don't break the API, and we can choose whether to preallocate buffers.

Could you take a look at this, @jorgecarleitao?

Are there any user-facing changes?

There is a builder-pattern struct for MutableArrayData.

If there are any breaking changes to public APIs, please add the breaking change label.

No

@codecov-commenter

codecov-commenter commented May 30, 2021


Codecov Report

❌ Patch coverage is 59.21053% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.58%. Comparing base (4ff2e56) to head (db0a5b4).

Files with missing lines Patch % Lines
arrow/src/array/transform/mod.rs 49.18% 31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #382      +/-   ##
==========================================
- Coverage   82.61%   82.58%   -0.03%     
==========================================
  Files         162      162              
  Lines       44228    44329     +101     
==========================================
+ Hits        36538    36609      +71     
- Misses       7690     7720      +30     

@jorgecarleitao

Member

Thanks @ritchie46. My small concerns with this PR:

  1. We are introducing yet another builder API for something relatively easy to accomplish with an extra method
  2. preallocate_buffers assumes that "preallocate" means allocating the maximum possible capacity, which IMO is not really pre-allocating; it is just being very conservative and allocating the maximum.
  3. The API is rather limited to the concatenate case.

I left some ideas as a comment on the issue
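
To make the second concern concrete, here is a minimal sketch (not the arrow implementation) of why a "maximum" capacity is exact for concat but only an upper bound for a filter-like kernel. `Utf8Chunk`, `concat_value_capacity`, and `filtered_value_bytes` are hypothetical names introduced for illustration; the layout assumed is the usual offsets-plus-values representation of a UTF-8 array.

```rust
/// Hypothetical stand-in for a UTF-8 array: `offsets[i]..offsets[i + 1]`
/// is the byte range of value `i` inside `values`.
pub struct Utf8Chunk {
    pub offsets: Vec<usize>,
    pub values: Vec<u8>,
}

impl Utf8Chunk {
    /// Build a chunk from string slices (illustration only).
    pub fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for s in items {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len());
        }
        Utf8Chunk { offsets, values }
    }

    /// Number of string values in the chunk.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Bytes occupied by the string data of this chunk.
    pub fn value_bytes(&self) -> usize {
        self.offsets[self.len()] - self.offsets[0]
    }
}

/// Concat copies every value of every input, so summing the inputs
/// gives the exact size of the output value buffer.
pub fn concat_value_capacity(chunks: &[Utf8Chunk]) -> usize {
    chunks.iter().map(Utf8Chunk::value_bytes).sum()
}

/// A filter copies only the values whose mask entry is true, so the
/// input size is merely an upper bound on what the output needs.
/// Mask entries beyond the chunk length are ignored.
pub fn filtered_value_bytes(chunk: &Utf8Chunk, mask: &[bool]) -> usize {
    chunk
        .offsets
        .windows(2)
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(w, _)| w[1] - w[0])
        .sum()
}
```

Under these assumptions, reserving `concat_value_capacity` bytes up front wastes nothing for concat, whereas reserving the full input size for a selective filter can allocate far more than is ever written.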

@ritchie46

Contributor Author

@jorgecarleitao I implemented your proposal from #347. As we now need to define an enum in the with_capacities constructor, I want to be extra certain that I added all possible variants, as adding one later is backwards incompatible. Could you check whether I missed something?

For now, only the (large)-utf8 ones use this, but if the design is OK, I can add more in later PRs.
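
As a rough illustration of the capacity-enum idea discussed here, the sketch below uses a hypothetical `CapacityHint` type. The variant names and payloads are assumptions loosely based on the commit titles in this PR, and they may differ from the merged code. `alloc_utf8_buffers` is likewise a made-up helper and is not part of arrow.

```rust
/// Hypothetical capacity hint with one variant per array layout.
/// `#[non_exhaustive]` forces downstream `match` expressions to carry a
/// wildcard arm, so adding a variant later does not break their builds.
#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub enum CapacityHint {
    /// Fixed-width arrays: number of items.
    Array(usize),
    /// Binary / UTF-8 arrays: number of items and, optionally, value bytes.
    Binary(usize, Option<usize>),
    /// List arrays: number of items and, optionally, a hint for the child.
    List(usize, Option<Box<CapacityHint>>),
    /// Struct arrays: number of items and, optionally, one hint per field.
    Struct(usize, Option<Vec<CapacityHint>>),
}

impl CapacityHint {
    /// Number of top-level items the hint asks for.
    pub fn items(&self) -> usize {
        match self {
            CapacityHint::Array(n)
            | CapacityHint::Binary(n, _)
            | CapacityHint::List(n, _)
            | CapacityHint::Struct(n, _) => *n,
        }
    }
}

/// Reserve the offsets and values buffers of a UTF-8 array from a hint.
/// Only `Binary` carries a byte count; any other variant falls back to
/// reserving offsets alone and letting the values buffer grow on demand.
pub fn alloc_utf8_buffers(hint: &CapacityHint) -> (Vec<i32>, Vec<u8>) {
    match hint {
        CapacityHint::Binary(items, value_bytes) => (
            // One more offset than items: offsets mark both ends of each value.
            Vec::with_capacity(items + 1),
            Vec::with_capacity(value_bytes.unwrap_or(0)),
        ),
        other => (Vec::with_capacity(other.items() + 1), Vec::new()),
    }
}
```

The `#[non_exhaustive]` attribute is one general Rust option for the compatibility worry raised above; nothing in this thread says the PR adopted it.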

Comment thread arrow/src/array/transform/mod.rs Outdated
@alamb

alamb commented Jun 8, 2021

Contributor

@jorgecarleitao what is the status of this one?

@jorgecarleitao jorgecarleitao left a comment

Member

Good to go: this is a solid improvement to me.

@alamb alamb merged commit 0cbf85a into apache:master Jun 8, 2021
alamb pushed a commit that referenced this pull request Jun 8, 2021
* MutableArrayData::with_capacities

* better pattern matching

* add binary capacities

* add list child data

* add struct capacities

* add panic for dictionary type

* change dictionary capacity enum variant
@alamb

alamb commented Jun 8, 2021

Contributor

Included in #411 cherry pick

alamb added a commit that referenced this pull request Jun 9, 2021
…se (#411)

* Reduce memory usage of concat (large)utf8 (#348)

* reduce memory needed for concat

* reuse code for str allocation buffer

* make sure that only concat preallocates buffers (#382)

* MutableArrayData::with_capacities

* better pattern matching

* add binary capacities

* add list child data

* add struct capacities

* add panic for dictionary type

* change dictionary capacity enum variant

Co-authored-by: Ritchie Vink <ritchie46@gmail.com>