ParquetPushDecoder API to clear all buffered ranges by nathanb9 · Pull Request #9624 · apache/arrow-rs

nathanb9 · 2026-03-29T21:58:15Z

Which issue does this PR close?

Closes #8676 (Add a way to clear out all buffered ranges from ParquetPushDecoder)

Rationale for this change

`ParquetPushDecoder` clears ranges that exactly match a request, but larger, speculatively pushed ranges can remain buffered in `PushBuffers`. This adds a way for callers to explicitly release those non-exact ranges.

What changes are included in this PR?

This adds `release_all_ranges()`, which clears all byte ranges still staged in the decoder's internal `PushBuffers`.
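To make the difference between exact-match clearing and clearing everything concrete, here is a minimal sketch using a toy stand-in for the buffer store. `ToyPushBuffers` and all of its methods are hypothetical and use only the standard library; they are not the parquet crate's internals. The method is named `clear_all_ranges` here because the API this description calls `release_all_ranges()` was renamed to `clear_all_ranges` in a later commit on this PR.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the decoder's internal `PushBuffers`.
/// Each entry is a pushed byte range plus the bytes that back it.
#[derive(Default)]
struct ToyPushBuffers {
    buffers: Vec<(Range<u64>, Vec<u8>)>,
}

impl ToyPushBuffers {
    /// Stage a byte range supplied by the IO layer.
    fn push_range(&mut self, range: Range<u64>, data: Vec<u8>) {
        assert_eq!((range.end - range.start) as usize, data.len());
        self.buffers.push((range, data));
    }

    /// Release only buffers whose range equals a requested range exactly.
    fn clear_ranges(&mut self, requested: &[Range<u64>]) {
        self.buffers.retain(|(range, _)| !requested.contains(range));
    }

    /// Release everything that is still staged, exact match or not.
    fn clear_all_ranges(&mut self) {
        self.buffers.clear();
    }

    /// Total number of bytes currently held.
    fn buffered_bytes(&self) -> usize {
        self.buffers.iter().map(|(_, data)| data.len()).sum()
    }
}

/// Push one exact range and one larger speculative range, then report the
/// bytes still held after each kind of clear.
fn leak_then_clear() -> (usize, usize) {
    let mut buffers = ToyPushBuffers::default();
    buffers.push_range(0..100, vec![0; 100]); // exactly what was requested
    buffers.push_range(100..1000, vec![0; 900]); // speculative: covers more than 100..200

    // The decoder asked for 0..100 and 100..200; only the first matches exactly.
    buffers.clear_ranges(&[0..100, 100..200]);
    let after_exact_clear = buffers.buffered_bytes();

    buffers.clear_all_ranges();
    (after_exact_clear, buffers.buffered_bytes())
}
```

In this sketch the 900-byte speculative buffer survives the exact-match clear and is only dropped by the explicit clear-all call, which is the situation the new API is meant to address.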

Are these changes tested?

Partially. Tests were added to verify that the buffer is empty after clearing and that clearing does not interrupt the row group reader.

Are there any user-facing changes?

Yes, this adds a new public `release_all_ranges()` API on `ParquetPushDecoder`.

AndreaBozzo

I like this. Waiting for someone else to have a look as well.

alamb

This is very nice -- thank you @nathanb9 and @AndreaBozzo

My only comment is about naming. Let me know what you think

nathanb9 · 2026-04-06T22:26:24Z

Thanks @alamb @AndreaBozzo. We should probably also add an analogous API for ParquetMetaDataPushDecoder, since it could also be used to push speculatively. I'll make a PR for that too if you can review it.

Also, if users find clever ways to benefit from speculative pushing, we might eventually want a smarter version of this clear API, or a more granular kind of clear. Maybe we can experiment with this in DataFusion.

alamb · 2026-04-07T13:41:02Z

> Probably should also add an analogous one for ParquetMetaDataPushDecoder? since it could also be used to speculatively push. I'll make a PR for that too if you guys can review that

Thanks @nathanb9. Yes, I agree, that sounds like a good idea to me.

alamb · 2026-04-07T15:34:06Z

I pushed a new commit to this PR to fix CI and merged up from main

alamb · 2026-04-07T21:11:57Z

Thanks again @nathanb9 and @AndreaBozzo

The `PushDecoder` (introduced in apache#7997, apache#8080) is designed to decouple IO and CPU. It holds non-contiguous byte ranges, with a `NeedsData`/`push_range` protocol. However, it requires each logical read to be satisfied in full by a single physical buffer: `has_range`, `get_bytes`, and `Read::read` all searched for one buffer that entirely covered the requested range.

This assumption conflates two orthogonal IO strategies:

- Coalescing: the IO layer merges adjacent requested ranges into fewer, larger fetches.
- Prefetching: the IO layer pushes data ahead of what the decoder has requested. This is an inversion of control: the IO layer speculatively fills buffers at offsets not yet requested and for arbitrary buffer sizes.

These two strategies interact poorly with the current release mechanism (`clear_ranges`), which matches buffers by exact range equality:

- Coalescing is both rewarded and punished. It is load-bearing because without it, the number of physical buffers scales with the number of ranges requested, and `clear_ranges` performs an O(N×M) scan to remove consumed ranges, producing quadratic overhead on wide schemas. But it is also punished because a coalesced buffer never exactly matches any individual requested range, so `clear_ranges` silently skips it: the buffer leaks in `PushBuffers` until the decoder finishes or the caller manually calls `release_all_ranges` (apache#9624). This increases peak RSS in proportion to the amount of data coalesced ahead of the current row group.
- Prefetching is structurally impossible: speculatively pushed buffers will straddle future read boundaries, so the decoder cannot consume them, and `clear_ranges` cannot release them.

This commit makes `PushBuffers` boundary-agnostic, completing the prefetching story, and changes the internals to scale with buffer count instead of range count:

- Buffer stitching: `has_range`, `get_bytes`, and `Read::read` resolve logical ranges across multiple contiguous physical buffers via binary search, so the IO layer is free to push arbitrarily sized parts without knowing future read boundaries. This is a nice improvement, because some IO layers can be made much more efficient when using uniform buffers and vectorized reads.
- Incremental release (`release_through`): replaces `clear_ranges` with a watermark-based release that drops all buffers below a byte offset, trimming straddling buffers via zero-copy `Bytes::slice`. The decoder calls this automatically at row-group boundaries.

Benchmark results (vs baseline):

| Benchmark | This commit | Baseline | Change |
| --- | --- | --- | --- |
| push_decoder/1buf/1000ranges | 321.9 µs | 323.5 µs | −1% |
| push_decoder/1buf/10000ranges | 3.26 ms | 3.25 ms | +0% |
| push_decoder/1buf/100000ranges | 34.9 ms | 34.6 ms | +1% |
| push_decoder/1buf/500000ranges | 192.2 ms | 185.3 ms | +4% |
| push_decoder/Nbuf/1000ranges | 363.9 µs | 437.2 µs | −17% |
| push_decoder/Nbuf/10000ranges | 3.82 ms | 10.7 ms | −64% |
| push_decoder/Nbuf/100000ranges | 42.1 ms | 711.6 ms | −94% |

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
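The buffer stitching and watermark-based release described in the commit message above can be sketched with a toy model. Everything below is hypothetical and uses only the standard library: `ToyBuffers` is not the real `PushBuffers`, it assumes buffers are non-empty and non-overlapping, and it copies bytes with `Vec` where the commit message says the real code trims with zero-copy `Bytes::slice`.

```rust
use std::ops::Range;

/// Hypothetical model: `(start offset, bytes)` pairs kept sorted by start
/// offset and assumed not to overlap.
struct ToyBuffers {
    buffers: Vec<(u64, Vec<u8>)>,
}

impl ToyBuffers {
    fn new() -> Self {
        Self { buffers: Vec::new() }
    }

    /// Insert a buffer while keeping the list sorted by start offset.
    fn push(&mut self, start: u64, data: Vec<u8>) {
        if data.is_empty() {
            return;
        }
        let idx = self.buffers.partition_point(|(s, _)| *s < start);
        self.buffers.insert(idx, (start, data));
    }

    /// Stitch `range` out of contiguous buffers; `None` if any byte is missing.
    fn get_bytes(&self, range: Range<u64>) -> Option<Vec<u8>> {
        if range.start >= range.end {
            return Some(Vec::new());
        }
        // Binary search for the last buffer that starts at or before range.start.
        let first = self
            .buffers
            .partition_point(|(s, _)| *s <= range.start)
            .checked_sub(1)?;
        let mut out = Vec::with_capacity((range.end - range.start) as usize);
        let mut pos = range.start;
        for (start, data) in &self.buffers[first..] {
            if pos >= range.end {
                break;
            }
            let end = *start + data.len() as u64;
            if *start > pos || end <= pos {
                return None; // gap: the next needed byte was never pushed
            }
            let stop = range.end.min(end);
            out.extend_from_slice(&data[(pos - *start) as usize..(stop - *start) as usize]);
            pos = stop;
        }
        (pos >= range.end).then_some(out)
    }

    /// Drop every byte below `watermark`, trimming a buffer that straddles it.
    fn release_through(&mut self, watermark: u64) {
        // Buffers that end at or before the watermark are dropped entirely.
        self.buffers
            .retain(|(start, data)| *start + data.len() as u64 > watermark);
        // Only the first remaining buffer can still start below the watermark.
        if let Some((start, data)) = self.buffers.first_mut() {
            if *start < watermark {
                data.drain(..(watermark - *start) as usize);
                *start = watermark;
            }
        }
    }

    /// Total number of bytes currently held.
    fn buffered_bytes(&self) -> usize {
        self.buffers.iter().map(|(_, data)| data.len()).sum()
    }
}

/// Push three uniform 4-byte parts, read a range that spans all of them,
/// then release everything below offset 6.
fn stitch_and_release() -> (Option<Vec<u8>>, usize) {
    let mut buffers = ToyBuffers::new();
    buffers.push(0, vec![0, 1, 2, 3]);
    buffers.push(4, vec![4, 5, 6, 7]);
    buffers.push(8, vec![8, 9, 10, 11]);
    let stitched = buffers.get_bytes(2..10);
    buffers.release_through(6);
    (stitched, buffers.buffered_bytes())
}
```

In this sketch the read of `2..10` is served from all three parts even though no single buffer covers it, and `release_through(6)` drops the first part and trims the second, so the cost of a release depends on how many buffers are held rather than on how many ranges were requested.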

This PR is a follow-up to [this ticket](#8676). It implements the same API for the metadata decoder. See also #9624 (comment).

## Rationale for this change

`ParquetMetaDataPushDecoder` clears ranges that exactly match a request, but larger, speculatively pushed ranges can remain buffered in `PushBuffers`. This adds a way for callers to explicitly release those non-exact ranges.

## What changes are included in this PR?

This adds `clear_all_ranges()`, which clears all byte ranges still staged in the decoder's internal `PushBuffers`.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

Yes, this adds a new public `clear_all_ranges()` API on `ParquetMetaDataPushDecoder`.

## Which issue does this PR close?

- Closes apache#8676

## Rationale for this change

`ParquetPushDecoder` clears exact requested ranges, but larger speculative pushed ranges can remain buffered in `PushBuffers`. This adds a way for callers to explicitly release non-exact ranges.

## What changes are included in this PR?

This adds `release_all_ranges()`, which clears all byte ranges still staged in the decoder's internal `PushBuffers`.

## Are these changes tested?

Kinda tested. Tests added to verify the buffer is empty and verified clearing does not interrupt the row group reader.

## Are there any user-facing changes?

Yes, this adds a new public `release_all_ranges()` API on `ParquetPushDecoder`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Add API to clear all buffered ranges

104c208

github-actions Bot added the parquet Changes to the parquet crate label Mar 29, 2026

nathanb9 added 2 commits March 29, 2026 18:14

Clarify release_all_ranges docs

20acef2

Tighten release_all_ranges test comment

c617299

nathanb9 marked this pull request as ready for review March 29, 2026 22:53

AndreaBozzo approved these changes Apr 3, 2026

alamb approved these changes Apr 6, 2026

Comment thread parquet/src/arrow/push_decoder/reader_builder/mod.rs Outdated

rename release_all_ranges to clear_all_ranges

13da3d5

alamb added 2 commits April 7, 2026 11:33

fix doc build

f8bd0f3

Merge remote-tracking branch 'apache/main' into parquet-push-decoder-api-to-clear-all-buffered-ranges

41a55a9

alamb merged commit aac969d into apache:main Apr 7, 2026
16 checks passed

etseidl mentioned this pull request Apr 12, 2026

ParquetMetaDataPushDecoder API to clear all buffered ranges #9673

Merged

HippoBaro mentioned this pull request Apr 13, 2026

feat(parquet): make PushBuffers boundary-agnostic for prefetch IO #9697

Draft

alamb merged 6 commits into apache:main from nathanb9:parquet-push-decoder-api-to-clear-all-buffered-ranges

3 participants