Improve the performance of TextSource by reducing how many byte[]s are copied (fixes #23193)#23196
Improve the performance of TextSource by reducing how many byte[]s are copied (fixes #23193)#23196lukecwik merged 3 commits intoapache:masterfrom
Conversation
|
R: @bhisevishal |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
…e copied (fixes apache#23193) This makes TextSource take about 2.3x less CPU resources during decoding. Before this change: ``` TextSourceBenchmark.benchmarkTextSource thrpt 5 0.248 ± 0.029 ops/s ``` After this change: ``` TextSourceBenchmark.benchmarkHadoopLineReader thrpt 5 0.465 ± 0.064 ops/s TextSourceBenchmark.benchmarkTextSource thrpt 5 0.575 ± 0.059 ops/s ```
|
Run Java PreCommit |
|
Run Java_Examples_Dataflow PreCommit |
TheNeuralBit
left a comment
There was a problem hiding this comment.
LGTM
Please link the issue by adding "Fixes #23193" in the description.
| * <p>We're reading data from inChannel, but the head of the stream may be already buffered in | ||
| * buffer, so we have several cases: 1. No newline characters are in the buffer, so we need to | ||
| * copy everything and read another buffer from the stream. 2. An unambiguously terminated line | ||
| * is in buffer, so we just create currentValue 3. Ambiguously terminated line is in buffer, |
There was a problem hiding this comment.
nit: could this be an HTML ordered list?
Its already part of the description, what am I missing? |
You have it in the title, but not in the description. There's some documentation here: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue |
That is an annoying limitation. Thanks for pointing it out. |
|
Run Spotless PreCommit |
|
Run Java PreCommit |
…te[]s are copied (fixes apache#23193) (apache#23196)" This reverts commit 30a48f0
This makes TextSource take about 2.3x less CPU resources during decoding.
Before this change:
After this change:
Fixes #23193
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.