Use Knuth–Morris–Pratt algorithm for delimiter search in TextIO #32398
scwhittle merged 6 commits into apache:master
R: @scwhittle, @Abacn

Hi @scwhittle, I see you're working on PR #32258 to fix issue #32241. I created this PR to fix issue #32251, but I believe it can also fix #32249.

Hi @Abacn, I see you approved PR #32298, but it cannot fix all the cases of #32241. Could you check it again?
scwhittle left a comment:
Thanks for putting this together. I got bogged down in trying to make startReading correct as well, but it is probably best to fix that separately.
startReading is incorrect in several cases, among others: when the offset is smaller than the delimiter length, when the offset is smaller than where the headers end, and when there is a BOM together with headers. I uncovered these when adding an integration test of longer delimiters to sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java (outdated, resolved)
03eb22d to 5b827b7
5b827b7 to 8e5a6cd
@scwhittle Could you continue the review? I added a commit to change the delimiter search algorithm. Thanks.
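For readers following the thread, here is a minimal sketch of Knuth–Morris–Pratt delimiter search over a byte array. It is not the code from this PR: the class name `KmpDelimiterSearch` and the methods `failureTable` and `indexOf` are hypothetical names chosen for illustration, and the real TextSource reads from a buffered channel rather than a single array. The sketch only shows why KMP suits this job: the number of delimiter bytes matched so far is the only state carried forward, so input bytes are never re-read after a partial match fails.

```java
// Illustrative sketch only; names are hypothetical and not taken from Beam.
final class KmpDelimiterSearch {

  private KmpDelimiterSearch() {}

  /**
   * Builds the KMP failure table. fail[i] is the length of the longest proper
   * prefix of delim[0..i] that is also a suffix of delim[0..i].
   */
  static int[] failureTable(byte[] delim) {
    int[] fail = new int[delim.length];
    int k = 0; // length of the currently matched prefix
    for (int i = 1; i < delim.length; i++) {
      // On mismatch, fall back to the next shorter prefix that is also a suffix.
      while (k > 0 && delim[i] != delim[k]) {
        k = fail[k - 1];
      }
      if (delim[i] == delim[k]) {
        k++;
      }
      fail[i] = k;
    }
    return fail;
  }

  /**
   * Returns the index of the first byte of the first occurrence of delim in
   * data at or after position from, or -1 if there is none.
   */
  static int indexOf(byte[] data, byte[] delim, int from) {
    if (delim.length == 0) {
      throw new IllegalArgumentException("delimiter must not be empty");
    }
    int[] fail = failureTable(delim);
    int matched = 0; // number of delimiter bytes matched so far
    for (int i = from; i < data.length; i++) {
      // Reuse the part of the partial match that is still valid
      // instead of restarting from scratch or rewinding the input.
      while (matched > 0 && data[i] != delim[matched]) {
        matched = fail[matched - 1];
      }
      if (data[i] == delim[matched]) {
        matched++;
      }
      if (matched == delim.length) {
        return i - delim.length + 1;
      }
    }
    return -1;
  }
}
```

As an example, searching for the delimiter `aab` in the input `aaab` finds it at index 1. A scanner that resets its match count to zero on the mismatch at the third byte, without rewinding the input, would miss that occurrence; the failure table lets the search keep one matched `a` and continue.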
scwhittle left a comment:
Just some minor cleanup comments. Thanks!
sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java (four outdated, resolved review threads)
Run Java_IOs_Direct PreCommit
Waiting on tests to complete before merging.
79ed964 to 4ad8e27
@scwhittle I fixed a bug in a commit that made the test fail. Could you continue to review this? Thanks.
I believe this PR can fix #32251 (it is closed but still not fully fixed; see this case) and #32249.
Fixes #32249, #32251.