
WIP Tests demonstrating TextIO issues with large delimiters and small splits and skipped headers #32258

Closed
scwhittle wants to merge 1 commit into apache:master from scwhittle:textio_split


@Abacn
Contributor

Abacn commented Oct 2, 2024

Note: this is superseded by #32398.

@scwhittle
Contributor Author

There are portions of this that were not superseded and that I got sidetracked on. Notably, I believe there are issues with the positioning performed by startReading. When skipping headers, it seems that some records may be missed, and if the position is small in comparison to the delimiter length, I think there may be duplicates.

I pushed just the test changes to demonstrate. I will try to find some time to finish up the fixes and push them as well.
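
For readers who want the two failure modes in concrete terms, here is a minimal Python sketch of the property such tests would check. It is not Beam's `TextSource` code. All names (`read_split`, `read_all`, `find_header_end`, `first_record_start`) are hypothetical. It assumes the usual convention that a split `[start, end)` owns exactly the records whose first byte lies in that range, that header records are counted from the start of the file, and that the delimiter cannot overlap itself (so `||` is fine, `aa` is not).

```python
def find_header_end(data: bytes, delim: bytes, n: int) -> int:
    """Offset of the first record after n header records (len(data) if fewer)."""
    pos = 0
    for _ in range(n):
        hit = data.find(delim, pos)
        if hit < 0:
            return len(data)
        pos = hit + len(delim)
    return pos


def first_record_start(data: bytes, delim: bytes, start: int) -> int:
    """Smallest record start offset >= start (>= len(data) if there is none)."""
    if start == 0:
        return 0
    # A delimiter that ends at or after `start` may begin up to
    # len(delim) bytes before it, so the search has to back up that far.
    hit = data.find(delim, max(0, start - len(delim)))
    return len(data) if hit < 0 else hit + len(delim)


def read_split(data: bytes, delim: bytes, start: int, end: int,
               skip_headers: int = 0) -> list:
    """Records owned by the split [start, end), with file-level headers skipped."""
    # Headers are a property of the file, not of the split, so the header
    # boundary is computed from offset 0 regardless of where the split starts.
    begin = max(start, find_header_end(data, delim, skip_headers))
    pos = first_record_start(data, delim, begin)
    out = []
    while pos < min(end, len(data)):
        hit = data.find(delim, pos)
        stop = len(data) if hit < 0 else hit
        out.append(data[pos:stop])
        pos = stop + len(delim)
    return out


def read_all(data: bytes, delim: bytes, split_size: int,
             skip_headers: int = 0) -> list:
    """Read the whole input as consecutive splits of `split_size` bytes."""
    out = []
    for start in range(0, len(data), split_size):
        out.extend(read_split(data, delim, start, start + split_size, skip_headers))
    return out
```

Under those assumptions, `read_all` should return the same records for every split size, including sizes smaller than the delimiter, and with or without skipped headers. A missed record or a duplicate at any split size would point to a positioning bug of the kind described above.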

@scwhittle scwhittle changed the title from "WIP to improve custom delimiter to support overlapping and spanning buffers without exception." to "WIP Tests demonstrating TextIO issues with large delimiters and small splits and skipped headers" on Oct 7, 2024
@github-actions
Contributor

github-actions bot commented Dec 6, 2024

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 6, 2024
@github-actions
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Dec 14, 2024
@scwhittle scwhittle deleted the textio_split branch July 1, 2025 18:51