fix: ensure sequence metadata list accesses are threadsafe in task runner #19467
Open
jtuglu1 wants to merge 1 commit into
Member
FrankChen021
left a comment
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 3 of 3 changed files.
This is an automated review by Codex GPT-5.5
Description
Fixes #19458 and cleans up SeekableStreamIndexTaskRunnerTest.java
The old code had racy logic: it performed size checks on the copy-on-write (COW) sequences list in multiple places and then indexed into that list. A publishing thread calling .remove() on the list between the size() check and the index operation could invalidate the size check.
Race:
sequences.size() returns 2 here
sequences.get() throws OOB error due to stale read race.
Fix
While this issue can technically be solved by taking snapshots before doing multiple reads on the sequence list, that still doesn't prevent read/write interleavings that might cause temporary inconsistencies between in-memory and on-disk state (especially if we crash). So, I guarded the sequence list with a reentrant lock. Although we re-acquire this lock per record, flamegraphs taken while running under load showed no noticeable overhead. I believe this is mainly because the lock is rarely contended in the common case and the bottlenecks in the ingestion code lie elsewhere. Switching to a read-write lock is another option here, but performance was the same between the two options. Will add the performance benchmarks soon.
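For readers unfamiliar with this kind of check-then-act race, here is a minimal sketch of the pattern and of a lock-guarded alternative. It is not the code from this PR: GuardedSequences, racyLatest, and latestOrNull are hypothetical names, the list holds plain strings instead of sequence metadata, and the publisher's interleaving is simulated with a callback so the failure is deterministic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the runner's sequence bookkeeping (illustrative only).
final class GuardedSequences {
    private final List<String> sequences = new CopyOnWriteArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    // Writer side: mutations take the same lock as compound reads.
    void add(String sequence) {
        lock.lock();
        try {
            sequences.add(sequence);
        } finally {
            lock.unlock();
        }
    }

    // Publisher side: removal can no longer slip between a size check and a get.
    boolean remove(String sequence) {
        lock.lock();
        try {
            return sequences.remove(sequence);
        } finally {
            lock.unlock();
        }
    }

    // Reader side: the emptiness check and the index happen under one lock hold.
    String latestOrNull() {
        lock.lock();
        try {
            return sequences.isEmpty() ? null : sequences.get(sequences.size() - 1);
        } finally {
            lock.unlock();
        }
    }

    // Racy pattern: size() and get() are each thread-safe on a COW list,
    // but the pair is not atomic. The callback simulates a publisher
    // thread running between the two calls.
    static String racyLatest(List<String> seqs, Runnable betweenCheckAndGet) {
        int size = seqs.size();
        betweenCheckAndGet.run();
        return seqs.get(size - 1); // throws if the list shrank in between
    }

    public static void main(String[] args) {
        GuardedSequences guarded = new GuardedSequences();
        guarded.add("seq-0");
        guarded.add("seq-1");
        guarded.remove("seq-1");
        System.out.println(guarded.latestOrNull());
    }
}
```

Because ReentrantLock is reentrant, a method that already holds the lock can call another guarded method without deadlocking, which is one plausible reason to prefer it when guarded helpers call each other.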
Release note
Fix fatal race in streaming ingest task during segment publish.
This PR has: