utils: use WeakReference in BatchInserter to fix ThreadLocal retention of JCQueue #8812
Merged
Branch updated from c9ca62f to a321441.
Branch updated from a321441 to bfe31dd.
Author (contributor): @GGraziadei, LLM-powered fix according to your suggestion.
Member: Hi @reiabreu,
GGraziadei approved these changes on Jun 28, 2026.
Branch updated from bfe31dd to fe58489.
…n of JCQueue

BatchInserter held a strong reference to its owning JCQueue, and the inserters live in instance-field ThreadLocals on the same JCQueue. This formed a cycle through ThreadLocalMap:

    value (BatchInserter) -> queue (JCQueue) -> thdLocalBatcher (ThreadLocal) = key

Because the key was strongly reachable via the value, the weak-key expunge path in ThreadLocalMap never triggered, and the JCQueue (along with its metrics, recv/overflow queues and batch buffers) could not be GC'd for as long as any producer thread that ever published to it stayed alive.

The fix stores the JCQueue as a WeakReference inside BatchInserter, cutting the value->key path. When the last external strong ref to the JCQueue is dropped, the ThreadLocal field it owns becomes weakly reachable, the ThreadLocalMap key can be expunged, and both the BatchInserter and the JCQueue are released.

flush() and tryFlush() dereference the WeakReference once at entry and bail out cleanly if the queue has already been collected (dead topology in LocalCluster/embedded scenarios). publish() and tryPublish() are unchanged: they only manipulate currentBatch.

Fixes #8810

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
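The cycle and the fix described in the commit message can be reproduced in isolation. This is a minimal sketch, not Storm's code: `FakeQueue`, `StrongInserter`, `WeakInserter` and the `sink` list are hypothetical stand-ins for `JCQueue`, the old and new `BatchInserter`, and the real recv/overflow queues. It shows that the strong variant keeps its queue alive on the producer thread even after a GC request, and that the weak variant's `flush()` dereferences the `WeakReference` once and bails out if the queue is gone.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class BatchInserterSketch {

    /** Hypothetical stand-in for JCQueue: owns an instance-field ThreadLocal like thdLocalBatcher. */
    static final class FakeQueue {
        final List<Object> sink = new ArrayList<>();
        final ThreadLocal<Object> thdLocalBatcher = new ThreadLocal<>();
    }

    /** Hypothetical pre-fix shape: the ThreadLocal value holds its owning queue strongly. */
    static final class StrongInserter {
        final FakeQueue queue;
        StrongInserter(FakeQueue queue) { this.queue = queue; }
    }

    /** Hypothetical post-fix shape: the queue is reachable from the inserter only weakly. */
    static final class WeakInserter {
        private final WeakReference<FakeQueue> queueRef;
        private final List<Object> currentBatch = new ArrayList<>();

        WeakInserter(FakeQueue queue) { this.queueRef = new WeakReference<>(queue); }

        /** Like publish(): only touches currentBatch, never the queue. */
        void publish(Object obj) { currentBatch.add(obj); }

        /** Like flush(): dereference once at entry, bail out if the queue was collected. */
        boolean flush() {
            FakeQueue queue = queueRef.get();
            if (queue == null) {
                currentBatch.clear(); // owner is gone; this sketch simply drops the batch
                return false;
            }
            queue.sink.addAll(currentBatch);
            currentBatch.clear();
            return true;
        }
    }

    /** Installs a StrongInserter on the calling thread and reports whether the queue survives GC. */
    static boolean strongInserterRetainsQueue() {
        FakeQueue queue = new FakeQueue();
        queue.thdLocalBatcher.set(new StrongInserter(queue));
        WeakReference<FakeQueue> probe = new WeakReference<>(queue);
        queue = null; // drop the only external strong reference
        System.gc();
        // Thread -> ThreadLocalMap entry -> StrongInserter -> FakeQueue keeps the queue alive,
        // and the entry's key (thdLocalBatcher) is reachable from FakeQueue, so it is never expunged.
        return probe.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("strong inserter retains queue: " + strongInserterRetainsQueue());

        FakeQueue live = new FakeQueue();
        WeakInserter inserter = new WeakInserter(live);
        live.thdLocalBatcher.set(inserter);
        inserter.publish("tuple-1");
        System.out.println("flush while queue alive: " + inserter.flush() + ", sink=" + live.sink);
    }
}
```

The retention result of `strongInserterRetainsQueue()` does not depend on GC timing, because the queue stays strongly reachable from the thread's `ThreadLocalMap`. The opposite case (the weak variant actually being collected) is left out of the demo, since `System.gc()` is only a hint.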
Branch updated from fe58489 to 6413f22.
Author (contributor): Thanks for that great suggestion for the test, @GGraziadei. I've just added it to the changeset.
Fixes #8810
Problem
`BatchInserter` held a strong `JCQueue queue` reference, and inserter instances live in instance-field `ThreadLocal`s on the same `JCQueue`. This creates a cycle that prevents the normal `ThreadLocal` weak-key expunge path from ever firing: the `ThreadLocalMap` entry's value (`BatchInserter`) references `queue` (`JCQueue`), whose `thdLocalBatcher` field is the very `ThreadLocal` used as the entry's key.

The weak key in `ThreadLocalMap` becomes eligible for expunge only when it is not strongly reachable. Here it always is, via `value → queue → field`, so the entry is never cleaned up, and the `JCQueue` (recv queue, overflow queue, metrics, batch buffer) is retained for as long as the producer thread lives.

In production this is harmless: Storm runs workers as dedicated JVM processes where threads and queues share a lifetime. In `LocalCluster`, embedded deployments, or test harnesses that repeatedly start/stop topologies on long-lived threads, each stopped topology can strand its `JCQueue` instances.

Fix

Store the `JCQueue` as a `WeakReference` inside `BatchInserter`. This cuts the `value → JCQueue` strong path: the inserter now reaches its queue only weakly, so nothing strongly reachable from the entry's value leads back to its key.

When the last external strong ref to the `JCQueue` is dropped, the `thdLocalBatcher` field (and therefore the `ThreadLocalMap` key) becomes weakly reachable, and the entry can then be expunged on a subsequent `ThreadLocal` access by the producer thread. `BatchInserter` is then released too.

`flush()` and `tryFlush()` dereference the `WeakReference` once at entry and bail out silently if the queue was already collected (the topology is dead; in-flight tuples are already lost). `publish()` and `tryPublish()` need no changes: they only touch `currentBatch`.

Note for PR #8796
`DynamicBatchInserter` (introduced in #8796) extends `BatchInserter`, and its own overrides (`batchSize()`, `afterFlush()`) never access `queue` directly, so this fix covers it automatically; no separate change is needed there.

Test plan
- `mvn test -pl storm-client -Dtest=JCQueueTest` passes
- `LocalCluster` that repeatedly submits/kills a topology and verify the `JCQueue` instances are no longer retained after kill

🤖 Generated with Claude Code
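The `LocalCluster` check in the test plan amounts to holding only a `WeakReference` to each queue after a topology is killed and waiting for it to clear. The sketch below assumes that polling `System.gc()` is acceptable in a test; `awaitCollected`, `runAndDropQueue`, `FakeQueue` and `Inserter` are hypothetical names, not part of Storm or JUnit, and the stand-ins model only the post-fix reference shape.

```java
import java.lang.ref.WeakReference;

public class RetentionCheckSketch {

    /** Hypothetical stand-in for JCQueue with its per-instance ThreadLocal. */
    static final class FakeQueue {
        final ThreadLocal<Inserter> thdLocalBatcher = new ThreadLocal<>();
    }

    /** Hypothetical stand-in for the post-fix BatchInserter: only a weak link back to its queue. */
    static final class Inserter {
        final WeakReference<FakeQueue> queueRef;
        Inserter(FakeQueue queue) { this.queueRef = new WeakReference<>(queue); }
    }

    /**
     * Hypothetical helper: requests GC until ref is cleared or the timeout expires.
     * System.gc() is only a hint, so false means "not observed collected", not "proven leaked".
     */
    static boolean awaitCollected(WeakReference<?> ref, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (ref.get() != null) {
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return ref.get() == null;
            }
        }
        return true;
    }

    /** One simulated "submit, publish, kill" cycle on the current long-lived thread. */
    static WeakReference<FakeQueue> runAndDropQueue() {
        FakeQueue queue = new FakeQueue();
        queue.thdLocalBatcher.set(new Inserter(queue)); // the producer thread published once
        return new WeakReference<>(queue);               // only a weak probe escapes
    }

    public static void main(String[] args) {
        for (int cycle = 0; cycle < 3; cycle++) {
            WeakReference<FakeQueue> probe = runAndDropQueue();
            System.out.println("cycle " + cycle + " queue collected: " + awaitCollected(probe, 5_000));
        }
    }
}
```

Because `System.gc()` is only a request, a `false` from `awaitCollected` is better treated as a prompt to take a heap dump than as proof of a leak.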