File
`src/providers/openai_compatible.py:618-644`
chunk_queue: _queue.Queue = _queue.Queue()
def _drain_stream() -> None:
    try:
        for c in stream:
            chunk_queue.put(c)
    ...
Background
PR #148 introduced the worker-thread + queue pattern to make ESC unwind promptly under LiteLLM. Its own PR body flags this as a known follow-up:
> Memory: `queue.Queue` is unbounded, so a worst-case "stuck iterator AND chunks flowing" path could accumulate megabytes. Flagged as a follow-up; not blocking on current evidence (the user's repro has the iterator stuck, not flowing).
Impact
If a proxy keeps sending bytes after ESC and never closes the SDK iterator (a non-graceful disconnect), the daemon worker keeps calling `chunk_queue.put(c)` indefinitely. The main thread has already raised `AbortError` and dropped its reference, but the worker still references the queue, so every enqueued chunk stays alive. Memory grows until the upstream socket finally closes.
Realistic on long-running sessions through misbehaving proxies (LiteLLM under load, custom corp proxies that buffer).
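The growth is easy to reproduce outside the provider. The sketch below is a stand-alone approximation, not the code at the path above: `slow_endless_stream` and `measure_growth` are hypothetical names, and the generator stands in for a proxy that keeps sending chunks after the consumer has stopped reading.

```python
import itertools
import queue
import threading
import time
from typing import Iterator, Tuple


def slow_endless_stream() -> Iterator[str]:
    """Hypothetical stand-in for an SDK iterator that never closes."""
    for i in itertools.count():
        time.sleep(0.001)  # roughly one chunk per millisecond
        yield f"chunk-{i}"


def measure_growth(interval: float = 0.2) -> Tuple[int, int]:
    """Start an unbounded-queue worker, never consume, sample qsize twice."""
    chunk_queue: queue.Queue = queue.Queue()  # unbounded, as in the issue

    def _drain_stream() -> None:
        for c in slow_endless_stream():
            chunk_queue.put(c)  # never blocks, never drops

    threading.Thread(target=_drain_stream, daemon=True).start()

    # The "main thread" has aborted: nothing calls chunk_queue.get().
    time.sleep(interval)
    first = chunk_queue.qsize()
    time.sleep(interval)
    second = chunk_queue.qsize()
    return first, second


if __name__ == "__main__":
    first, second = measure_growth()
    print(f"queued after first sample: {first}, after second: {second}")
```

The second sample should be larger than the first, and nothing in this setup stops the count from climbing for as long as the stream keeps yielding.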
Fix sketch
Two options:
1. Bound the queue (e.g. `_queue.Queue(maxsize=64)`) and use `put_nowait()` with a drop-on-full policy once the abort signal has tripped (we don't care about the chunks at that point).
2. Cooperative worker: make the worker check `guard.aborted` between chunks and exit early.
Option 2 is the cleaner of the two, but it requires the SDK to honor close-on-aborted reads, and the SDK failing to do so is the entire reason PR #148 introduced the worker pattern in the first place. We probably want both: a maxsize for safety, plus cooperative exit when the SDK plays along.
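A minimal sketch of the combined approach, under several assumptions: a `threading.Event` stands in for `guard.aborted` (whose real type is not shown here), and `drain_stream`, `MAX_BUFFERED_CHUNKS`, and `POLL_SECONDS` are hypothetical names. It also swaps `put_nowait()` for a timed `put()`, so that a slow but still-alive consumer gets backpressure instead of dropped chunks; dropping only happens once the abort flag is set.

```python
import queue
import threading
from typing import Any, Iterable

MAX_BUFFERED_CHUNKS = 64  # assumed bound, borrowed from the maxsize=64 example
POLL_SECONDS = 0.05       # assumed interval for re-checking the abort flag


def drain_stream(
    stream: Iterable[Any],
    chunk_queue: queue.Queue,
    aborted: threading.Event,
) -> None:
    """Forward chunks until the stream ends or the abort flag trips."""
    for chunk in stream:
        while True:
            # Cooperative exit: checked between chunks and while the queue
            # is full. The pending chunk is dropped on purpose.
            if aborted.is_set():
                return
            try:
                # Bounded put: waits briefly when the queue is full instead
                # of growing without limit.
                chunk_queue.put(chunk, timeout=POLL_SECONDS)
                break
            except queue.Full:
                continue  # consumer is slow or gone; loop and re-check


if __name__ == "__main__":
    q: queue.Queue = queue.Queue(maxsize=MAX_BUFFERED_CHUNKS)
    stop = threading.Event()
    drain_stream(iter(["a", "b", "c"]), q, stop)
    print(q.qsize())  # 3
```

If the SDK iterator blocks inside `next()` and never yields, neither check runs, but nothing is enqueued in that state either, so the bound still holds.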
Cover with a regression test that injects a never-ending iterator and asserts queue size stays bounded post-abort.
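One possible shape for that test, as a sketch only: `_drain` is a hypothetical stand-in for the patched worker (swap in the real one), `threading.Event` stands in for the abort signal, and `itertools.count()` plays the never-ending iterator. The function is named so pytest would collect it, but it uses nothing beyond plain `assert`.

```python
import itertools
import queue
import threading
import time


def _drain(stream, chunk_queue, aborted):
    """Hypothetical stand-in for the patched worker under test."""
    for chunk in stream:
        while True:
            if aborted.is_set():
                return
            try:
                chunk_queue.put(chunk, timeout=0.01)
                break
            except queue.Full:
                continue


def test_queue_stays_bounded_after_abort():
    maxsize = 8
    chunk_queue = queue.Queue(maxsize=maxsize)
    aborted = threading.Event()
    worker = threading.Thread(
        target=_drain,
        args=(itertools.count(), chunk_queue, aborted),  # never-ending iterator
        daemon=True,
    )
    worker.start()

    # Read a few chunks, then simulate ESC: stop consuming and trip the flag.
    for _ in range(3):
        chunk_queue.get(timeout=1)
    aborted.set()

    worker.join(timeout=2)
    assert not worker.is_alive(), "worker should exit once the abort flag trips"

    size_at_exit = chunk_queue.qsize()
    assert size_at_exit <= maxsize

    # Nothing should be enqueued after the worker has exited.
    time.sleep(0.1)
    assert chunk_queue.qsize() == size_at_exit
```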