Skip to content

fix(rtc): avoid KeyError in local_track_unpublished handler during teardown#692

Open
tmkarthi wants to merge 3 commits into
livekit:mainfrom
8loop-ai:fix/key-error-on-local_track_unpublished
Open

fix(rtc): avoid KeyError in local_track_unpublished handler during teardown#692
tmkarthi wants to merge 3 commits into
livekit:mainfrom
8loop-ai:fix/key-error-on-local_track_unpublished

Conversation

@tmkarthi
Copy link
Copy Markdown

@tmkarthi tmkarthi commented Jun 1, 2026

Fixes #681#681

Problem

Room._on_room_event does an unchecked self.local_participant.track_publications[sid] lookup on the local_track_unpublished branch (room.py). LocalParticipant.unpublish_track removes the publication from _track_publications when its FFI async response is processed, and that response races the local_track_unpublished room event. When the response is handled first, the SID is already gone by the time the event is dispatched and the handler raises KeyError.

_listen_task catches it and logs error running user callback for local_track_unpublished: ... with a full traceback, so the worker does not crash — but every affected disconnect cleanup produces a spurious ERROR log. In a production agent fleet this fired ~18×/24h on early/CLIENT_INITIATED disconnects (~1 in several hundred). The local_track_unpublished event is also silently dropped when the race fires.

This is the only local_track_* handler that crashes on a missing SID — track_unpublished (remote) and local_track_republished (#655) already tolerate it.

Fix

room.py — handler: pop defensively and only emit when present, mirroring the remote track_unpublished and local_track_republished handlers (with a debug log for the dropped case):

sid = event.local_track_unpublished.publication_sid
lpublication = self.local_participant._track_publications.pop(sid, None)
if lpublication is not None:
    self.emit("local_track_unpublished", lpublication)
else:
    logging.debug("local_track_unpublished for untracked publication sid %s", sid)

This also folds removal into the handler, fixing a latent leak where a server-forced unpublish (no unpublish_track call) previously left the publication in the dict forever.

participant.pyunpublish_track: make its pop tolerant.

publication = self._track_publications.pop(track_sid, None)
if publication is not None:
    publication._track = None

This second change is required, not cosmetic. The KeyError is rare (~1 in several hundred), which means the common ordering is the room event being processed while the SID is still present — i.e. before unpublish_track's pop. Once the handler itself starts popping, in that common path the old unchecked pop(track_sid) in unpublish_track would raise a new KeyError on nearly every unpublish. Making it tolerant prevents simply relocating the crash.

Behavior across both race orderings

  • Event first (common): handler pops + emits; unpublish_track gets None, skips — no crash, event still delivered.
  • Async response first (the rare race that was crashing): unpublish_track pops + clears _track; handler gets None, skips emit — no crash. The event is dropped, but it was crashing (never emitting) before, so this is strictly better.

Testing

  • ast-parsed both files; no new lint/type diagnostics.
  • The repo has no unit tests around _on_room_event (existing tests are e2e and require a live server), so no test was added. Happy to add one with a mocked participant if preferred.

🤖 Generated with Claude Code

…ardown

Room._on_room_event did an unchecked
`self.local_participant.track_publications[sid]` lookup on the
`local_track_unpublished` branch. LocalParticipant.unpublish_track
removes the publication from `_track_publications` when its FFI async
response is processed, and that response races the `local_track_unpublished`
room event. When the response is handled first, the SID is already gone and
the handler raises KeyError, which _listen_task logs as a spurious ERROR
with a full traceback on every affected disconnect cleanup.

Pop the publication defensively in the handler and only emit when it is
still tracked, mirroring the remote `track_unpublished` and
`local_track_republished` handlers. This also folds removal into the
handler, fixing a latent leak where a server-forced unpublish (no
unpublish_track call) left the publication in the dict.

Because the handler now owns removal, the common (non-racing) ordering is
that the room event is processed before unpublish_track's pop. Make
unpublish_track's pop tolerant (`pop(track_sid, None)` + guarded
`_track = None`) so it does not raise a new KeyError once the handler has
already removed the publication.

Fixes livekit#681

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jun 1, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dbb047422

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread livekit-rtc/livekit/rtc/participant.py Outdated
Comment on lines +805 to +807
publication = self._track_publications.pop(track_sid, None)
if publication is not None:
publication._track = None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear the publication track after event-first unpublishes

When local_track_unpublished is processed before the async unpublish_track response, Room._on_room_event now removes the publication from _track_publications first. This defensive pop therefore returns None, skips the only _track = None cleanup, and leaves any LocalTrackPublication reference previously returned to application code reporting a stale non-None track even after await unpublish_track(...) completes. Preserve a reference for the async cleanup or clear the removed publication in the room-event path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

tmkarthi and others added 2 commits June 1, 2026 13:37
… race

Address two issues from review of the local_track_unpublished fix:

1. mypy strict (CI gate) flagged room.py: `pop(sid, None)` was assigned to
   `lpublication`, which is also bound to `track_publications[sid]`
   (non-optional LocalTrackPublication) in sibling branches of the same
   method. mypy fixes the variable's type from that first binding and pushes
   it as the expected return type into `pop`, selecting the
   `pop(key, default: _VT) -> _VT` overload and rejecting None. Use a fresh
   variable with `.get()` + conditional `del`, mirroring the existing
   local_track_republished handler, which already passes the gate.

2. With the handler now removing the publication first in the common
   ordering, unpublish_track's `pop(track_sid, None)` returned None and
   skipped the `_track = None` cleanup, leaving a LocalTrackPublication
   reference held by application code reporting a stale non-None track after
   `await unpublish_track(...)` completed. Capture the publication reference
   before the FFI round-trip so the track is cleared once unpublish
   completes regardless of which path removes it from the dict.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

KeyError in _on_room_event for local_track_unpublished during teardown — unchecked dict lookup at room.py:732

2 participants