Skip to content

fix(anthropic): keep audio flowing when <thinking> tags split across stream deltas#5794

Open
ATOM00blue wants to merge 1 commit into
livekit:mainfrom
ATOM00blue:fix/audio-not-published-with-function-tool
Open

fix(anthropic): keep audio flowing when <thinking> tags split across stream deltas#5794
ATOM00blue wants to merge 1 commit into
livekit:mainfrom
ATOM00blue:fix/audio-not-published-with-function-tool

Conversation

@ATOM00blue
Copy link
Copy Markdown

Summary

Agent audio stopped reaching the room whenever any function_tool was attached to an Agent using the Anthropic LLM, even a trivial no-arg tool. With tools=[] audio worked fine.

The Anthropic plugin strips Claude's <thinking>…</thinking> chain-of-thought, which the model only emits when tools are supplied. The stripping logic inspected each streamed text delta on its own: it began ignoring text on a delta starting with <thinking> and only stopped if a single delta contained </thinking>. Since tokens stream piecemeal, the closing tag usually arrives split across deltas (e.g. "</" then "thinking>"), so the parser stayed stuck and dropped every remaining text delta. The assistant's actual reply never reached TTS, so no audio was synthesized/published β€” matching the report (LLM fires, TTS shows audio duration, but nothing plays).

This replaces the fragile per-delta check with a small stateful filter that scans across deltas, strips complete <thinking> spans, can never get permanently stuck, and never drops text that merely looks like a partial tag.

Test plan

  • New unit tests for the streaming tag filter (split opening/closing tags, leading whitespace, tag-like text, dangling partial tag)
  • New integration test driving LLMStream._parse_event end-to-end proves the answer is emitted when the closing tag is split (fails before, passes after)
  • pytest tests/test_plugin_anthropic.py β€” 14 passed
  • pytest tests/test_agent_session.py β€” 27 passed (no regressions)
  • ruff check, ruff format --check, and mypy clean

Fixes #5617

…deltas

When tools are attached, Claude can wrap chain-of-thought in <thinking>
tags. The previous stripping logic only checked each streamed delta in
isolation, so a closing </thinking> tag split across deltas (e.g. "</"
then "thinking>") left the parser stuck ignoring all remaining text. The
assistant's actual reply was dropped, so TTS received no text and no
audio was published whenever a function_tool was present.

Replace the per-delta check with a stateful filter that scans across
deltas, strips complete <thinking> spans, never gets stuck, and never
drops text that only resembles a partial tag.
Copilot AI review requested due to automatic review settings May 21, 2026 04:30
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Comment on lines +112 to +121
def flush(self) -> str:
"""Return any buffered text that is not part of a thinking span."""
if self._inside:
self._buf = ""
return ""

# a dangling partial opening tag never completed: it was real text
out = self._buf
self._buf = ""
return out
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟑 _ThinkingTagFilter.flush() does not reset _inside flag, causing silent text loss across content blocks

When flush() is called at content_block_stop while _inside is True (i.e., a <thinking> tag was opened but never closed within that block), the method clears _buf but leaves _inside = True. Any subsequent text content blocks in the same stream will have ALL their text silently dropped because push() still treats incoming text as part of a thinking span.

Reproduction trace showing text permanently lost
f = _ThinkingTagFilter()
f.push('<thinking>reasoning...')  # _inside becomes True
f.flush()  # content_block_stop: clears _buf but _inside stays True

# Subsequent text block
f.push('The actual answer')  # returns '' β€” silently dropped!
f.flush()  # returns '' β€” answer permanently lost

While this requires the model to emit an unclosed <thinking> tag (uncommon in practice), the consequence is severe when triggered: the assistant's spoken answer is entirely suppressed, producing silence from TTS. The fix should reset self._inside = False inside flush() (or at least when called from the content_block_stop handler at llm.py:467).

Suggested change
def flush(self) -> str:
"""Return any buffered text that is not part of a thinking span."""
if self._inside:
self._buf = ""
return ""
# a dangling partial opening tag never completed: it was real text
out = self._buf
self._buf = ""
return out
def flush(self) -> str:
"""Return any buffered text that is not part of a thinking span."""
if self._inside:
self._buf = ""
self._inside = False
return ""
# a dangling partial opening tag never completed: it was real text
out = self._buf
self._buf = ""
return out
Open in Devin Review

Was this helpful? React with πŸ‘ or πŸ‘Ž to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] Audio not published to room when any function_tool is attached to Agent (livekit-plugins-anthropic 1.5.6)

3 participants