Use HA streaming response API so TTS speaks sentence-by-sentence instead of waiting for full reply #30

## Problem

On HA 2026.5.2 with the OpenClaw conversation agent set on an Assist pipeline (Nabu Casa Cloud TTS), TTS playback only begins **after the entire OpenClaw response has been generated**. For multi-sentence answers this means several seconds of silence before any audio plays, which defeats the purpose of token streaming.

## Root cause

`custom_components/openclaw/conversation.py` does stream tokens internally:

```python
async for chunk in client.async_stream_message(...):
    full_response += chunk
```

but then assembles the full string and hands it to the pipeline as a single blob:

```python
intent_response = intent.IntentResponse(language=user_input.language)
intent_response.async_set_speech(full_response)
```

So the Assist pipeline never sees a stream and can't chunk into sentences for TTS.
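To make the cost concrete, here is a minimal, self-contained sketch (no Home Assistant imports) that counts how many chunks must be generated before the first speakable text exists under each approach. `CHUNKS`, `generate`, and both counting functions are hypothetical stand-ins made up for this illustration; they are not taken from `conversation.py`.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical token chunks, standing in for what the OpenClaw client streams.
CHUNKS = ["Hi. ", "It is ", "sunny ", "today. ", "Enjoy!"]


async def generate(produced: list[int]) -> AsyncIterator[str]:
    """Yield chunks one by one, recording how many have been produced so far."""
    for index, chunk in enumerate(CHUNKS, start=1):
        await asyncio.sleep(0)  # simulate awaiting the model
        produced.append(index)
        yield chunk


async def chunks_before_speech_buffered() -> int:
    """Current behaviour: speech is only set after the loop finishes."""
    produced: list[int] = []
    full_response = ""
    async for chunk in generate(produced):
        full_response += chunk
    return len(produced)


async def chunks_before_speech_streamed() -> int:
    """Streaming behaviour: the first complete sentence is speakable at once."""
    produced: list[int] = []
    buffer = ""
    async for chunk in generate(produced):
        buffer += chunk
        if buffer.rstrip().endswith((".", "!", "?")):
            break
    return len(produced)


if __name__ == "__main__":
    print(asyncio.run(chunks_before_speech_buffered()))
    print(asyncio.run(chunks_before_speech_streamed()))
```

With these sample chunks the buffered path needs all five chunks before anything can be spoken, while the streamed path has a sentence ready after the first one.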

## Fix

HA exposes a streaming response API for conversation agents (available since 2025.x, fully landed by 2026.x). Two complementary surfaces:

- `intent_response.async_set_speech_async_iterator(async_iter_of_text_deltas)` — pipeline consumes deltas, chunks into sentences, streams them into the TTS engine.
- For richer flows, `chat_log.async_add_delta_content_stream()` to feed deltas into the chat log so other consumers (history, frontend) see the stream too.

Wyoming engines and the Nabu Casa Cloud TTS engine both consume the streamed sentence chunks today, so the user-visible win is immediate: TTS starts as soon as the first sentence completes.
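For readers unfamiliar with what "chunks into sentences" means in practice, the idea can be sketched roughly as below. This is a deliberately naive illustration, not Home Assistant's actual chunking logic: `fake_deltas`, `sentences`, and `collect` are hypothetical names, and splitting on `.`, `!`, `?` will mis-split things like decimals or abbreviations.

```python
import asyncio
from collections.abc import AsyncIterator

SENTENCE_END = (".", "!", "?")


async def fake_deltas() -> AsyncIterator[str]:
    """Hypothetical text deltas, split at arbitrary points like real tokens."""
    for piece in ["Hel", "lo there. ", "The lights ", "are off! ", "Anything else"]:
        await asyncio.sleep(0)
        yield piece


async def sentences(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffer deltas and yield each sentence as soon as it is complete."""
    buffer = ""
    async for delta in deltas:
        buffer += delta
        while True:
            # Position of the first sentence terminator, or -1 if none yet.
            cut = next((i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[: cut + 1].strip(), buffer[cut + 1 :]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without a terminator.
    if buffer.strip():
        yield buffer.strip()


async def collect() -> list[str]:
    return [s async for s in sentences(fake_deltas())]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Each yielded sentence could be handed to TTS immediately, which is where the reduction in perceived latency would come from.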

Suggested change in `_get_response_streaming` (or wherever the streaming branch lives): yield the chunks directly instead of accumulating, and have the caller pass the iterator into `async_set_speech_async_iterator`. Keep the non-streaming fallback path for older HA cores or non-streaming providers.
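A rough sketch of the shape this could take, written against stand-in classes so it runs without Home Assistant. Everything here is an assumption for illustration: `FakeClient`, `StreamingResponse`, `LegacyResponse`, `stream_deltas`, and `respond` are hypothetical, and `async_set_speech_async_iterator` is simply the method name proposed in this issue, which I have not verified against HA core (a real setter might be synchronous and consumed lazily by the pipeline).

```python
import asyncio
from collections.abc import AsyncIterator


class FakeClient:
    """Hypothetical stand-in for the OpenClaw client."""

    async def async_stream_message(self, message: str) -> AsyncIterator[str]:
        for chunk in ("Sure. ", "Turning on ", "the lights."):
            await asyncio.sleep(0)
            yield chunk


class StreamingResponse:
    """Stand-in for a response object that accepts a delta iterator."""

    def __init__(self) -> None:
        self.spoken: list[str] = []

    async def async_set_speech_async_iterator(self, deltas: AsyncIterator[str]) -> None:
        # Assumed name from this issue; consumes deltas as they arrive.
        async for delta in deltas:
            self.spoken.append(delta)


class LegacyResponse:
    """Stand-in for a response object on a core without streaming support."""

    def __init__(self) -> None:
        self.speech: str | None = None

    def async_set_speech(self, text: str) -> None:
        self.speech = text


async def stream_deltas(client: FakeClient, message: str) -> AsyncIterator[str]:
    """Yield chunks directly instead of accumulating them."""
    async for chunk in client.async_stream_message(message):
        if chunk:
            yield chunk


async def respond(client: FakeClient, message: str, response: object) -> str:
    """Use the streaming setter when present, else fall back to one blob."""
    deltas = stream_deltas(client, message)
    set_stream = getattr(response, "async_set_speech_async_iterator", None)
    if set_stream is not None:
        await set_stream(deltas)
        return "streamed"
    # Fallback path: same generator, joined into a single string.
    response.async_set_speech("".join([chunk async for chunk in deltas]))
    return "buffered"


if __name__ == "__main__":
    print(asyncio.run(respond(FakeClient(), "lights on", StreamingResponse())))
    print(asyncio.run(respond(FakeClient(), "lights on", LegacyResponse())))
```

Feature-detecting with `getattr` rather than comparing version strings keeps the fallback working on cores where the streaming surface is absent, and both paths share one generator so the provider call is written only once.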

## Why it matters

The whole point of a personality-driven assistant with a real LLM behind it is fast, conversational back-and-forth. Currently the delay before voice starts is 3–8 s on longer answers, which makes the assistant feel broken even when generation is fast. Sentence streaming should cut that perceived latency to under 1 s.

Happy to test a patch / open a PR if helpful.
