Add option to play voice replies via tts.speak service and media_player (bypass WebView audio) #29

@slawa19

Description

Currently, openclaw-chat-card attempts to play TTS responses locally within the browser/WebView. Even when using Home Assistant TTS engines (via _speakViaHomeAssistantTts and /api/tts_get_url), the audio is fetched as a blob and played using a local HTML5 Audio element (new Audio().play()).

In the Home Assistant Companion App (Android/iOS), local WebView audio playback is frequently unreliable. It can be blocked by mobile autoplay policies, routed to the wrong audio stream (e.g., call volume instead of media volume), or silently fail.

As a result, users hear no voice output when interacting with the card on mobile, even though the TTS generation is successful.

Proposed Solution

Introduce a configuration option to delegate audio playback entirely to Home Assistant using the native tts.speak service (or action: tts.speak in newer HA versions) targeted at a specific media_player.

This bypasses the WebView Audio element and hands playback to Home Assistant's media_player integrations, which are not subject to browser autoplay policies and work with both mobile devices and smart speakers.

Suggested Configuration

type: custom:openclaw-chat-card
show_voice_button: true
voice_output_mode: media_player
ha_tts_engine: tts.piper
voice_output_media_player: media_player.my_phone_speaker
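
A minimal sketch of how the card might validate these options when its configuration is loaded. The helper name resolveVoiceOutput, the "browser" default mode name, and the requirement that ha_tts_engine be a tts.* entity in this mode are assumptions for illustration, not part of the existing card.

```javascript
// Hypothetical helper: normalizes the voice-output part of the card config.
// Option names come from the suggested configuration above; the "browser"
// default mode name is an assumption.
function resolveVoiceOutput(config) {
  const mode = config.voice_output_mode ?? "browser";
  if (mode !== "media_player") {
    // Any other mode keeps the card's current local playback path.
    return { mode };
  }

  const ttsEntity = config.ha_tts_engine;
  const mediaPlayer = config.voice_output_media_player;

  // tts.speak targets a TTS entity, so an entity ID is assumed here.
  if (typeof ttsEntity !== "string" || !ttsEntity.startsWith("tts.")) {
    throw new Error(
      "voice_output_mode: media_player requires ha_tts_engine to be a tts.* entity"
    );
  }
  if (typeof mediaPlayer !== "string" || !mediaPlayer.startsWith("media_player.")) {
    throw new Error(
      "voice_output_mode: media_player requires voice_output_media_player to be a media_player.* entity"
    );
  }
  return { mode, ttsEntity, mediaPlayer };
}
```

Failing early at configuration time would surface a missing media_player in the card editor rather than as a silent failure when the first reply arrives.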

Proposed Logic

If voice_output_mode: media_player is set, bypass local speechSynthesis and new Audio() completely. Instead, when a text response is received, make a standard service call via hass.callService:

await this._hass.callService("tts", "speak", {
  entity_id: this._config.ha_tts_engine, // e.g., tts.piper
  media_player_entity_id: this._config.voice_output_media_player,
  message: plainText
});

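Note: the card should pause voice recognition while the reply is playing, so that the microphone does not pick up the TTS output. Recognition can resume once the media_player returns to the idle state, or after a fixed timeout if state tracking is too complex.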
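
One possible way to combine state tracking with a timeout fallback is sketched below. The function name waitForPlaybackEnd, the getState callback, and the default timings are all hypothetical. The sketch waits until the player has been seen in an active state before treating a non-active state as "finished", since the player is typically already idle at the moment tts.speak is called.

```javascript
// Resolves with "finished" once the player has started and then stopped
// playing, or with "timeout" if that does not happen within timeoutMs.
// getState is a hypothetical callback returning the current state string,
// e.g. () => this._hass.states[entityId]?.state
async function waitForPlaybackEnd(getState, { timeoutMs = 15000, pollMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let sawActive = false;

  while (Date.now() < deadline) {
    const state = getState();
    if (state === "playing" || state === "buffering") {
      sawActive = true;
    } else if (sawActive) {
      // The player was active and has now left that state (idle, paused, off...).
      return "finished";
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  // Fixed-timeout fallback for players that never report an active state.
  return "timeout";
}
```

The card could resume voice recognition on either result, so a media_player that does not report state changes delays listening by at most timeoutMs.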

Why this is essential

  1. Mobile Reliability: tts.speak targets the Companion App's native media player (media_player.mobile_app_...) or any other smart speaker, bypassing all WebView audio restrictions.
  2. Consistency: This mirrors how native Home Assistant Assist voice pipelines play their responses.
  3. Flexibility: Users can talk into their phone and have the response play on a room speaker (e.g., a Sonos or Echo device).

Actual behavior

  • User speaks into the card on the HA Android App.
  • Card fetches TTS URL from HA.
  • Card tries to play it locally via new Audio().play().
  • Nothing is heard (or playback fails silently due to WebView constraints).

Expected behavior

  • With the new config, the card triggers a tts.speak service call.
  • Home Assistant reliably plays the audio on the specified media_player.
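
The difference between the actual and expected behavior comes down to which playback path runs for a reply. A rough sketch of that branch follows, assuming a hypothetical playVoiceReply function and a playLocally callback that stands in for the card's existing browser playback; neither name comes from the card's source.

```javascript
// Hypothetical dispatcher. `hass` is the Home Assistant frontend object;
// `playLocally` stands in for the card's existing new Audio().play() path.
async function playVoiceReply(hass, config, plainText, playLocally) {
  if (config.voice_output_mode === "media_player") {
    // Delegate playback to Home Assistant instead of the WebView.
    await hass.callService("tts", "speak", {
      entity_id: config.ha_tts_engine,
      media_player_entity_id: config.voice_output_media_player,
      message: plainText,
    });
    return "media_player";
  }
  // Default: keep the current local playback behavior unchanged.
  await playLocally(plainText);
  return "local";
}
```

Because the new path is opt-in, existing configurations without voice_output_mode would keep their current behavior.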
