Your AI agent, always listening, always local, sounding like you.
Install · Quickstart · How it works · Config
Say the wake word. Ask anything. Your OpenClaw agent responds — in your own voice — while running 100% on-device. No cloud. No API keys. No latency.
This is what happens when your terminal AI gets ears and a mouth.
You already have a powerful AI agent that can read your codebase, run commands, search the web, and manage your machine. But you type to it. Through a text box. Like it's 2023.
openclaw-voice gives your agent a voice interface in 30 lines of config. Wake word detection, speech-to-text, LLM reasoning, and voice-cloned text-to-speech — all running locally on Apple Silicon. Zero ongoing cost.
You: "Hey Jarvis, what's using all my GPU memory right now?"
Agent: "Ollama has gemma4 loaded, using 9.2 GB of your 96 GB VRAM.
The fine-tuning job on PID 47182 is consuming another 31 GB.
Want me to kill it?"
You: "Yeah, kill it and pull the new model."
Agent: *kills process, pulls model*
Agent: "Done. Qwen3.5 35B is ready."
```bash
pip install openclaw-voice
```

Prerequisites (macOS, Apple Silicon recommended):

- OpenClaw with a configured agent
- WhisperKit CLI — `brew install whisperkit`
- Microphone access (macOS will prompt on first run)
```bash
# 1. Record your voice sample (32 seconds, 4 sentences)
openclaw-voice record
# 2. Test all subsystems
openclaw-voice test
# 3. Start listening (foreground)
openclaw-voice start
```

Say your wake word. Ask anything. That's it.
Always-on in background:
```bash
openclaw-voice install     # LaunchAgent, auto-starts on login
openclaw-voice logs        # tail the live log
openclaw-voice uninstall   # stop + remove
```

```
┌─────────┐    ┌──────────────┐    ┌───────────┐    ┌──────────┐    ┌──────────┐
│   Mic   │───▶│ openWakeWord │───▶│ WhisperKit│───▶│ OpenClaw │───▶│Chatterbox│──▶ speakers
│  16kHz  │    │  wake detect │    │    STT    │    │  Agent   │    │   TTS    │
└─────────┘    └──────────────┘    └───────────┘    └──────────┘    └──────────┘
               <0.5ms/check        Neural Engine    Your models     Voice cloned
               CPU only            Apple Silicon    Local only      from 30s sample
```
| Layer | What | Why |
|---|---|---|
| Wake word | openWakeWord | Trainable, CPU-only, zero cloud |
| STT | WhisperKit large-v3-turbo | Neural Engine accelerated on Apple Silicon |
| Agent | OpenClaw | Any model, any tool, full machine control |
| TTS (cloned) | Chatterbox MLX | 30s sample → your voice, MPS accelerated |
| TTS (fallback) | WhisperKit TTS | No sample needed, realtime streaming |
Total cost: $0/month. Everything runs on-device. The only network call is to your local OpenClaw gateway (or your configured LLM endpoint).
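To make the flow concrete, here is a minimal sketch of the wake-to-reply loop in Python. It is illustrative only, not the daemon's actual internals: it assumes the openwakeword and sounddevice packages, and `handle_utterance()` is a hypothetical stand-in for the WhisperKit, OpenClaw, and Chatterbox steps.

```python
# Illustrative only: a bare-bones version of the always-listening loop.
# Assumes: pip install openwakeword sounddevice numpy
# (pre-trained wake-word models may first need openwakeword.utils.download_models())
import time

import numpy as np
import sounddevice as sd
from openwakeword.model import Model

FRAME = 1280  # 80 ms of 16 kHz mono audio per wake-word check

def handle_utterance() -> None:
    """Hypothetical stand-in: record until silence, transcribe with WhisperKit,
    send the text to the OpenClaw agent, and speak the reply with Chatterbox."""
    print("wake word detected")

def listen_loop(threshold: float = 0.6, cooldown_sec: float = 2.0) -> None:
    oww = Model(wakeword_models=["hey_jarvis"])  # any installed openWakeWord model
    last_trigger = 0.0
    with sd.InputStream(samplerate=16000, channels=1, dtype="int16") as stream:
        while True:
            frame, _ = stream.read(FRAME)
            scores = oww.predict(np.squeeze(frame))  # {model_name: confidence}
            fired = max(scores.values()) >= threshold
            if fired and time.monotonic() - last_trigger > cooldown_sec:
                last_trigger = time.monotonic()
                handle_utterance()

if __name__ == "__main__":
    listen_loop()
```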
Config lives at ~/.openclaw/voice/config/voice.yaml. Created on first run with sensible defaults.
```yaml
wake_word:
  model: hey_jarvis          # or hey_mycroft, alexa, or your custom model
  threshold: 0.6             # lower = more sensitive, more false positives
  cooldown_sec: 2.0          # minimum time between wake triggers
stt:
  model: large-v3-turbo      # WhisperKit model (fastest large-class)
  language: en
  max_record_sec: 12         # stop recording after this long
  silence_trigger_sec: 1.5   # stop after this much silence
tts:
  backend: chatterbox        # primary: voice cloning
  fallback_speaker: aiden    # fallback if no voice sample
  voice_sample: ~/.openclaw/voice/samples/voice.wav
agent:
  id: main                   # your OpenClaw agent ID
  max_reply_chars: 800       # truncate long replies for voice UX
```

Train your own "hey <your word>" model with ~50 voice samples. See docs/custom-wake-word.md for the full guide.
Quick path: use the openWakeWord training Colab, download the ONNX model, drop it in ~/.openclaw/voice/models/, and update your config.
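For illustration, loading such a model directly with the openWakeWord Python API might look like this (the `.onnx` filename is a placeholder for whatever you trained, and this is not the project's own loader code):

```python
# Sketch: point openWakeWord at a custom ONNX wake-word model.
import os

from openwakeword.model import Model

# Placeholder filename for a model trained via the openWakeWord Colab.
custom_model = os.path.expanduser("~/.openclaw/voice/models/hey_openclaw.onnx")
oww = Model(wakeword_models=[custom_model], inference_framework="onnx")
```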
- Too many false wake-ups? Raise `wake_word.threshold` to 0.7
- Cuts off mid-thought? Raise `stt.silence_trigger_sec` to 2.5
- Want shorter replies? Lower `agent.max_reply_chars` to 400
- No voice sample yet? It falls back to WhisperKit TTS automatically
- Replies too slow? Configure your agent to use a faster model for voice mode
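All of these are plain edits to voice.yaml. If you prefer to script them, here is a minimal sketch using PyYAML (an assumption, not a project dependency; note that re-serializing drops the inline comments):

```python
# Sketch: tweak the voice config programmatically (assumes PyYAML).
from pathlib import Path

import yaml

cfg_path = Path("~/.openclaw/voice/config/voice.yaml").expanduser()
cfg = yaml.safe_load(cfg_path.read_text())

cfg["wake_word"]["threshold"] = 0.7       # fewer false wake-ups
cfg["stt"]["silence_trigger_sec"] = 2.5   # wait longer before cutting off
cfg["agent"]["max_reply_chars"] = 400     # shorter spoken replies

cfg_path.write_text(yaml.safe_dump(cfg, sort_keys=False))
```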
Why OpenClaw and not a raw LLM API? Because your voice assistant should be able to do things — run commands, read files, search code, manage your machine. OpenClaw gives your agent tools, memory, and multi-model routing. A raw API gives you text.
Why WhisperKit and not whisper.cpp? Neural Engine acceleration. On Apple Silicon, WhisperKit runs STT 2-3x faster than GPU-based whisper.cpp while using less power. For an always-listening daemon, that matters.
Why Chatterbox and not [other TTS]? Voice cloning from a 30-second sample with near-zero quality loss. No other OSS TTS does this well on Apple Silicon. The fallback to WhisperKit TTS means it works even without a sample.
Why openWakeWord? It's the only wake word engine that's truly local (no cloud), trainable (custom models), and runs on CPU with <5% of a single core. Perfect for always-on.
Contributions welcome. Areas of particular interest:
- Linux support — currently macOS only (WhisperKit + Apple Silicon). Would love PipeWire + Whisper.cpp support.
- More TTS backends — Coqui TTS, Bark, XTTS
- Streaming responses — start speaking before the full reply is generated
- Multi-language — Chatterbox supports multilingual, needs config exposure
- Custom wake word training tooling — make it a one-command experience
See CONTRIBUTING.md for guidelines.
MIT


