Izwi

Local-first audio inference engine for TTS, ASR, and voice AI workflows.

Website • Documentation • Releases • Getting Started

Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

Core capabilities:

Voice Mode — Real-time voice conversations with AI
Text-to-Speech — Generate natural speech from text
Studio — Build long-form TTS projects and exports
Speech Recognition — Convert audio to text with high accuracy
Speaker Diarization — Identify and separate multiple speakers
Voice Cloning — Clone any voice from a short audio sample
Voice Design — Create custom voices from text descriptions
Forced Alignment — Word-level audio-text alignment
Chat — Text-based AI conversations

The server exposes OpenAI-compatible API routes under /v1. When the server is running, the local Scalar API reference is available at http://localhost:8080/docs, and the raw OpenAPI document is available at http://localhost:8080/openapi.json. Scalar includes navigation entries for preview first-party, operator, and realtime route families; detailed guidance is documented in the API Reference.
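To see which routes a running server actually exposes, the raw OpenAPI document can be inspected programmatically. The following is a minimal sketch, assuming the server is listening on the default port shown above and that the document uses the standard OpenAPI `paths` object; the helper name `routes_by_family` is hypothetical and not part of Izwi.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# URL taken from the paragraph above; adjust if the server uses another port.
OPENAPI_URL = "http://localhost:8080/openapi.json"


def routes_by_family(spec):
    """Group OpenAPI path strings by their first path segment (e.g. "/v1")."""
    families = defaultdict(list)
    for path in sorted(spec.get("paths", {})):
        segments = [s for s in path.split("/") if s]
        family = "/" + segments[0] if segments else "/"
        families[family].append(path)
    return dict(families)


if __name__ == "__main__":
    # Requires `izwi serve` to be running locally.
    with urlopen(OPENAPI_URL, timeout=5) as resp:
        spec = json.load(resp)
    for family, paths in routes_by_family(spec).items():
        print(f"{family}: {len(paths)} route(s)")
        for path in paths:
            print(f"  {path}")
```

Grouping by the first segment is only a convenience for skimming; the Scalar page at /docs remains the authoritative view of the route families.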

Runtime Support Matrix

Backend support depends on both the host and the artifact you install.

macOS on Apple Silicon: Metal is the recommended and stable GPU path.
Linux and Windows GitHub Release artifacts: intentionally CPU-only. The public commands remain izwi and izwi-server (with .exe counterparts on Windows).
Source builds: CUDA is supported when you build with --features cuda on a compatible NVIDIA host.
Docker CUDA profile: the CUDA distribution path for NVIDIA Linux hosts; when building on a machine without nvidia-smi, set CUDA_COMPUTE_CAP for the target GPU architecture.

See the full Runtime Support Matrix.

Quick Install

macOS

Download the latest .dmg from GitHub Releases:

Open the .dmg file
Drag Izwi.app to Applications
Launch Izwi

Linux

wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb

Windows

Download and run the installer from GitHub Releases.

Full installation guides: macOS • Linux • Windows • From Source

Quick Start

1. Start the server

izwi serve

Open http://localhost:8080 in your browser.

API users can also open http://localhost:8080/docs for the local Scalar API reference.

2. Download a model

izwi pull Qwen3-TTS-12Hz-0.6B-Base

3. Generate speech

izwi tts "Hello from Izwi!" --output hello.wav
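Because the server exposes OpenAI-compatible routes under /v1, the same request can presumably be made over HTTP. The sketch below is an assumption rather than documented Izwi behavior: it assumes the route and field names mirror OpenAI's `/v1/audio/speech` request (`model`, `input`, `response_format`), and the helper `build_speech_request` is hypothetical. Confirm the actual route and required fields at http://localhost:8080/docs before relying on it.

```python
import json
from urllib.request import Request, urlopen

# The /v1 prefix comes from the README; the default port is assumed.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, model="Qwen3-TTS-12Hz-0.6B-Base"):
    """Build (but do not send) a POST shaped like OpenAI's speech request."""
    # Assumption: route and JSON fields follow OpenAI's /v1/audio/speech.
    # Some servers also require a "voice" field; check the local API reference.
    payload = {"model": model, "input": text, "response_format": "wav"}
    return Request(
        BASE_URL + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires `izwi serve` to be running and the model to be pulled.
    req = build_speech_request("Hello from Izwi!")
    with urlopen(req, timeout=120) as resp, open("hello.wav", "wb") as out:
        out.write(resp.read())
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any audio is generated.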

4. Transcribe audio

izwi pull Parakeet-TDT-0.6B-v3
izwi transcribe audio.wav

Long-form ASR is handled automatically: Izwi splits long recordings into overlapping chunks, stitches the chunk transcripts together, and returns the full transcript rather than only the first model window.

Optional tuning knobs:

IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
# Optional: preload models at server startup to reduce first-request cold latency.
# Comma-separated model IDs (for example Whisper-Large-v3-Turbo,Qwen3.5-4B)
IZWI_PRELOAD_MODELS=Whisper-Large-v3-Turbo
# Optional: run a short synthetic ASR warmup after preloading (enabled by default).
IZWI_WARMUP_PRELOADED_MODELS=1
IZWI_ASR_WARMUP_DURATION_MS=800
# Optional: tune text streaming queue depth when using per-character ASR streaming.
IZWI_STREAM_TEXT_QUEUE_CAPACITY=4096
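To build intuition for how the three chunking variables interact, here is an illustrative sketch. It is not Izwi's actual chunker, whose boundary selection may differ (for example, by cutting at silences). It assumes a simple model: cut windows of the target length, start each new window `overlap` seconds before the previous one ended, and let the final window grow up to the maximum instead of leaving a short tail. Both function names are hypothetical, and the fallback values are the example values shown above, not confirmed defaults.

```python
import os


def chunk_settings_from_env(env=os.environ):
    """Read the three chunking knobs, falling back to the README's example values."""
    return (
        float(env.get("IZWI_ASR_CHUNK_TARGET_SECS", "24")),
        float(env.get("IZWI_ASR_CHUNK_MAX_SECS", "30")),
        float(env.get("IZWI_ASR_CHUNK_OVERLAP_SECS", "3")),
    )


def plan_chunks(duration_s, target_s=24.0, max_s=30.0, overlap_s=3.0):
    """Return (start, end) windows in seconds that cover the whole recording."""
    if not (0 <= overlap_s < target_s <= max_s):
        raise ValueError("expected 0 <= overlap < target <= max")
    if duration_s <= 0:
        return []
    chunks = []
    start = 0.0
    while True:
        # If everything that is left fits under the hard cap, finish in one window.
        if duration_s - start <= max_s:
            chunks.append((start, duration_s))
            return chunks
        end = start + target_s
        chunks.append((start, end))
        # Step back by the overlap so neighbouring transcripts share context.
        start = end - overlap_s


if __name__ == "__main__":
    target, maximum, overlap = chunk_settings_from_env()
    for start, end in plan_chunks(60.0, target, maximum, overlap):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Under this model, a 60-second recording with the values above is covered by three windows: 0 to 24, 21 to 45, and 42 to 60 seconds. A larger overlap gives the stitching step more shared audio to align on, at the cost of more total audio to decode.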

Anonymous Analytics (Desktop)

Izwi desktop supports opt-in anonymous usage analytics powered by Aptabase.

Disabled by default until users explicitly opt in.
Can be enabled during onboarding or later in Settings.
Users can opt out at any time.
No prompts, transcripts, audio payloads, local paths, or personal identifiers are sent.

To enable analytics transport in the desktop shell, set the app key in the runtime environment:

APTABASE_APP_KEY=A-US-XXXXXXXXXXXXXXX

Use the exact key from Aptabase (for example A-US-... or A-EU-...).

Without this variable, analytics calls are no-ops and nothing is sent.

Supported Models

Category	Models
TTS	Qwen3-TTS 12Hz (0.6B Base/CustomVoice, 1.7B Base/CustomVoice/VoiceDesign), Kokoro-82M
ASR	Qwen3-ASR GGUF (0.6B, 1.7B), Parakeet-TDT-0.6B-v3, Whisper-Large-v3-Turbo
Diarization	Sortformer 4-speaker
Chat	Qwen3 GGUF (0.6B, 1.7B, 4B, 8B), Qwen3.5 GGUF (0.8B, 2B, 4B, 9B), LFM2.5 (1.2B Instruct/Thinking GGUF), Gemma 3 (1B)
Audio	LFM2.5-Audio-1.5B-GGUF
Alignment	Qwen3-ForcedAligner-0.6B (full, 4-bit)

Run izwi list to see all available models.

Full model documentation: Models Guide

Documentation

Resource	Link
Getting Started	izwiai.com/docs/getting-started
Installation	izwiai.com/docs/installation
Features	izwiai.com/docs/features
CLI Reference	izwiai.com/docs/cli
API Reference	izwiai.com/docs/api
Models	izwiai.com/docs/models
Local OpenAPI Reference	http://localhost:8080/docs (when izwi serve is running)
Troubleshooting	izwiai.com/docs/troubleshooting

License

Apache 2.0

Acknowledgments

Qwen3-TTS by Alibaba
Parakeet by NVIDIA
Gemma by Google
HuggingFace Hub for model hosting
