An open-source, local-first audiobook generator powered by Kokoro TTS.
Paste any script, novel, or prose document — Script Reader parses it automatically, assigns voices to characters, and generates a full-length audiobook with chapter-by-chapter WAV output. Everything runs locally on your machine; no cloud APIs, no subscriptions.
- Multi-format parser — Automatically detects and parses:
  - 📖 Novels / Audiobooks (Markdown prose with inline dialogue)
  - 🎬 Fountain / Screenplay format
  - 💬 Colon-delimited scripts (`Character: dialogue`)
  - 🏷️ Labeled dialogue (`[Character] dialogue`)
  - 🔍 Heuristic fallback for unknown formats
- 24 built-in voices via Kokoro-82M — American & British, male & female
- Qwen3-TTS support — Optional secondary engine for voice cloning
- Chapter-aware generation — Novels are split into chapters with per-chapter WAV files + a stitched full audiobook
- Real-time progress — WebSocket-based live updates during generation
- Auto voice assignment — Intelligent gender-alternating assignment with manual overrides
- Speed control — Per-character speech rate multiplier (0.5x – 2.0x)
- Premium web UI — Dark-mode, glassmorphism design with Inter typography
- 100% local — No data leaves your machine
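To make the multi-format detection listed above more concrete, here is a minimal sketch of one way to guess a manuscript's format by counting lines that match each style. The patterns and the `detect_format` name are assumptions for illustration; the actual rules in `src/parser.rs` may differ.

```python
import re

# Hypothetical patterns, not taken from the real parser.
FOUNTAIN_HEADING = re.compile(r"^(INT|EXT)[./ ]", re.IGNORECASE)  # scene headings
COLON_LINE = re.compile(r"^[A-Z][\w .'-]{0,30}:\s+\S")            # Character: dialogue
LABEL_LINE = re.compile(r"^\[[^\]]+\]\s+\S")                      # [Character] dialogue
MARKDOWN_HEADING = re.compile(r"^#{1,6}\s+\S")                    # "# Chapter 1"


def detect_format(text: str) -> str:
    """Guess the format by counting the lines that match each pattern."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    counts = {
        "fountain": sum(bool(FOUNTAIN_HEADING.match(ln)) for ln in lines),
        "colon": sum(bool(COLON_LINE.match(ln)) for ln in lines),
        "labeled": sum(bool(LABEL_LINE.match(ln)) for ln in lines),
        "novel": sum(bool(MARKDOWN_HEADING.match(ln)) for ln in lines),
    }
    best = max(counts, key=counts.get)
    # No pattern matched at all: leave it to a heuristic fallback.
    return best if counts[best] > 0 else "unknown"
```

A result of `"unknown"` corresponds to the heuristic fallback case; a real parser would likely weigh more signals than a simple line count.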
The web UI runs at http://localhost:8000 and features three panels:
- Script Input — Paste or load your manuscript
- Voice Assignment — View detected characters, assign voices, preview audio
- Generation — Start generation, monitor progress, download output
- Rust ≥ 1.75 (install via rustup)
- Python ≥ 3.10
- espeak-ng (required by Kokoro for phonemization)
git clone https://github.com/MettaMazza/script-reader.git
cd script-reader
cargo build --release
python3 -m venv venv
source venv/bin/activate
pip install kokoro soundfile numpy

On macOS, also install espeak-ng:

brew install espeak-ng
cargo run --release

Open http://localhost:8000 in your browser.
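If the build or the first run fails, a quick look at the prerequisites can help. The sketch below is not part of the project (the repository's `build.sh` performs its own dependency check); the function names are hypothetical and it only reports what is visible from the current Python environment.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which prerequisites are visible from the current environment."""
    return {
        "cargo": shutil.which("cargo") is not None,
        "espeak-ng": shutil.which("espeak-ng") is not None,
        "python>=3.10": sys.version_info >= (3, 10),
        "kokoro": importlib.util.find_spec("kokoro") is not None,
        "soundfile": importlib.util.find_spec("soundfile") is not None,
        "numpy": importlib.util.find_spec("numpy") is not None,
    }


def missing(report: dict) -> list:
    """Return the names of the prerequisites that were not found, sorted."""
    return sorted(name for name, ok in report.items() if not ok)
```

Run it inside the activated virtual environment, for example `print(missing(check_prerequisites()))`, so the Python packages are looked up in the right place.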
- Paste your manuscript into the Script Input panel
- Click Parse Script — characters and chapters are detected automatically
- Adjust voice assignments if desired
- Click Generate Audio
- Download the chapter WAVs or the full stitched audiobook from `output/`
script-reader/
├── src/
│   ├── main.rs              # Server entry point (Axum, port 8000)
│   ├── lib.rs               # Router builder (shared with tests)
│   ├── parser.rs            # Multi-format script & novel parser
│   ├── routes.rs            # REST + WebSocket API handlers
│   ├── audio_pipeline.rs    # Chapter-aware generation engine
│   ├── audio_utils.rs       # WAV read/write, silence, concatenation
│   ├── voice_registry.rs    # Character → voice mapping
│   └── tts/
│       ├── mod.rs           # TtsEngine trait + AudioData
│       ├── kokoro.rs        # Kokoro-82M via Python subprocess
│       └── qwen3.rs         # Qwen3-TTS via Python subprocess
├── static/
│   ├── index.html           # Web UI
│   ├── style.css            # Premium dark-mode styles
│   └── app.js               # Client-side JavaScript
├── tests/
│   ├── e2e.rs               # Full API integration tests
│   └── novel_parse.rs       # Novel file integration test
├── output/                  # Generated audio (gitignored)
├── build.sh                 # Build + dependency check script
├── Cargo.toml
└── README.md
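The tree notes that `kokoro.rs` drives Kokoro-82M through a Python subprocess. As a rough idea of what the Python side of such a call might look like, here is a sketch built on the `kokoro` package's `KPipeline`. The `synthesize` and `lang_code_for` helpers are hypothetical, and the script the project actually runs is not shown in this README.

```python
SAMPLE_RATE = 24000  # Kokoro-82M produces 24 kHz audio


def lang_code_for(voice_id: str) -> str:
    """Map a voice ID prefix to a Kokoro language code ('a' American, 'b' British)."""
    prefix = voice_id[:1]
    if prefix not in ("a", "b"):
        # Assumption: only the American and British voices listed in this README.
        raise ValueError(f"unsupported voice prefix in {voice_id!r}")
    return prefix


def synthesize(text: str, voice_id: str, speed: float, out_path: str) -> None:
    """Synthesize one line of text to a WAV file (requires kokoro, soundfile, numpy)."""
    # Imported lazily so lang_code_for works without the TTS stack installed.
    import numpy as np
    import soundfile as sf
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code=lang_code_for(voice_id))
    # The pipeline yields (graphemes, phonemes, audio) for each text segment.
    chunks = [
        np.asarray(audio)
        for _, _, audio in pipeline(text, voice=voice_id, speed=speed)
    ]
    sf.write(out_path, np.concatenate(chunks), SAMPLE_RATE)
```

For example, `synthesize("Hello there.", "af_heart", 1.0, "preview.wav")` would write a short preview clip, assuming the model weights can be loaded locally.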
Manuscript → Parser → ParsedScript (chapters + elements)
                         ↓
     VoiceRegistry (character → voice mapping)
                         ↓
           AudioPipeline.generate_book()
                         ↓
        ┌────────────────┼────────────────┐
        ↓                ↓                ↓
  Chapter 1 WAV    Chapter 2 WAV    Chapter N WAV
        └────────────────┼────────────────┘
                         ↓
                full_audiobook.wav
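The last step of the pipeline above, joining per-chapter WAV files into one audiobook, can be sketched with Python's standard `wave` module. This is an illustration only: the project does this in Rust (`audio_utils.rs`), and the `stitch_wavs` name and the one-second default gap are assumptions.

```python
import wave


def stitch_wavs(chapter_paths: list, out_path: str, gap_seconds: float = 1.0) -> int:
    """Concatenate WAV files with silence between them; return frames written."""
    if not chapter_paths:
        raise ValueError("no chapter files given")
    total = 0
    params = None
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(chapter_paths):
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    # The first chapter defines the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path} has a different audio format")
                if i > 0:
                    # Zero bytes are silence for 16-bit PCM (8-bit PCM would need 0x80).
                    gap = int(gap_seconds * fmt[2])
                    out.writeframes(b"\x00" * (gap * fmt[0] * fmt[1]))
                    total += gap
                out.writeframes(src.readframes(src.getnframes()))
                total += src.getnframes()
    return total
```

Reading each chapter fully into memory keeps the sketch short; a long audiobook would be better served by copying in fixed-size blocks.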
All endpoints accept/return JSON unless noted.
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/parse` | Parse a script/novel. Body: `{ "script": "..." }` |
| `GET` | `/api/voices` | List all available voices and engine status |
| `POST` | `/api/assign` | Set character → voice mappings |
| `GET` | `/api/assignments` | Get current voice assignments |
| `POST` | `/api/generate` | Start audio generation (returns immediately) |
| `GET` | `/api/status` | Get generation progress |
| `POST` | `/api/preview` | Generate a single-line audio preview |
| `WS` | `/ws/progress` | Real-time progress updates via WebSocket |
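For scripted use, the REST endpoints in the table can be called from Python's standard library. The sketch below assumes the server is running on the default port; the helper names are hypothetical, and the shape of the JSON responses is not documented here, so the result is returned as decoded without interpretation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build a JSON request for one of the REST endpoints."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload=None):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(method, path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_job(manuscript: str, project_name: str):
    """Parse a manuscript, start generation, and return one status snapshot."""
    call("POST", "/api/parse", {"script": manuscript})
    call("POST", "/api/generate", {"project_name": project_name})
    return call("GET", "/api/status")
```

Because `json.dumps` escapes quotes and newlines, a whole manuscript can be passed as the `script` value without manual quoting.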
# Parse (jq builds the JSON body so quotes and newlines in the manuscript are escaped)
jq -Rs '{script: .}' my_book.md | curl -X POST http://localhost:8000/api/parse \
-H 'Content-Type: application/json' \
--data-binary @-
# Check voices
curl http://localhost:8000/api/voices | jq
# Override narrator voice
curl -X POST http://localhost:8000/api/assign \
-H 'Content-Type: application/json' \
-d '{"assignments": {"NARRATOR": {"engine": "kokoro", "voice_id": "am_michael", "speed": 1.25}}}'
# Generate
curl -X POST http://localhost:8000/api/generate \
-H 'Content-Type: application/json' \
-d '{"project_name": "my_audiobook"}'
# Monitor
curl http://localhost:8000/api/status | jq

| ID | Name | Gender |
|---|---|---|
| `af_alloy` | Alloy | Female |
| `af_bella` | Bella | Female |
| `af_heart` | Heart | Female |
| `af_jessica` | Jessica | Female |
| `af_nicole` | Nicole | Female |
| `af_nova` | Nova | Female |
| `af_river` | River | Female |
| `af_sarah` | Sarah | Female |
| `af_sky` | Sky | Female |
| `am_adam` | Adam | Male |
| `am_echo` | Echo | Male |
| `am_eric` | Eric | Male |
| `am_fenrir` | Fenrir | Male |
| `am_liam` | Liam | Male |
| `am_michael` | Michael | Male |
| `am_onyx` | Onyx | Male |
| `am_puck` | Puck | Male |

| ID | Name | Gender |
|---|---|---|
| `bf_emma` | Emma | Female |
| `bf_isabella` | Isabella | Female |
| `bf_alice` | Alice | Female |
| `bm_george` | George | Male |
| `bm_lewis` | Lewis | Male |
| `bm_daniel` | Daniel | Male |
| `bm_fable` | Fable | Male |
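The voice IDs above are what `/api/assign` expects. As a sketch of how a gender-alternating assignment with manual overrides and the 0.5x to 2.0x speed range could be built, the code below produces a payload in the same shape as the `curl` example. The voice pools and function names are hypothetical, and the project's own assignment logic is probably more involved than plain alternation.

```python
from itertools import cycle

# Small hypothetical pools drawn from the voice tables.
FEMALE = ["af_heart", "af_bella", "bf_emma"]
MALE = ["am_michael", "am_adam", "bm_george"]


def clamp_speed(speed: float) -> float:
    """Keep the speech rate multiplier within 0.5x to 2.0x."""
    return max(0.5, min(2.0, speed))


def assign_voices(characters: list, overrides: dict = None) -> dict:
    """Alternate female and male voices, letting overrides win per character."""
    overrides = overrides or {}
    pools = {"f": cycle(FEMALE), "m": cycle(MALE)}
    genders = cycle("fm")
    assignments = {}
    for name in characters:
        if name in overrides:
            entry = dict(overrides[name])
        else:
            entry = {"voice_id": next(pools[next(genders)])}
        entry.setdefault("engine", "kokoro")
        entry["speed"] = clamp_speed(entry.get("speed", 1.0))
        assignments[name] = entry
    return {"assignments": assignments}
```

The returned dictionary can be serialized with `json.dumps` and posted to `/api/assign` as the request body.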
# Run all tests (87 unit + 12 e2e + integration)
cargo test
# Run only unit tests
cargo test --lib
# Run only e2e tests
cargo test --test e2e
# Run with output
cargo test -- --nocapture

| Environment Variable | Default | Description |
|---|---|---|
| `QWEN3_MODEL` | `Qwen/Qwen3-TTS-12Hz-0.6B-Base` | Qwen3 model name |
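As an illustration of how a helper script might honor this variable, the sketch below falls back to the documented default when `QWEN3_MODEL` is unset. The function name is hypothetical, and treating an empty value as unset is an assumption, not documented behavior.

```python
import os

DEFAULT_QWEN3_MODEL = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"


def resolve_qwen3_model(env=None) -> str:
    """Return the configured Qwen3 model name, or the documented default."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset variable.
    return env.get("QWEN3_MODEL") or DEFAULT_QWEN3_MODEL
```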
- Native ONNX inference (eliminate Python subprocess)
- MP3/FLAC export
- Batch chapter generation (parallel synthesis)
- Voice cloning with reference audio
- Docker container
- SSML support for fine-grained prosody control
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
# Development workflow
cargo fmt # Format code
cargo clippy # Lint
cargo test # Run all tests

This project is licensed under the MIT License; see LICENSE for details.
- Kokoro-82M by Hexgrad — the TTS model powering voice synthesis
- Qwen3-TTS by Alibaba — optional secondary TTS engine
- Axum — async Rust web framework
- Hound — WAV encoding/decoding