🎧 Script Reader

An open-source, local-first audiobook generator powered by Kokoro TTS.

Paste any script, novel, or prose document — Script Reader parses it automatically, assigns voices to characters, and generates a full-length audiobook with chapter-by-chapter WAV output. Everything runs locally on your machine; no cloud APIs, no subscriptions.

✨ Features

- Multi-format parser that automatically detects and parses:
  - 📖 Novels / Audiobooks (Markdown prose with inline dialogue)
  - 🎬 Fountain / Screenplay format
  - 💬 Colon-delimited scripts (Character: dialogue)
  - 🏷️ Labeled dialogue ([Character] dialogue)
  - 🔍 Heuristic fallback for unknown formats
- 24 built-in voices via Kokoro-82M: American & British, male & female
- Qwen3-TTS support: optional secondary engine for voice cloning
- Chapter-aware generation: novels are split into chapters, with per-chapter WAV files plus a stitched full audiobook
- Real-time progress: WebSocket-based live updates during generation
- Auto voice assignment: gender-alternating assignment with manual overrides
- Speed control: per-character speech rate multiplier (0.5x – 2.0x)
- Premium web UI: dark-mode, glassmorphism design with Inter typography
- 100% local: no data leaves your machine
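
To give a feel for what format detection can look like, here is a minimal Python sketch. It is not the parser in `src/parser.rs`: the regular expressions, the 50% threshold, and the `detect_format` name are assumptions made for illustration only.

```python
import re

# Hypothetical patterns; the real parser may use different rules.
COLON_LINE = re.compile(r"^[A-Z][\w .'-]{0,40}:\s+\S")      # Character: dialogue
LABELED_LINE = re.compile(r"^\[[^\]\n]{1,40}\]\s+\S")       # [Character] dialogue
SCENE_HEADING = re.compile(r"^(INT|EXT|INT\./EXT|I/E)[. ]", re.IGNORECASE)
MD_HEADING = re.compile(r"^#{1,6}\s+\S")                    # Markdown chapter heading


def detect_format(text: str) -> str:
    """Guess the manuscript format by counting lines that match each pattern."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    counts = {
        "fountain": sum(bool(SCENE_HEADING.match(ln)) for ln in lines),
        "colon": sum(bool(COLON_LINE.match(ln)) for ln in lines),
        "labeled": sum(bool(LABELED_LINE.match(ln)) for ln in lines),
        "novel": sum(bool(MD_HEADING.match(ln)) for ln in lines),
    }
    # A scene heading is treated as a strong signal for a screenplay.
    if counts["fountain"] > 0:
        return "fountain"
    # Dialogue formats must cover at least half of the non-empty lines.
    for name in ("colon", "labeled"):
        if counts[name] / len(lines) >= 0.5:
            return name
    if counts["novel"] > 0:
        return "novel"
    # Anything else would go to the heuristic fallback.
    return "unknown"
```

For example, `detect_format("ALICE: Hello.\nBOB: Hi.")` classifies the text as a colon-delimited script, while text that matches none of the patterns is reported as `"unknown"`.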

🖥️ Screenshots

The web UI runs at http://localhost:8000 and features three panels:

Script Input — Paste or load your manuscript
Voice Assignment — View detected characters, assign voices, preview audio
Generation — Start generation, monitor progress, download output

🚀 Quick Start

Prerequisites

Rust ≥ 1.75 (install via rustup)
Python ≥ 3.10
espeak-ng (required by Kokoro for phonemization)

1. Clone & Build

git clone https://github.com/MettaMazza/script-reader.git
cd script-reader
cargo build --release

2. Set Up Python TTS

python3 -m venv venv
source venv/bin/activate
pip install kokoro soundfile numpy

On macOS, also install espeak-ng:

brew install espeak-ng

3. Run

cargo run --release

Open http://localhost:8000 in your browser.

4. Generate an Audiobook

Paste your manuscript into the Script Input panel
Click Parse Script — characters and chapters are detected automatically
Adjust voice assignments if desired
Click Generate Audio
Download the chapter WAVs or the full stitched audiobook from output/

🏗️ Architecture

script-reader/
├── src/
│   ├── main.rs              # Server entry point (Axum, port 8000)
│   ├── lib.rs               # Router builder (shared with tests)
│   ├── parser.rs            # Multi-format script & novel parser
│   ├── routes.rs            # REST + WebSocket API handlers
│   ├── audio_pipeline.rs    # Chapter-aware generation engine
│   ├── audio_utils.rs       # WAV read/write, silence, concatenation
│   ├── voice_registry.rs    # Character → voice mapping
│   └── tts/
│       ├── mod.rs           # TtsEngine trait + AudioData
│       ├── kokoro.rs        # Kokoro-82M via Python subprocess
│       └── qwen3.rs         # Qwen3-TTS via Python subprocess
├── static/
│   ├── index.html           # Web UI
│   ├── style.css            # Premium dark-mode styles
│   └── app.js               # Client-side JavaScript
├── tests/
│   ├── e2e.rs               # Full API integration tests
│   └── novel_parse.rs       # Novel file integration test
├── output/                  # Generated audio (gitignored)
├── build.sh                 # Build + dependency check script
├── Cargo.toml
└── README.md

Data Flow

Manuscript → Parser → ParsedScript (chapters + elements)
                          ↓
                   VoiceRegistry (character → voice mapping)
                          ↓
              AudioPipeline.generate_book()
                          ↓
         ┌────────────────┼────────────────┐
         ↓                ↓                ↓
    Chapter 1 WAV    Chapter 2 WAV    Chapter N WAV
         └────────────────┼────────────────┘
                          ↓
               full_audiobook.wav
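
The last step of the flow above, joining chapter WAVs into one file, can be sketched with Python's standard `wave` module. This is an illustration only: the project does this in Rust (`src/audio_utils.rs`), and the function names, the 24 kHz default sample rate, and the one-second gap are assumptions.

```python
import wave


def write_wav(path, frames: bytes, sample_rate: int = 24000) -> None:
    """Write 16-bit mono PCM frames to a WAV file (sample rate is an assumed default)."""
    with wave.open(str(path), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(frames)


def stitch_chapters(chapter_paths, out_path, gap_seconds: float = 1.0) -> int:
    """Concatenate chapter WAVs with silence between them; return the total frame count."""
    params = None
    chunks = []
    for path in chapter_paths:
        with wave.open(str(path), "rb") as src:
            fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                # Mixing formats would produce a corrupt or distorted file.
                raise ValueError(f"{path} has a different audio format: {fmt} != {params}")
            chunks.append(src.readframes(src.getnframes()))
    if params is None:
        raise ValueError("no chapters to stitch")
    channels, width, rate = params
    # Zero bytes are silence for signed PCM (16-bit or wider).
    silence = b"\x00" * (int(rate * gap_seconds) * channels * width)
    data = silence.join(chunks)
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(data)
    return len(data) // (channels * width)
```

Checking that every chapter shares the same channel count, sample width, and sample rate before joining is the important design point; the silence gap is a matter of taste.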

📡 API Reference

All endpoints accept/return JSON unless noted.

Method	Endpoint	Description
`POST`	`/api/parse`	Parse a script/novel. Body: `{ "script": "..." }`
`GET`	`/api/voices`	List all available voices and engine status
`POST`	`/api/assign`	Set character → voice mappings
`GET`	`/api/assignments`	Get current voice assignments
`POST`	`/api/generate`	Start audio generation (returns immediately)
`GET`	`/api/status`	Get generation progress
`POST`	`/api/preview`	Generate a single-line audio preview
`WS`	`/ws/progress`	Real-time progress updates via WebSocket

Example: Generate via CLI

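# Parse (jq -Rs reads the whole file as one JSON-escaped string,
# so quotes and newlines in the manuscript do not break the request body)
jq -Rs '{script: .}' my_book.md | curl -X POST http://localhost:8000/api/parse \
  -H 'Content-Type: application/json' \
  --data-binary @-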

# Check voices
curl http://localhost:8000/api/voices | jq

# Override narrator voice
curl -X POST http://localhost:8000/api/assign \
  -H 'Content-Type: application/json' \
  -d '{"assignments": {"NARRATOR": {"engine": "kokoro", "voice_id": "am_michael", "speed": 1.25}}}'

# Generate
curl -X POST http://localhost:8000/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"project_name": "my_audiobook"}'

# Monitor
curl http://localhost:8000/api/status | jq
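
The same workflow can be driven from Python using only the standard library. The sketch below is a hedged illustration: the helper names (`build_request`, `call`, `run_workflow`) are hypothetical, the polling count and interval are arbitrary, and because the response schemas are not documented here, the status is printed as-is rather than interpreted.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Optional

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    # json.dumps escapes quotes and newlines in the manuscript text.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[dict] = None):
    """Send the request and decode the JSON response (assumes a JSON body)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)


def run_workflow(manuscript: str, project_name: str, polls: int = 5) -> None:
    """Parse, override the narrator voice, start generation, then poll status."""
    text = Path(manuscript).read_text(encoding="utf-8")
    call("/api/parse", {"script": text})
    call("/api/assign", {"assignments": {
        "NARRATOR": {"engine": "kokoro", "voice_id": "am_michael", "speed": 1.25},
    }})
    call("/api/generate", {"project_name": project_name})
    for _ in range(polls):
        print(call("/api/status"))
        time.sleep(2)  # arbitrary polling interval
```

With the server running, `run_workflow("my_book.md", "my_audiobook")` mirrors the curl commands above.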

🎤 Available Voices

American English

ID	Name	Gender
`af_alloy`	Alloy	Female
`af_bella`	Bella	Female
`af_heart`	Heart	Female
`af_jessica`	Jessica	Female
`af_nicole`	Nicole	Female
`af_nova`	Nova	Female
`af_river`	River	Female
`af_sarah`	Sarah	Female
`af_sky`	Sky	Female
`am_adam`	Adam	Male
`am_echo`	Echo	Male
`am_eric`	Eric	Male
`am_fenrir`	Fenrir	Male
`am_liam`	Liam	Male
`am_michael`	Michael	Male
`am_onyx`	Onyx	Male
`am_puck`	Puck	Male

British English

ID	Name	Gender
`bf_emma`	Emma	Female
`bf_isabella`	Isabella	Female
`bf_alice`	Alice	Female
`bm_george`	George	Male
`bm_lewis`	Lewis	Male
`bm_daniel`	Daniel	Male
`bm_fable`	Fable	Male
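
The voice IDs in these tables follow a visible pattern: the first letter marks the accent (`a` for American, `b` for British) and the second the gender (`f` or `m`). The sketch below uses that pattern to illustrate gender-alternating assignment with manual overrides and a clamped speed. It alternates by order of appearance only and is not the logic in `src/voice_registry.rs`; the function names and the shortened voice list are assumptions. The returned dictionary follows the `/api/assign` payload shape shown in the CLI example.

```python
from itertools import cycle

# A subset of the voice IDs from the tables above.
VOICES = ["af_heart", "af_bella", "am_michael", "am_adam", "bf_emma", "bm_george"]


def voice_gender(voice_id: str) -> str:
    """Read the gender from the ID prefix: the second letter is 'f' or 'm'."""
    return {"f": "female", "m": "male"}[voice_id[1]]


def clamp_speed(speed: float) -> float:
    """Keep the rate multiplier inside the documented 0.5x to 2.0x range."""
    return max(0.5, min(2.0, speed))


def auto_assign(characters, voices=VOICES, overrides=None, speed=1.0):
    """Alternate female and male voices across characters, then apply overrides."""
    pools = {
        "female": cycle([v for v in voices if voice_gender(v) == "female"]),
        "male": cycle([v for v in voices if voice_gender(v) == "male"]),
    }
    assignments = {}
    for index, name in enumerate(characters):
        # Even positions draw from the female pool, odd positions from the male pool.
        gender = "female" if index % 2 == 0 else "male"
        assignments[name] = {
            "engine": "kokoro",
            "voice_id": next(pools[gender]),
            "speed": clamp_speed(speed),
        }
    # Manual overrides win over the automatic choice.
    for name, voice_id in (overrides or {}).items():
        entry = assignments.setdefault(
            name, {"engine": "kokoro", "speed": clamp_speed(speed)}
        )
        entry["voice_id"] = voice_id
    return {"assignments": assignments}
```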

🧪 Testing

# Run all tests (87 unit + 12 e2e + integration)
cargo test

# Run only unit tests
cargo test --lib

# Run only e2e tests
cargo test --test e2e

# Run with output
cargo test -- --nocapture

🔧 Configuration

Environment Variable	Default	Description
`QWEN3_MODEL`	`Qwen/Qwen3-TTS-12Hz-0.6B-Base`	Qwen3 model name
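
As a small illustration of how a helper script might honor this variable, the sketch below reads `QWEN3_MODEL` and falls back to the documented default. The `qwen3_model` function is hypothetical, and treating an empty value as unset is an assumption rather than documented behavior.

```python
import os

DEFAULT_QWEN3_MODEL = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"


def qwen3_model(env=None) -> str:
    """Return the configured Qwen3 model name, or the documented default."""
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as unset.
    value = env.get("QWEN3_MODEL", "").strip()
    return value or DEFAULT_QWEN3_MODEL
```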

📋 Roadmap

Native ONNX inference (eliminate Python subprocess)
MP3/FLAC export
Batch chapter generation (parallel synthesis)
Voice cloning with reference audio
Docker container
SSML support for fine-grained prosody control

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

# Development workflow
cargo fmt          # Format code
cargo clippy       # Lint
cargo test         # Run all tests

📄 License

This project is licensed under the MIT License — see LICENSE for details.

🙏 Acknowledgments

Kokoro-82M by Hexgrad — the TTS model powering voice synthesis
Qwen3-TTS by Alibaba — optional secondary TTS engine
Axum — async Rust web framework
Hound — WAV encoding/decoding
