MettaMazza/script-reader

🎧 Script Reader

An open-source, local-first audiobook generator powered by Kokoro TTS.

Paste any script, novel, or prose document — Script Reader parses it automatically, assigns voices to characters, and generates a full-length audiobook with chapter-by-chapter WAV output. Everything runs locally on your machine; no cloud APIs, no subscriptions.


✨ Features

  • Multi-format parser — Automatically detects and parses:
    • 📖 Novels / Audiobooks (Markdown prose with inline dialogue)
    • 🎬 Fountain / Screenplay format
    • 💬 Colon-delimited scripts (Character: dialogue)
    • 🏷️ Labeled dialogue ([Character] dialogue)
    • 🔍 Heuristic fallback for unknown formats
  • 24 built-in voices via Kokoro-82M — American & British, male & female
  • Qwen3-TTS support — Optional secondary engine for voice cloning
  • Chapter-aware generation — Novels are split into chapters with per-chapter WAV files + a stitched full audiobook
  • Real-time progress — WebSocket-based live updates during generation
  • Auto voice assignment: alternates female and male voices across detected characters, with manual overrides
  • Speed control — Per-character speech rate multiplier (0.5x – 2.0x)
  • Premium web UI — Dark-mode, glassmorphism design with Inter typography
  • 100% local — No data leaves your machine
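
As a rough illustration of how format detection for the two simplest layouts listed above (Character: dialogue and [Character] dialogue) could work, here is a minimal Python sketch. The regular expressions, the 50% threshold, and the function names detect_format and parse_dialogue are assumptions made for this example; the actual rules live in src/parser.rs and may differ.

```python
import re

# Hypothetical patterns for illustration; parser.rs may use different rules.
COLON_LINE = re.compile(r"^([A-Za-z][\w .'-]{0,40}):\s+(\S.*)$")
LABEL_LINE = re.compile(r"^\[([^\]]+)\]\s+(\S.*)$")


def detect_format(text: str) -> str:
    """Guess the script format from the share of lines matching each pattern."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    colon = sum(1 for ln in lines if COLON_LINE.match(ln))
    label = sum(1 for ln in lines if LABEL_LINE.match(ln))
    # Assumed threshold: at least half of the non-empty lines must match.
    if label * 2 >= len(lines) and label >= colon:
        return "labeled"
    if colon * 2 >= len(lines):
        return "colon"
    return "unknown"


def parse_dialogue(text: str) -> list[tuple[str, str]]:
    """Return (character, line) pairs for colon-delimited or labeled scripts."""
    pattern = {"colon": COLON_LINE, "labeled": LABEL_LINE}.get(detect_format(text))
    if pattern is None:
        return []  # unknown format: a real parser would fall back to heuristics
    pairs = []
    for ln in text.splitlines():
        match = pattern.match(ln.strip())
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs
```

Fountain and novel prose need more context than a per-line pattern (character cues, quoted dialogue inside paragraphs), so they are left out of this sketch.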

🖥️ Screenshots

The web UI runs at http://localhost:8000 and features three panels:

  1. Script Input — Paste or load your manuscript
  2. Voice Assignment — View detected characters, assign voices, preview audio
  3. Generation — Start generation, monitor progress, download output

🚀 Quick Start

Prerequisites

  • Rust ≥ 1.75 (install via rustup)
  • Python ≥ 3.10
  • espeak-ng (required by Kokoro for phonemization)

1. Clone & Build

git clone https://github.com/MettaMazza/script-reader.git
cd script-reader
cargo build --release

2. Set Up Python TTS

python3 -m venv venv
source venv/bin/activate
pip install kokoro soundfile numpy

On macOS, also install espeak-ng:

brew install espeak-ng

3. Run

cargo run --release

Open http://localhost:8000 in your browser.

4. Generate an Audiobook

  1. Paste your manuscript into the Script Input panel
  2. Click Parse Script — characters and chapters are detected automatically
  3. Adjust voice assignments if desired
  4. Click Generate Audio
  5. Download the chapter WAVs or the full stitched audiobook from output/
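
To give a feel for what chapter detection in step 2 might involve, the Python sketch below splits Markdown prose at headings. It assumes chapters start at level 1 or level 2 headings; the heading rule and the name split_chapters are hypothetical and are not taken from the project's parser.

```python
import re

# Assumption: a chapter starts at a Markdown heading of level 1 or 2.
HEADING = re.compile(r"^#{1,2}\s+(.+?)\s*$")


def split_chapters(markdown: str) -> list[dict]:
    """Split Markdown prose into chapters keyed by their heading text."""
    chapters = []
    current = {"title": "Untitled", "lines": []}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Keep the running chapter only if it holds any text.
            if any(ln.strip() for ln in current["lines"]):
                chapters.append(current)
            current = {"title": match.group(1), "lines": []}
        else:
            current["lines"].append(line)
    if any(ln.strip() for ln in current["lines"]):
        chapters.append(current)
    return [
        {"title": c["title"], "text": "\n".join(c["lines"]).strip()}
        for c in chapters
    ]
```

Text that appears before the first heading is kept under the placeholder title "Untitled", and headings with no text beneath them are dropped.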

🏗️ Architecture

script-reader/
├── src/
│   ├── main.rs              # Server entry point (Axum, port 8000)
│   ├── lib.rs               # Router builder (shared with tests)
│   ├── parser.rs            # Multi-format script & novel parser
│   ├── routes.rs            # REST + WebSocket API handlers
│   ├── audio_pipeline.rs    # Chapter-aware generation engine
│   ├── audio_utils.rs       # WAV read/write, silence, concatenation
│   ├── voice_registry.rs    # Character → voice mapping
│   └── tts/
│       ├── mod.rs           # TtsEngine trait + AudioData
│       ├── kokoro.rs        # Kokoro-82M via Python subprocess
│       └── qwen3.rs         # Qwen3-TTS via Python subprocess
├── static/
│   ├── index.html           # Web UI
│   ├── style.css            # Premium dark-mode styles
│   └── app.js               # Client-side JavaScript
├── tests/
│   ├── e2e.rs               # Full API integration tests
│   └── novel_parse.rs       # Novel file integration test
├── output/                  # Generated audio (gitignored)
├── build.sh                 # Build + dependency check script
├── Cargo.toml
└── README.md

Data Flow

Manuscript → Parser → ParsedScript (chapters + elements)
                          ↓
                   VoiceRegistry (character → voice mapping)
                          ↓
              AudioPipeline.generate_book()
                          ↓
         ┌────────────────┼────────────────┐
         ↓                ↓                ↓
    Chapter 1 WAV    Chapter 2 WAV    Chapter N WAV
         └────────────────┼────────────────┘
                          ↓
               full_audiobook.wav
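
The final stitching step in the diagram can be sketched with Python's standard wave module. This is an illustration only: the project does this in Rust (src/audio_utils.rs), and the function name stitch_wavs, the one-second default gap, and the zero-byte silence (valid for 16-bit PCM) are assumptions.

```python
import wave
from pathlib import Path


def stitch_wavs(chapter_paths: list[Path], out_path: Path, gap_seconds: float = 1.0) -> int:
    """Concatenate WAV files with silence between them; return frames written."""
    if not chapter_paths:
        raise ValueError("at least one chapter WAV is required")
    total = 0
    fmt = None  # (channels, sample width, frame rate) of the first chapter
    with wave.open(str(out_path), "wb") as out:
        for index, path in enumerate(chapter_paths):
            with wave.open(str(path), "rb") as chapter:
                this_fmt = (
                    chapter.getnchannels(),
                    chapter.getsampwidth(),
                    chapter.getframerate(),
                )
                if fmt is None:
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"{path} does not match the first chapter's format")
                if index > 0:
                    # Zero bytes are silence for signed 16-bit PCM (assumed here).
                    gap_frames = int(gap_seconds * fmt[2])
                    out.writeframes(b"\x00" * gap_frames * fmt[0] * fmt[1])
                    total += gap_frames
                out.writeframes(chapter.readframes(chapter.getnframes()))
                total += chapter.getnframes()
    return total
```

Reading each chapter fully into memory is fine for a sketch; a production version would copy in fixed-size chunks.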

📡 API Reference

All endpoints accept/return JSON unless noted.

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/parse | Parse a script/novel. Body: { "script": "..." } |
| GET | /api/voices | List all available voices and engine status |
| POST | /api/assign | Set character → voice mappings |
| GET | /api/assignments | Get current voice assignments |
| POST | /api/generate | Start audio generation (returns immediately) |
| GET | /api/status | Get generation progress |
| POST | /api/preview | Generate a single-line audio preview |
| WS | /ws/progress | Real-time progress updates via WebSocket |

Example: Generate via CLI

# Parse (jq -Rs wraps the whole file in one JSON string, escaping quotes and newlines)
jq -Rs '{script: .}' my_book.md |
  curl -X POST http://localhost:8000/api/parse \
    -H 'Content-Type: application/json' \
    --data-binary @-

# Check voices
curl http://localhost:8000/api/voices | jq

# Override narrator voice
curl -X POST http://localhost:8000/api/assign \
  -H 'Content-Type: application/json' \
  -d '{"assignments": {"NARRATOR": {"engine": "kokoro", "voice_id": "am_michael", "speed": 1.25}}}'

# Generate
curl -X POST http://localhost:8000/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"project_name": "my_audiobook"}'

# Monitor
curl http://localhost:8000/api/status | jq
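
The same workflow can be scripted from Python with only the standard library. The sketch below is a hedged illustration: the endpoint paths and request bodies come from the examples above, while the helper names build_json_request, call, and run are invented here, and the shape of each response is not documented in this README, so the raw JSON is simply printed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_json_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body for one of the /api endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        # json.dumps escapes quotes and newlines in the manuscript safely.
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request):
    """Send a Request (or GET a URL string) and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run(manuscript_path: str, project_name: str) -> None:
    """Parse a manuscript, start generation, and print one status snapshot."""
    with open(manuscript_path, encoding="utf-8") as handle:
        script = handle.read()
    print(call(build_json_request("/api/parse", {"script": script})))
    print(call(build_json_request("/api/generate", {"project_name": project_name})))
    # Response fields are an unknown here, so the status is printed as-is.
    print(call(BASE_URL + "/api/status"))
```

With the server running, calling run("my_book.md", "my_audiobook") mirrors the parse, generate, and monitor steps of the curl commands.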

🎤 Available Voices

American English

| ID | Name | Gender |
|----|------|--------|
| af_alloy | Alloy | Female |
| af_bella | Bella | Female |
| af_heart | Heart | Female |
| af_jessica | Jessica | Female |
| af_nicole | Nicole | Female |
| af_nova | Nova | Female |
| af_river | River | Female |
| af_sarah | Sarah | Female |
| af_sky | Sky | Female |
| am_adam | Adam | Male |
| am_echo | Echo | Male |
| am_eric | Eric | Male |
| am_fenrir | Fenrir | Male |
| am_liam | Liam | Male |
| am_michael | Michael | Male |
| am_onyx | Onyx | Male |
| am_puck | Puck | Male |

British English

| ID | Name | Gender |
|----|------|--------|
| bf_emma | Emma | Female |
| bf_isabella | Isabella | Female |
| bf_alice | Alice | Female |
| bm_george | George | Male |
| bm_lewis | Lewis | Male |
| bm_daniel | Daniel | Male |
| bm_fable | Fable | Male |
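
The voice IDs in both tables follow a visible pattern: the first letter marks the accent (a for American, b for British) and the second the gender (f or m). The Python sketch below decodes that prefix and builds an /api/assign payload by alternating female and male voices. The function names and the strict alternation by character order are assumptions for illustration; the project's real assignment logic in src/voice_registry.rs is not described in this README.

```python
from itertools import cycle

ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> dict:
    """Decode the two-letter prefix seen in the tables, e.g. 'bf_emma'."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS or not name:
        raise ValueError(f"unrecognised voice id: {voice_id!r}")
    return {
        "accent": ACCENTS[prefix[0]],
        "gender": GENDERS[prefix[1]],
        "name": name.capitalize(),
    }


def clamp_speed(speed: float) -> float:
    """Keep a speed multiplier inside the documented 0.5x to 2.0x range."""
    return max(0.5, min(2.0, speed))


def alternate_voices(characters, voice_ids, speed=1.0) -> dict:
    """Assign voices by alternating female and male IDs in character order."""
    female = [v for v in voice_ids if describe_voice(v)["gender"] == "Female"]
    male = [v for v in voice_ids if describe_voice(v)["gender"] == "Male"]
    if not female or not male:
        raise ValueError("need at least one female and one male voice")
    pools = (cycle(female), cycle(male))
    assignments = {}
    for index, character in enumerate(characters):
        assignments[character] = {
            "engine": "kokoro",
            "voice_id": next(pools[index % 2]),
            "speed": clamp_speed(speed),
        }
    # Same shape as the body sent to /api/assign in the curl example.
    return {"assignments": assignments}
```

The returned dictionary can be serialized with json.dumps and posted to /api/assign; manual overrides would then replace individual entries.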

🧪 Testing

# Run all tests (87 unit + 12 e2e + integration)
cargo test

# Run only unit tests
cargo test --lib

# Run only e2e tests
cargo test --test e2e

# Run with output
cargo test -- --nocapture

🔧 Configuration

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| QWEN3_MODEL | Qwen/Qwen3-TTS-12Hz-0.6B-Base | Qwen3 model name |

📋 Roadmap

  • Native ONNX inference (eliminate Python subprocess)
  • MP3/FLAC export
  • Batch chapter generation (parallel synthesis)
  • Voice cloning with reference audio
  • Docker container
  • SSML support for fine-grained prosody control

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

# Development workflow
cargo fmt          # Format code
cargo clippy       # Lint
cargo test         # Run all tests

📄 License

This project is licensed under the MIT License — see LICENSE for details.


🙏 Acknowledgments

  • Kokoro-82M by Hexgrad — the TTS model powering voice synthesis
  • Qwen3-TTS by Alibaba — optional secondary TTS engine
  • Axum — async Rust web framework
  • Hound — WAV encoding/decoding

About

Local-first audiobook generator — parse any script or novel, assign voices, generate full audio with Kokoro TTS. 130 tests. Early beta.
