Add text-to-speech audio generation for blog posts by cduruk · Pull Request #80 · cduruk/offbyone

cduruk · 2025-12-14T21:09:55Z

Summary

Adds the ability to generate spoken audio versions of blog posts using Modal + Chatterbox TTS
TypeScript orchestrator handles file discovery, MDX text extraction, and Modal invocation
Python Modal function runs Chatterbox TTS on GPU (A10G) for high-quality speech synthesis

Changes

scripts/generate-audio.ts - TypeScript orchestrator with CLI
scripts/tts/chatterbox_tts.py - Modal function with Chatterbox TTS
package.json - Added generate-audio npm script
AGENTS.md - Documentation for audio generation
.gitignore - Added Python cache files

Usage

npm run generate-audio                    # Generate audio for posts without audio.wav
npm run generate-audio -- --slug my-post  # Generate for specific post(s)
npm run generate-audio -- --force         # Regenerate all posts
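
The three invocations above imply a small selection rule: skip posts that already have audio.wav unless --force is given, and narrow to named posts when --slug is given. Below is a minimal sketch of that rule. The `Post` and `Options` types and the `parseArgs` and `selectPosts` helpers are hypothetical names chosen for illustration and are not taken from scripts/generate-audio.ts. It also assumes that --slug without --force still skips posts that already have audio, which the PR does not state.

```typescript
// Hypothetical types and names; the real scripts/generate-audio.ts may differ.
interface Post {
  slug: string;
  hasAudio: boolean; // true when audio.wav already exists for the post
}

interface Options {
  slugs: string[]; // values passed via --slug (empty means "all posts")
  force: boolean; // true when --force is passed
}

// Parse a minimal argv such as ["--slug", "my-post", "--force"].
function parseArgs(argv: string[]): Options {
  const options: Options = { slugs: [], force: false };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--force") {
      options.force = true;
    } else if (argv[i] === "--slug" && i + 1 < argv.length) {
      i += 1;
      options.slugs.push(argv[i]);
    }
  }
  return options;
}

// Decide which posts need audio generated.
// Assumption: --slug narrows the candidate set, --force ignores existing audio.
function selectPosts(posts: Post[], options: Options): Post[] {
  const wanted = new Set(options.slugs);
  return posts
    .filter((post) => wanted.size === 0 || wanted.has(post.slug))
    .filter((post) => options.force || !post.hasAudio);
}
```

Keeping the selection as a pure function over a list of posts makes it testable without touching the filesystem or Modal.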

Technical Notes

Text is cleaned (code blocks, components, and special characters are removed) and truncated to ~3000 characters
Longer text triggers CUDA errors; the limit was determined through testing
Cold boot takes ~30 seconds on Modal; generation is fast after that
Output is in WAV format (MP3 conversion is planned)
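
The cleaning and truncation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from scripts/generate-audio.ts: the `cleanText` and `truncate` names, the regular expressions, and the choice to cut at the last sentence boundary are all hypothetical. Only the ~3000 character limit comes from the PR description.

```typescript
// Approximate limit quoted in the PR; longer input reportedly triggers CUDA errors.
const MAX_CHARS = 3000;

// Strip MDX constructs that should not be read aloud.
// Assumption: component tags are removed but their inner text is kept.
function cleanText(mdx: string): string {
  return mdx
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks
    .replace(/^(import|export)\s.*$/gm, " ") // MDX import/export lines
    .replace(/<[^>]+>/g, " ") // JSX/HTML tags
    .replace(/[*_#`>~]/g, "") // leftover markdown punctuation
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Truncate to at most maxChars, preferring to end on a sentence boundary.
function truncate(text: string, maxChars: number = MAX_CHARS): string {
  if (text.length <= maxChars) {
    return text;
  }
  const slice = text.slice(0, maxChars);
  const lastStop = Math.max(
    slice.lastIndexOf(". "),
    slice.lastIndexOf("! "),
    slice.lastIndexOf("? "),
  );
  // Fall back to a hard cut when no sentence boundary is found.
  return lastStop > 0 ? slice.slice(0, lastStop + 1) : slice;
}
```

Cutting at a sentence boundary avoids ending the audio mid-sentence, at the cost of sending slightly less than the full limit.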

Requirements

uv (Python package manager)
Modal account with GPU access

Test plan

Test TTS locally with short text
Test end-to-end with blog post
Verify WAV file is valid audio

🤖 Generated with Claude Code

New feature to generate spoken audio versions of blog posts using Modal + Chatterbox TTS:
- Add scripts/generate-audio.ts TypeScript orchestrator
- Add scripts/tts/chatterbox_tts.py Modal function with GPU TTS
- Add npm run generate-audio command
- Document usage in AGENTS.md
- Update .gitignore for Python cache files

Usage: npm run generate-audio -- --slug post-slug

Technical notes:
- Uses Chatterbox TTS model on Modal A10G GPU
- Text limited to ~3000 chars (CUDA limits)
- Outputs WAV format (MP3 conversion planned)
- Requires uv and Modal account

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

cduruk and others added 3 commits December 14, 2025 16:09

Expand TTS testing documentation in AGENTS.md

bb3cb36

Add examples for generating sample audio with custom text.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Add Modal deployment documentation

9ab9c36

The TTS app is now deployed for faster cold boots via memory snapshots. Added instructions for redeploying after changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

cduruk wants to merge 3 commits into main from feature/tts-audio-generation
