Add text-to-speech audio generation for blog posts #80

Open
cduruk wants to merge 3 commits into main from feature/tts-audio-generation

cduruk (Owner) commented Dec 14, 2025

Summary

  • Adds ability to generate spoken audio versions of blog posts using Modal + Chatterbox TTS
  • TypeScript orchestrator handles file discovery, MDX text extraction, and Modal invocation
  • Python Modal function runs Chatterbox TTS on GPU (A10G) for high-quality speech synthesis

Changes

  • scripts/generate-audio.ts - TypeScript orchestrator with CLI
  • scripts/tts/chatterbox_tts.py - Modal function with Chatterbox TTS
  • package.json - Added generate-audio npm script
  • AGENTS.md - Documentation for audio generation
  • .gitignore - Added Python cache files

Usage

npm run generate-audio                    # Generate audio for posts without audio.wav
npm run generate-audio -- --slug my-post  # Generate for specific post(s)
npm run generate-audio -- --force         # Regenerate all posts
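
The flag handling above could be sketched roughly as follows. This is a minimal illustration, not the code in scripts/generate-audio.ts: `CliOptions`, `Post`, `parseArgs`, and `selectPosts` are hypothetical names, and the sketch assumes that a post named with `--slug` is still skipped when it already has audio.wav unless `--force` is also passed. The real script may treat that case differently.

```typescript
// Hypothetical types and helpers; the real CLI may be structured differently.
interface CliOptions {
  slugs: string[];
  force: boolean;
}

interface Post {
  slug: string;
  hasAudio: boolean; // true when audio.wav already exists for the post
}

// Parse `--slug <name>` (repeatable) and `--force` from the argument list.
function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { slugs: [], force: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--force") {
      opts.force = true;
    } else if (arg === "--slug") {
      const value = argv[i + 1];
      if (value !== undefined) {
        opts.slugs.push(value);
        i++; // skip the value we just consumed
      }
    }
  }
  return opts;
}

// Decide which posts need audio generated.
function selectPosts(posts: Post[], opts: CliOptions): Post[] {
  const requested =
    opts.slugs.length > 0
      ? posts.filter((p) => opts.slugs.indexOf(p.slug) !== -1)
      : posts;
  // Assumption: without --force, posts that already have audio are skipped.
  return opts.force ? requested : requested.filter((p) => !p.hasAudio);
}
```

In a Node script the arguments would typically come from `process.argv.slice(2)`, which is what remains after npm forwards everything following `--`.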

Technical Notes

  • Text is cleaned (code blocks, components, and special characters are removed) and truncated to ~3000 characters
  • Longer text triggers CUDA errors; the ~3000-character limit was determined through testing
  • Cold boot takes ~30 seconds on Modal; generation is fast after that
  • Output is WAV format (MP3 conversion planned for future)
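
To make the cleaning and truncation step concrete, here is a minimal sketch of what it might look like. The function names, the regexes, and the choice to cut at a sentence boundary are all assumptions for illustration; the orchestrator's actual rules are not shown in this description. Only the ~3000-character limit comes from the notes above.

```typescript
// Illustrative only: names and regexes are hypothetical, not taken from the PR.
const MAX_CHARS = 3000; // approximate limit mentioned in the notes above

// Strip MDX/Markdown syntax so only speakable prose remains.
function cleanMdx(source: string): string {
  return source
    .replace(/^---\n[\s\S]*?\n---\n/, "")     // YAML frontmatter, if present
    .replace(/`{3}[\s\S]*?`{3}/g, " ")        // fenced code blocks
    .replace(/^(?:import|export) .*$/gm, " ") // MDX import/export lines
    .replace(/<[^>]+>/g, " ")                 // JSX/HTML tags (components)
    .replace(/!\[[^\]]*\]\([^)]*\)/g, " ")    // images
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")  // links: keep text, drop URL
    .replace(/[#*_`>~]/g, "")                 // leftover Markdown punctuation
    .replace(/\s+/g, " ")                     // collapse whitespace
    .trim();
}

// Cut at the last sentence end that fits, so the audio does not stop mid-sentence.
function truncateAtSentence(text: string, maxChars: number = MAX_CHARS): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  const lastStop = Math.max(
    head.lastIndexOf(". "),
    head.lastIndexOf("! "),
    head.lastIndexOf("? ")
  );
  // Fall back to a hard cut when no sentence boundary is found.
  return lastStop > 0 ? head.slice(0, lastStop + 1) : head.trimEnd();
}
```

A regex-based cleaner like this is lossy (for example, it also strips a literal `<` ... `>` pair in prose), which is usually acceptable for text that is only going to be spoken.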

Requirements

  • uv (Python package manager)
  • Modal account with GPU access

Test plan

  • Test TTS locally with short text
  • Test end-to-end with blog post
  • Verify WAV file is valid audio
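
For the last item, one cheap first check is to look at the container header. The sketch below is an assumption about how that could be done, not part of this PR: `hasTag` and `looksLikeWav` are hypothetical helpers. It only confirms the RIFF/WAVE magic bytes, so it catches truncated or non-WAV output but does not prove the audio itself is playable.

```typescript
// Hypothetical helper: compare four ASCII bytes at `offset` against `tag`.
function hasTag(bytes: Uint8Array, offset: number, tag: string): boolean {
  for (let i = 0; i < tag.length; i++) {
    if (bytes[offset + i] !== tag.charCodeAt(i)) return false;
  }
  return true;
}

// A WAV file starts with "RIFF", a 4-byte chunk size, then "WAVE".
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  return hasTag(bytes, 0, "RIFF") && hasTag(bytes, 8, "WAVE");
}
```

In Node, the `Buffer` returned by `fs.readFileSync` is a `Uint8Array`, so it can be passed to `looksLikeWav` directly.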

🤖 Generated with Claude Code

cduruk and others added 3 commits December 14, 2025 16:09
New feature to generate spoken audio versions of blog posts using Modal + Chatterbox TTS:

- Add scripts/generate-audio.ts TypeScript orchestrator
- Add scripts/tts/chatterbox_tts.py Modal function with GPU TTS
- Add npm run generate-audio command
- Document usage in AGENTS.md
- Update .gitignore for Python cache files

Usage: npm run generate-audio -- --slug post-slug

Technical notes:
- Uses Chatterbox TTS model on Modal A10G GPU
- Text limited to ~3000 chars (CUDA limits)
- Outputs WAV format (MP3 conversion planned)
- Requires uv and Modal account

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add examples for generating sample audio with custom text.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

The TTS app is now deployed for faster cold boots via memory snapshots.
Added instructions for redeploying after changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>