A French language learning tool that transcribes French audio, translates it to English, and generates educational audio explanations with multiple speakers. The tool extracts and explains verbs, vocabulary, and idiomatic expressions from French sentences.
Read the associated blog post here.
- Audio Transcription: Transcribes French audio files using Deepgram's speech recognition
- Translation & Analysis: Translates French sentences to English and extracts:
- Verbs (conjugated forms, infinitives, and meanings)
- Vocabulary words and their meanings
- Idiomatic expressions with literal and contextual meanings
- Educational Dialogue Generation: Creates natural dialogue scripts between two French speakers (Marie and Clément) explaining the French content
- Text-to-Speech: Converts dialogue scripts to audio using Google Gemini TTS with multiple speaker voices
- Audio Processing: Converts TTS audio output to MP3 format using FFmpeg
- Node.js (v18 or higher)
- FFmpeg installed and available in your PATH
- API keys for:
- Deepgram (for transcription)
- OpenAI (for translation)
- Google (for TTS)
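A startup check for the three API keys can catch a missing credential before any request is made. The sketch below is illustrative only: the variable names match the `.env` entries described in the setup steps, while `missingApiKeys` is a hypothetical helper, not part of the project.

```typescript
// Environment variable names as they appear in the .env example.
const REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"];

// Hypothetical helper: returns the names of required keys that are unset or blank.
function missingApiKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with a plain object standing in for the real environment.
const missing = missingApiKeys({ DEEPGRAM_API_KEY: "abc", OPENAI_API_KEY: "" });
if (missing.length > 0) {
  console.log("Missing API keys: " + missing.join(", "));
}
```

In a real run you would pass `process.env` after loading the `.env` file with dotenv.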
- Clone the repository:
git clone <repository-url>
cd demo-explain-french
- Install dependencies:
npm install
- Create a .env file in the root directory with your API keys:
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
The main entry point is index.ts. The workflow:
- Transcribes the French audio file (audio.mp3)
- Translates each sentence and extracts linguistic information
- Generates educational dialogue scripts
- Converts scripts to audio narration files
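The four workflow steps can be pictured as a simple sequential pipeline. The sketch below is not the code in index.ts: every type and function name (`Steps`, `runWorkflow`, and so on) is an assumption made for illustration, and the steps are passed in as parameters so the example runs without any API access.

```typescript
// Illustrative types; the project's real definitions live in types.ts and may differ.
interface Sentence { text: string }
interface Explained { french: string; english: string }

// Each workflow step is injected so the pipeline can run with real or stub implementations.
interface Steps {
  transcribe(audioPath: string): Promise<Sentence[]>;
  translate(sentences: Sentence[]): Promise<Explained[]>;
  writeScript(item: Explained): Promise<string>;
  narrate(script: string, index: number): Promise<string>; // resolves to an output path
}

async function runWorkflow(audioPath: string, steps: Steps): Promise<string[]> {
  const sentences = await steps.transcribe(audioPath);
  const explained = await steps.translate(sentences);
  const outputs: string[] = [];
  let index = 1;
  // Process sentences one at a time so narration files are numbered in order.
  for (const item of explained) {
    const script = await steps.writeScript(item);
    outputs.push(await steps.narrate(script, index));
    index += 1;
  }
  return outputs;
}
```

Injecting the steps is a design choice made for this sketch; it also makes it easy to swap in stubbed steps during development.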
npm run build
node dist/index.js
Or if using a TypeScript runner like tsx:
npx tsx index.ts
The project includes "dummy" functions that use pre-existing JSON files instead of making API calls:
- transcribeDummy() - uses transcript.json instead of calling Deepgram
- translateSentencesDummy() - uses translation.json instead of calling OpenAI
This is useful for testing and development without consuming API credits.
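One way to switch between live and dummy behaviour is a single flag that selects which function to call. The sketch below uses stand-in functions (`realTranscribe`, `dummyTranscribe`) with an assumed signature; the project's own functions may take different arguments and return richer data.

```typescript
// Assumed signature, for illustration only.
type Transcriber = (audioPath: string) => Promise<string[]>;

// Stand-in for the live implementation, which would call the Deepgram API.
const realTranscribe: Transcriber = async () => {
  throw new Error("would call the Deepgram API");
};

// Stand-in for the dummy implementation, which would read transcript.json.
const dummyTranscribe: Transcriber = async () => ["Bonjour tout le monde."];

// Hypothetical switch: choose the dummy implementation when the flag is set.
function pickTranscriber(useDummy: boolean): Transcriber {
  return useDummy ? dummyTranscribe : realTranscribe;
}

pickTranscriber(true)("audio.mp3").then((sentences) => {
  console.log(sentences.length + " sentence(s) loaded without an API call");
});
```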
demo-explain-french/
├── index.ts # Main entry point and workflow orchestration
├── transcribe.ts # Audio transcription using Deepgram
├── translate.ts # Translation and linguistic analysis using OpenAI
├── tts.ts # Text-to-speech generation using Google Gemini
├── ffmpeg.ts # Audio format conversion utilities
├── types.ts # TypeScript type definitions
├── audio.mp3 # Input French audio file
├── transcript.json # Pre-generated transcription (for dummy mode)
├── translation.json # Pre-generated translations (for dummy mode)
└── samples/ # Sample narration output files
- Uses Deepgram's nova-3 model for French speech recognition
- Extracts sentences with timestamps and word-level details
- Supports speaker diarization and smart formatting
- Uses OpenAI's GPT models with structured output
- Translates French sentences to English
- Extracts:
- Verbs: Conjugated form, infinitive, and meaning
- Vocabulary: Words and their meanings
- Idiomatic Expressions: Expression, literal meaning, and contextual meaning
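The extracted data can be pictured as one record per sentence. The field names below are assumptions modelled on the list above, not the definitions in types.ts, and the hand-written guard stands in for the zod schema the project uses so that the example stays dependency-free.

```typescript
// Assumed shapes, one per extracted category.
interface Verb { conjugated: string; infinitive: string; meaning: string }
interface VocabularyItem { word: string; meaning: string }
interface IdiomaticExpression { expression: string; literalMeaning: string; contextualMeaning: string }

interface SentenceAnalysis {
  french: string;
  english: string;
  verbs: Verb[];
  vocabulary: VocabularyItem[];
  idiomaticExpressions: IdiomaticExpression[];
}

// True when `value` is an object and every listed key holds a string.
function hasStringFields(value: unknown, keys: string[]): boolean {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return keys.every((key) => typeof record[key] === "string");
}

// Runtime check for parsed JSON, such as a model response.
function isSentenceAnalysis(value: unknown): value is SentenceAnalysis {
  if (!hasStringFields(value, ["french", "english"])) return false;
  const record = value as Record<string, unknown>;
  const listOk = (list: unknown, keys: string[]): boolean =>
    Array.isArray(list) && list.every((item) => hasStringFields(item, keys));
  return (
    listOk(record["verbs"], ["conjugated", "infinitive", "meaning"]) &&
    listOk(record["vocabulary"], ["word", "meaning"]) &&
    listOk(record["idiomaticExpressions"], ["expression", "literalMeaning", "contextualMeaning"])
  );
}

const example: unknown = {
  french: "Il pleut des cordes.",
  english: "It's raining cats and dogs.",
  verbs: [{ conjugated: "pleut", infinitive: "pleuvoir", meaning: "to rain" }],
  vocabulary: [{ word: "corde", meaning: "rope" }],
  idiomaticExpressions: [
    { expression: "pleuvoir des cordes", literalMeaning: "to rain ropes", contextualMeaning: "to rain heavily" },
  ],
};
console.log(isSentenceAnalysis(example));
```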
- Generates educational dialogue between two presenters:
- Marie (voice: Sulafat) - Cheerful and enthusiastic
- Clément (voice: Puck) - Warm and informative
- Includes audio direction tags for natural speech:
- [short pause] - Brief pause (~250ms)
- [cheerfully] - Cheerful, upbeat tone
- [warmly] - Warm, friendly tone
- [inhales deeply] - Deep breath before speaking
- [very slowly for emphasis] - Slow speech for emphasis
- [English explanation] - English text with French accent
- Uses Google Gemini 2.5 Flash Preview TTS model
- Supports multi-speaker dialogue generation
- Outputs PCM audio data that gets converted to MP3
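Because the audio direction tags are plain bracketed text inside the script, they are easy to inspect or strip, for example to print a readable transcript next to the audio. The helpers below are hypothetical and not part of tts.ts, and the `Marie: ...` line format is an assumption about how a script line might look.

```typescript
// Matches any bracketed direction such as [short pause] or [warmly].
const DIRECTION_TAG = /\[[^\]]+\]/g;

// Lists the direction tags in a script line, in order, without the brackets.
function extractDirections(line: string): string[] {
  return (line.match(DIRECTION_TAG) ?? []).map((tag) => tag.slice(1, -1));
}

// Removes the tags and collapses the whitespace they leave behind.
function stripDirections(line: string): string {
  return line.replace(DIRECTION_TAG, "").replace(/\s+/g, " ").trim();
}

const line = "Marie: [cheerfully] Bonjour ! [short pause] On commence ?";
console.log(extractDirections(line));
console.log(stripDirections(line));
```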
- Converts raw PCM audio buffers to MP3 format
- Configurable sample rate, bitrate, and codec settings
- Default: 24kHz, mono, 192kbps MP3
The presenters and their voices are defined in tts.ts:
- Marie: Sulafat
- Clément: Puck
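As a rough illustration, a presenter table like this can be turned into a per-speaker voice configuration. The output shape below mirrors the `speakerVoiceConfigs` structure used for Gemini multi-speaker speech as we understand it; treat it as an assumption, since tts.ts may build its request differently. `PRESENTERS` and `buildSpeakerVoiceConfigs` are hypothetical names.

```typescript
interface Presenter { name: string; voice: string }

// Presenter names and voices as listed above.
const PRESENTERS: Presenter[] = [
  { name: "Marie", voice: "Sulafat" },
  { name: "Clément", voice: "Puck" },
];

// Builds one voice entry per presenter (assumed request shape, see the note above).
function buildSpeakerVoiceConfigs(presenters: Presenter[]) {
  return presenters.map((p) => ({
    speaker: p.name,
    voiceConfig: { prebuiltVoiceConfig: { voiceName: p.voice } },
  }));
}

console.log(JSON.stringify(buildSpeakerVoiceConfigs(PRESENTERS), null, 2));
```

Speaker names in the configuration generally need to match the names used in the dialogue script so that each line is read with the intended voice.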
Default audio conversion settings in ffmpeg.ts:
- Sample Rate: 24kHz
- Channels: Mono (1)
- Bitrate: 192kbps
- Codec: libmp3lame
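These defaults map directly onto FFmpeg command-line options. The sketch below builds an argument list for converting raw PCM read from stdin into MP3 written to stdout. It is not the code in ffmpeg.ts: `buildFfmpegArgs` is a hypothetical helper, and the `s16le` input format (signed 16-bit little-endian) is an assumption about the PCM data.

```typescript
interface Mp3Options { sampleRate: number; channels: number; bitrate: string; codec: string }

// Defaults as listed above.
const DEFAULTS: Mp3Options = { sampleRate: 24000, channels: 1, bitrate: "192k", codec: "libmp3lame" };

function buildFfmpegArgs(overrides: Partial<Mp3Options> = {}): string[] {
  const options = { ...DEFAULTS, ...overrides };
  return [
    "-f", "s16le",                     // assumption: input is raw signed 16-bit little-endian PCM
    "-ar", String(options.sampleRate), // input sample rate in Hz
    "-ac", String(options.channels),   // input channel count
    "-i", "pipe:0",                    // read PCM from stdin
    "-c:a", options.codec,             // audio encoder
    "-b:a", options.bitrate,           // target audio bitrate
    "-f", "mp3",                       // output container format
    "pipe:1",                          // write MP3 to stdout
  ];
}

console.log("ffmpeg " + buildFfmpegArgs().join(" "));
```

The resulting array can be passed to `spawn("ffmpeg", args)` from `node:child_process`, writing the PCM buffer to the child's stdin and collecting the MP3 bytes from its stdout.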
The generated narration files are saved to the audio/ directory (or to workDir, if one is specified) with the naming pattern:
narration_1.mp3
narration_2.mp3
...
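The naming pattern amounts to a directory plus a 1-based index. A minimal sketch, assuming a hypothetical `narrationPath` helper and forward-slash paths:

```typescript
// Builds the output path for the Nth narration file (1-based).
// workDir defaults to "audio"; trailing slashes are trimmed to avoid "audio//narration_1.mp3".
function narrationPath(index: number, workDir: string = "audio"): string {
  const dir = workDir.replace(/\/+$/, "");
  return `${dir}/narration_${index}.mp3`;
}

console.log(narrationPath(1));
console.log(narrationPath(2, "out/"));
```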
- @deepgram/sdk - Deepgram speech recognition API
- @langchain/openai - OpenAI integration for translation
- @langchain/google-webauth - Google Gemini TTS integration
- @google/genai - Google Generative AI SDK
- dotenv - Environment variable management
- zod - Schema validation
MIT