Demo: Explain French

A French language learning tool that transcribes French audio, translates it to English, and generates educational audio explanations with multiple speakers. The tool extracts and explains verbs, vocabulary, and idiomatic expressions from French sentences.

Read the associated blog post here.

Features

Audio Transcription: Transcribes French audio files using Deepgram's speech recognition
Translation & Analysis: Translates French sentences to English and extracts:
- Verbs (conjugated forms, infinitives, and meanings)
- Vocabulary words and their meanings
- Idiomatic expressions with literal and contextual meanings
Educational Dialogue Generation: Creates natural dialogue scripts between two French speakers (Marie and Clément) explaining the French content
Text-to-Speech: Converts dialogue scripts to audio using Google Gemini TTS with multiple speaker voices
Audio Processing: Converts TTS audio output to MP3 format using FFmpeg

Prerequisites

Node.js (v18 or higher)
FFmpeg installed and available in your PATH
API keys for:
- Deepgram (for transcription)
- OpenAI (for translation)
- Google (for TTS)

Installation

Clone the repository:

git clone <repository-url>
cd demo-explain-french

Install dependencies:

npm install

Create a .env file in the root directory with your API keys:

DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
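
A startup check can make a missing key fail fast instead of surfacing later as an API error. The sketch below is an illustration, not code from this repository: `missingKeys` and the `Env` type are hypothetical, and only the three variable names come from the .env example above.

```typescript
// Variable names come from the .env example; the helper itself is hypothetical.
const REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required keys that are unset or blank in `env`,
// in the same order as REQUIRED_KEYS.
function missingKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((name) => (env[name] ?? "").trim() === "");
}
```

In index.ts this could be called as `missingKeys(process.env)` once dotenv has loaded the file, throwing an error that lists the missing names if the result is non-empty.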

Usage

The main entry point is index.ts. The workflow:

1. Transcribes the French audio file (audio.mp3)
2. Translates each sentence and extracts linguistic information
3. Generates educational dialogue scripts
4. Converts scripts to audio narration files

Running the Project

npm run build
node dist/index.js

Or, if you use a TypeScript runner such as tsx:

npx tsx index.ts

Dummy Mode

The project includes "dummy" functions that use pre-existing JSON files instead of making API calls:

transcribeDummy() - uses transcript.json instead of calling Deepgram
translateSentencesDummy() - uses translation.json instead of calling OpenAI

This is useful for testing and development without consuming API credits.
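
One way to switch between the real and dummy functions is to pick the implementation once and pass it along, so the rest of the workflow does not care which one it received. This is a sketch under assumptions: the README does not say how dummy mode is toggled, so the `USE_DUMMY`-style flag, the `Sentence` shape, and both helpers are hypothetical.

```typescript
// Hypothetical shapes; the real types live in types.ts and may differ.
type Sentence = { text: string; start: number; end: number };
type Transcriber = (audioPath: string) => Promise<Sentence[]>;

// Picks the fixture-backed function when `useDummy` is true, so callers
// never need to know which implementation they received.
function pickTranscriber(useDummy: boolean, real: Transcriber, dummy: Transcriber): Transcriber {
  return useDummy ? dummy : real;
}

// Treats "1" or "true" (case-insensitive, surrounding spaces ignored) as enabled.
// The flag is an assumption; it would be read from something like USE_DUMMY.
function isDummyEnabled(flag: string | undefined): boolean {
  if (flag === undefined) return false;
  const value = flag.trim().toLowerCase();
  return value === "1" || value === "true";
}
```

The same pattern would apply to translateSentencesDummy() and its real counterpart.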

Project Structure

demo-explain-french/
├── index.ts              # Main entry point and workflow orchestration
├── transcribe.ts         # Audio transcription using Deepgram
├── translate.ts          # Translation and linguistic analysis using OpenAI
├── tts.ts                # Text-to-speech generation using Google Gemini
├── ffmpeg.ts             # Audio format conversion utilities
├── types.ts              # TypeScript type definitions
├── audio.mp3             # Input French audio file
├── transcript.json       # Pre-generated transcription (for dummy mode)
├── translation.json      # Pre-generated translations (for dummy mode)
└── samples/              # Sample narration output files

How It Works

1. Transcription (`transcribe.ts`)

Uses Deepgram's nova-3 model for French speech recognition
Extracts sentences with timestamps and word-level details
Supports speaker diarization and smart formatting
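
To illustrate how word-level details relate to sentence timestamps, the sketch below groups timed words into sentences by terminal punctuation. It is not taken from transcribe.ts, which may rely on sentence data returned by Deepgram directly. The `Word` interface is a small subset of the per-word fields, and `groupIntoSentences` and `toSentence` are hypothetical helpers. Abbreviations such as "M." would be split incorrectly by this simple rule.

```typescript
// Subset of the per-word fields; `punctuated_word` is present when smart formatting is on.
interface Word {
  punctuated_word: string;
  start: number; // seconds
  end: number;   // seconds
}

interface TimedSentence {
  text: string;
  start: number;
  end: number;
}

// Builds one sentence from a non-empty run of words.
function toSentence(words: Word[]): TimedSentence {
  return {
    text: words.map((w) => w.punctuated_word).join(" "),
    start: words[0].start,
    end: words[words.length - 1].end,
  };
}

// Closes a sentence whenever a word ends in ., !, ? or … (optionally
// followed by a closing quote or parenthesis).
function groupIntoSentences(words: Word[]): TimedSentence[] {
  const sentences: TimedSentence[] = [];
  let current: Word[] = [];
  for (const word of words) {
    current.push(word);
    if (/[.!?…]["»)]?$/.test(word.punctuated_word)) {
      sentences.push(toSentence(current));
      current = [];
    }
  }
  // Keep trailing words that never reached terminal punctuation.
  if (current.length > 0) sentences.push(toSentence(current));
  return sentences;
}
```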

2. Translation (`translate.ts`)

Uses OpenAI's GPT models with structured output
Translates French sentences to English
Extracts:
- Verbs: Conjugated form, infinitive, and meaning
- Vocabulary: Words and their meanings
- Idiomatic Expressions: Expression, literal meaning, and contextual meaning
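
The structured output described above could be modeled roughly as follows. All field names here are assumptions based on the bullets in this section; the actual definitions are in types.ts and are likely expressed as a zod schema, since zod is a listed dependency. The hand-written guard is a dependency-free stand-in for that schema.

```typescript
// Field names are assumptions modeled on the bullets above.
interface Verb { conjugated: string; infinitive: string; meaning: string; }
interface VocabularyItem { word: string; meaning: string; }
interface Idiom { expression: string; literalMeaning: string; contextualMeaning: string; }

interface SentenceAnalysis {
  french: string;
  english: string;
  verbs: Verb[];
  vocabulary: VocabularyItem[];
  idioms: Idiom[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// True when `v` is an object and every listed key holds a string.
function hasStrings(v: unknown, keys: string[]): boolean {
  if (!isRecord(v)) return false;
  const record = v;
  return keys.every((k) => typeof record[k] === "string");
}

// Runtime guard for parsed model output.
function isSentenceAnalysis(v: unknown): v is SentenceAnalysis {
  if (!isRecord(v) || !hasStrings(v, ["french", "english"])) return false;
  const { verbs, vocabulary, idioms } = v;
  return (
    Array.isArray(verbs) &&
    verbs.every((x) => hasStrings(x, ["conjugated", "infinitive", "meaning"])) &&
    Array.isArray(vocabulary) &&
    vocabulary.every((x) => hasStrings(x, ["word", "meaning"])) &&
    Array.isArray(idioms) &&
    idioms.every((x) => hasStrings(x, ["expression", "literalMeaning", "contextualMeaning"]))
  );
}
```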

3. Script Generation (`tts.ts`)

Generates educational dialogue between two presenters:
- Marie (voice: Sulafat) - Cheerful and enthusiastic
- Clément (voice: Puck) - Warm and informative
Includes audio direction tags for natural speech:
- [short pause] - Brief pause (~250ms)
- [cheerfully] - Cheerful, upbeat tone
- [warmly] - Warm, friendly tone
- [inhales deeply] - Deep breath before speaking
- [very slowly for emphasis] - Slow speech for emphasis
- [English explanation] - English text, spoken with a French accent
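
As an illustration of what a script with speaker labels and direction tags might look like, the sketch below assembles two lines by hand. The repository generates its dialogue differently, so `ScriptLine`, `renderLine`, `introScript`, and the "Speaker: [tag] text" layout are all assumptions made for this example.

```typescript
type Presenter = "Marie" | "Clément";

interface ScriptLine {
  speaker: Presenter;
  direction: string; // a direction tag without brackets, e.g. "cheerfully"
  text: string;
}

// Renders one line as "Speaker: [direction] text".
function renderLine(line: ScriptLine): string {
  return `${line.speaker}: [${line.direction}] ${line.text}`;
}

// Builds a two-line opening: Marie reads the French sentence,
// Clément gives the English translation and ends on a short pause.
function introScript(french: string, english: string): string {
  const lines: ScriptLine[] = [
    { speaker: "Marie", direction: "cheerfully", text: `Écoutons cette phrase : ${french}` },
    { speaker: "Clément", direction: "warmly", text: `In English: ${english} [short pause]` },
  ];
  return lines.map(renderLine).join("\n");
}
```

Keeping the speaker name at the start of each line makes it straightforward to map lines to the voices configured for each presenter.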

4. Text-to-Speech (`tts.ts`)

Uses Google Gemini 2.5 Flash Preview TTS model
Supports multi-speaker dialogue generation
Outputs PCM audio data that gets converted to MP3
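
A multi-speaker request needs one voice entry per presenter. The sketch below builds a configuration object in the shape documented for multi-speaker speech generation in the Gemini API. It is an assumption that tts.ts builds its request this way (it may go through the LangChain wrapper instead), and `PresenterVoice`, `TTS_MODEL`, and `buildTtsConfig` are hypothetical names.

```typescript
interface PresenterVoice {
  speaker: string;   // must match the speaker label used in the script text
  voiceName: string; // a prebuilt voice name such as "Sulafat" or "Puck"
}

// Model id assumed from the "Gemini 2.5 Flash Preview TTS" name above.
const TTS_MODEL = "gemini-2.5-flash-preview-tts";

// Builds the audio-only, multi-speaker generation config.
function buildTtsConfig(presenters: PresenterVoice[]) {
  return {
    responseModalities: ["AUDIO"],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: presenters.map(({ speaker, voiceName }) => ({
          speaker,
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        })),
      },
    },
  };
}
```

With the @google/genai SDK, an object like this would typically be passed as `config` to `generateContent` alongside `TTS_MODEL` and the script text, and the audio would come back as base64-encoded PCM in the response's inline data.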

5. Audio Conversion (`ffmpeg.ts`)

Converts raw PCM audio buffers to MP3 format
Configurable sample rate, bitrate, and codec settings
Default: 24kHz, mono, 192kbps MP3
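
The defaults above map onto FFmpeg arguments roughly as shown below. This is a sketch rather than the contents of ffmpeg.ts: `Mp3Options` and `buildFfmpegArgs` are hypothetical, and the `s16le` input format assumes the TTS output is raw signed 16-bit little-endian PCM.

```typescript
interface Mp3Options {
  sampleRate: number;  // Hz
  channels: number;
  bitrateKbps: number;
  codec: string;
}

// Defaults taken from this README: 24kHz, mono, 192kbps, libmp3lame.
const DEFAULT_MP3_OPTIONS: Mp3Options = {
  sampleRate: 24000,
  channels: 1,
  bitrateKbps: 192,
  codec: "libmp3lame",
};

// Builds arguments that read raw PCM from stdin and write MP3 to stdout.
function buildFfmpegArgs(overrides: Partial<Mp3Options> = {}): string[] {
  const o = { ...DEFAULT_MP3_OPTIONS, ...overrides };
  return [
    "-f", "s16le",               // input format: raw 16-bit little-endian PCM (assumed)
    "-ar", String(o.sampleRate), // input sample rate
    "-ac", String(o.channels),   // input channel count
    "-i", "pipe:0",              // read input from stdin
    "-c:a", o.codec,             // audio encoder
    "-b:a", `${o.bitrateKbps}k`, // audio bitrate
    "-f", "mp3",                 // output container
    "pipe:1",                    // write output to stdout
  ];
}
```

The resulting array can be handed to `spawn("ffmpeg", args)` from node:child_process, writing the PCM buffer to the process's stdin and collecting the MP3 bytes from stdout. Raw PCM carries no header, so the input sample rate and channel count must be stated before `-i`.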

Configuration

TTS Voices

The presenters and their voices are defined in tts.ts:

Marie: Sulafat
Clément: Puck

Audio Settings

Default audio conversion settings in ffmpeg.ts:

Sample Rate: 24kHz
Channels: Mono (1)
Bitrate: 192kbps
Codec: libmp3lame

Output

The generated narration files are saved to the audio/ directory (or to the workDir you specify) with the following naming pattern:

narration_1.mp3
narration_2.mp3
...
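
The naming pattern can be expressed as a small helper. `narrationPath` is a hypothetical function written for this example; it assumes 1-based numbering, as in the file names above, and forward-slash paths.

```typescript
// Returns the output path for the nth narration file (1-based).
function narrationPath(index: number, workDir: string = "audio"): string {
  if (!Number.isInteger(index) || index < 1) {
    throw new RangeError(`index must be a positive integer, got ${index}`);
  }
  // Drop trailing slashes so "audio/" and "audio" produce the same path.
  const dir = workDir.replace(/\/+$/, "");
  return `${dir}/narration_${index}.mp3`;
}
```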

Dependencies

@deepgram/sdk - Deepgram speech recognition API
@langchain/openai - OpenAI integration for translation
@langchain/google-webauth - Google Gemini TTS integration
@google/genai - Google Generative AI SDK
dotenv - Environment variable management
zod - Schema validation

License

MIT
