
elbruno/cs-gentranscript


🎧 Generate Transcript CLI

A .NET 8 file-based console application that transcribes audio files locally using Azure AI Foundry Local and formats the output using GitHub Copilot.


🚀 Features

  • Local AI Transcription - Uses Foundry Local with Whisper models (no cloud required)
  • Multiple Audio Formats - Supports MP3, WAV, M4A, AAC, FLAC, OGG, WMA
  • Language Selection - Transcribe in Spanish, English, French, German, Portuguese, Italian or auto-detect
  • Smart Audio Chunking - Automatically splits long audio files into 5-minute chunks for reliable transcription
  • Multiple Output Formats:
    • Time-Range Format (default)
    • Zencastr Format
    • SRT (SubRip) Subtitles
  • Interactive CLI - Beautiful terminal UI with Spectre.Console
  • Auto-detection - Finds audio files in the current directory
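The smart chunking feature can be sketched as a pair of FFmpeg invocations plus an offset calculation. This is a minimal sketch, assuming FFmpeg's segment muxer does the splitting; the helper names (`BuildConvertArgs`, `BuildSegmentArgs`, `ChunkOffset`) and the file names are hypothetical, not the app's actual code. It converts to 16 kHz mono WAV (the same settings suggested under Troubleshooting), splits into 300-second chunks, and computes where each chunk starts so per-chunk times can be shifted back onto the original timeline.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers: they only build FFmpeg argument lists and do not launch FFmpeg.
const int ChunkSeconds = 300; // 5-minute chunks

// Convert any supported input to 16 kHz mono WAV for Whisper.
static List<string> BuildConvertArgs(string input, string wavOut) =>
    new() { "-y", "-i", input, "-ar", "16000", "-ac", "1", wavOut };

// Split a WAV file into fixed-length chunks with FFmpeg's segment muxer.
// "-reset_timestamps 1" starts each chunk at 0; "-c copy" avoids re-encoding.
static List<string> BuildSegmentArgs(string wavIn, string pattern, int seconds) =>
    new() { "-y", "-i", wavIn, "-f", "segment", "-segment_time", seconds.ToString(),
            "-reset_timestamps", "1", "-c", "copy", pattern };

// Chunk i covers [i * seconds, (i + 1) * seconds) of the original audio.
static TimeSpan ChunkOffset(int index, int seconds) => TimeSpan.FromSeconds((long)index * seconds);

var convert = BuildConvertArgs("episode.mp3", "episode.wav");
var segment = BuildSegmentArgs("episode.wav", "chunk_%03d.wav", ChunkSeconds);
Console.WriteLine("ffmpeg " + string.Join(" ", convert));
Console.WriteLine("ffmpeg " + string.Join(" ", segment));
Console.WriteLine(ChunkOffset(2, ChunkSeconds)); // chunk_002.wav starts at 00:10:00
```

Adding the chunk offset to every time measured inside a chunk keeps the final transcript aligned with the full recording.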

📦 Prerequisites

1. Install .NET 8 SDK

Download from: https://dotnet.microsoft.com/download/dotnet/8.0

Verify installation:

dotnet --version

2. Install Azure AI Foundry Local

Follow the official documentation:

Verify installation:

foundry --version
foundry model list

3. Install FFmpeg (Required)

FFmpeg is used for audio format conversion and chunking:

Windows (winget):

winget install ffmpeg

Windows (Chocolatey):

choco install ffmpeg

macOS:

brew install ffmpeg

4. Install GitHub Copilot CLI (Optional)

For transcript formatting with AI:

# Install from: https://docs.github.com/en/copilot/github-copilot-in-the-cli
copilot auth login

▶️ Running the App

dotnet generatetranscript.cs

To transcribe a specific audio file, place it in the same directory and run the same command; the app detects it automatically (and prompts for a file if none is found):

dotnet generatetranscript.cs

🖥️ App Flow

  1. Startup - Displays banner with Spectre.Console
  2. Audio Selection - Auto-detects or prompts for audio file
  3. Audio Processing - Converts to WAV and splits into 5-minute chunks
  4. Language Selection - Choose audio language (Spanish, English, French, etc.)
  5. Format Selection - Choose output format (Time-Range, Zencastr, SRT)
  6. Transcription - Uses Foundry Local Whisper models via the OpenAI SDK, passing the selected language
  7. Formatting - Optionally formats with GitHub Copilot
  8. Output - Saves transcript and shows preview
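Steps 4 and 6 can be sketched as building a request for an OpenAI-compatible `/v1/audio/transcriptions` route, the kind of multipart request the OpenAI SDK sends. A minimal sketch under stated assumptions: that Foundry Local serves this route, that the base URL and model name below are placeholders (use whatever endpoint and model your Foundry Local service reports), and that the language map is illustrative. The request is built, not sent. An ISO-639-1 code in the `language` field tells Whisper which language the audio is in; leaving it out lets the model auto-detect.

```csharp
using System;
using System.Net.Http;

// Illustrative mapping from menu choices to ISO-639-1 codes; null means auto-detect.
static string? LanguageCode(string choice) => choice switch
{
    "Spanish" => "es",
    "English" => "en",
    "French" => "fr",
    "German" => "de",
    "Portuguese" => "pt",
    "Italian" => "it",
    _ => null, // "Auto-detect" or anything unknown: omit the field
};

// Hypothetical helper: builds (but does not send) a multipart transcription request.
static HttpRequestMessage BuildTranscriptionRequest(
    Uri baseUrl, string model, string fileName, byte[] wavBytes, string? language)
{
    var form = new MultipartFormDataContent();
    form.Add(new ByteArrayContent(wavBytes), "file", fileName);
    form.Add(new StringContent(model), "model");
    if (language is not null)
        form.Add(new StringContent(language), "language"); // source-language hint
    return new HttpRequestMessage(HttpMethod.Post, new Uri(baseUrl, "v1/audio/transcriptions"))
    {
        Content = form,
    };
}

// Placeholder endpoint and model name; not values taken from the app.
var request = BuildTranscriptionRequest(
    new Uri("http://localhost:5273/"), "whisper-medium", "chunk_000.wav",
    new byte[] { 0 }, LanguageCode("Spanish"));
Console.WriteLine($"{request.Method} {request.RequestUri}");
```

Because the transcriptions route returns text in the spoken language (translation is a separate route), passing the code here matches the "not translation" behaviour described under Known Limitations.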

📝 Output Formats

Time-Range Format

00:00:00 - 00:00:15
Hello and welcome to the show.

00:00:15 - 00:00:30
Today we'll be discussing AI transcription.
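A minimal sketch of producing this layout from a list of segments. The `(Start, End, Text)` tuple shape and the `FormatTimeRange` name are hypothetical: each block is an `hh:mm:ss - hh:mm:ss` range on one line, the text on the next, and a blank line between blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical segment shape: start/end offsets plus the recognized text.
static string FormatTimeRange(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
{
    var sb = new StringBuilder();
    foreach (var (start, end, text) in segments)
    {
        if (sb.Length > 0) sb.Append('\n'); // blank line between blocks
        sb.Append($"{Hms(start)} - {Hms(end)}\n{text.Trim()}\n");
    }
    return sb.ToString();
}

// hh:mm:ss; TotalHours (not Hours) so recordings over 24 h do not wrap.
static string Hms(TimeSpan t) => $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00}";

var segments = new List<(TimeSpan, TimeSpan, string)>
{
    (TimeSpan.Zero, TimeSpan.FromSeconds(15), "Hello and welcome to the show."),
    (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(30), "Today we'll be discussing AI transcription."),
};
string timeRange = FormatTimeRange(segments);
Console.Write(timeRange);
```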

Zencastr Format

00:00.00 Speaker 1: Hello and welcome to the show.
00:15.00 Speaker 1: Today we'll be discussing AI transcription.
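Judging only from the sample above, a Zencastr line looks like `MM:SS.hh Speaker N: text` (minutes, seconds, hundredths). A sketch under that assumption; `FormatZencastr` is a hypothetical name, minutes are deliberately not wrapped at 60 (also an assumption), and the single fixed speaker label mirrors the sample because nothing described here identifies speakers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed layout, inferred from the README sample: MM:SS.hh Speaker N: text
static string ZencastrStamp(TimeSpan t) =>
    $"{(int)t.TotalMinutes:00}:{t.Seconds:00}.{t.Milliseconds / 10:00}";

// Hypothetical formatter: one line per segment, all attributed to one speaker.
static string FormatZencastr(IEnumerable<(TimeSpan Start, string Text)> segments, string speaker = "Speaker 1") =>
    string.Join("\n", segments.Select(s => $"{ZencastrStamp(s.Start)} {speaker}: {s.Text.Trim()}"));

var zencastr = FormatZencastr(new List<(TimeSpan, string)>
{
    (TimeSpan.Zero, "Hello and welcome to the show."),
    (TimeSpan.FromSeconds(15), "Today we'll be discussing AI transcription."),
});
Console.WriteLine(zencastr);
```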

SRT Format

1
00:00:00,000 --> 00:00:15,000
Hello and welcome to the show.

2
00:00:15,000 --> 00:00:30,000
Today we'll be discussing AI transcription.
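A sketch of SRT cue generation from a list of segments, assuming a hypothetical `(Start, End, Text)` tuple per segment and a hypothetical `FormatSrt` helper. SRT uses sequential 1-based cue numbers, `hh:mm:ss,mmm` timestamps with a comma before the milliseconds, and a blank line after each cue; this sketch writes `\n` line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// SRT timestamps are hh:mm:ss,mmm (comma before the milliseconds).
static string SrtStamp(TimeSpan t) =>
    $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

// Each cue: 1-based index, "start --> end", text, then a blank line.
static string FormatSrt(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
{
    var sb = new StringBuilder();
    int index = 1;
    foreach (var (start, end, text) in segments)
        sb.Append($"{index++}\n{SrtStamp(start)} --> {SrtStamp(end)}\n{text.Trim()}\n\n");
    return sb.ToString();
}

var srt = FormatSrt(new List<(TimeSpan, TimeSpan, string)>
{
    (TimeSpan.Zero, TimeSpan.FromSeconds(15), "Hello and welcome to the show."),
    (TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(30), "Today we'll be discussing AI transcription."),
});
Console.Write(srt);
```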

🗂️ Supported Audio Formats

Format   Extension
MP3      .mp3
WAV      .wav
M4A      .m4a
AAC      .aac
FLAC     .flac
OGG      .ogg
WMA      .wma
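The auto-detection feature can be approximated by filtering a directory on these extensions. A minimal sketch; `FindAudioFiles` is a hypothetical name, and the case-insensitive match, non-recursive search, and alphabetical ordering are choices made here, not documented behaviour of the app.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions from the Supported Audio Formats table.
var audioExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg", ".wma",
};

// Hypothetical helper: supported audio files in one directory (non-recursive), sorted by name.
static List<string> FindAudioFiles(string directory, ISet<string> extensions) =>
    Directory.EnumerateFiles(directory)
        .Where(path => extensions.Contains(Path.GetExtension(path)))
        .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
        .ToList();

foreach (var file in FindAudioFiles(Directory.GetCurrentDirectory(), audioExtensions))
    Console.WriteLine(Path.GetFileName(file));
```

If exactly one file is found it can be used directly; otherwise the list can feed a selection prompt.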

🔧 Available Whisper Models

The app automatically selects the best available model:

Model                     Size      Best For
whisper-large-v3-turbo    ~3 GB     Best quality, multilingual
whisper-medium            ~1.5 GB   Good balance
whisper-small             ~500 MB   Faster processing
whisper-base              ~150 MB   Quick transcription
whisper-tiny              ~75 MB    Testing only
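One way to implement "select the best available model" is a fixed preference list checked against the models already downloaded. A sketch assuming the preference order follows the table (best quality first) and that the list of downloaded model names is obtained elsewhere; how the app discovers downloaded models is not described here. `SelectModel` is hypothetical, and the prefix match (to tolerate variant suffixes on model names) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Preference order taken from the table above: best quality first.
string[] preferred =
{
    "whisper-large-v3-turbo", "whisper-medium", "whisper-small", "whisper-base", "whisper-tiny",
};

// Hypothetical helper: first downloaded model whose name starts with a preferred alias.
static string? SelectModel(IEnumerable<string> downloaded, IReadOnlyList<string> preference)
{
    var names = downloaded.ToList();
    foreach (var alias in preference)
    {
        var match = names.FirstOrDefault(n => n.StartsWith(alias, StringComparison.OrdinalIgnoreCase));
        if (match is not null) return match;
    }
    return null; // nothing usable downloaded
}

Console.WriteLine(SelectModel(new[] { "whisper-tiny", "whisper-medium" }, preferred) ?? "none");
```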

Download a model:

foundry model download whisper-medium

⚠️ Known Limitations

  1. Long Audio Files: Foundry Local Whisper may stop early on very long files. The app automatically chunks audio into 5-minute segments to work around this.

  2. Music Detection: Files starting with music may cause issues. The chunking approach helps mitigate this.

  3. Language Support: When specifying a language, the app uses the OpenAI SDK with Foundry Local's web service API to ensure proper transcription in the original language (not translation). Larger models (medium, large-v3-turbo) work better for non-English audio.

  4. No Timestamps: The current Foundry Local SDK does not return word-level timestamps.
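Without word-level timestamps, segment times have to be approximated. One possible approach, shown purely as a hypothetical sketch and not as what the app does: split a chunk's text into sentences and spread the chunk's duration across them in proportion to character count, starting at the chunk's offset in the full recording. `EstimateSegments` is a hypothetical name, and the estimates drift whenever speaking pace or pauses vary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical: approximate sentence timings inside one chunk by character share.
static List<(TimeSpan Start, TimeSpan End, string Text)> EstimateSegments(
    string chunkText, TimeSpan chunkOffset, TimeSpan chunkDuration)
{
    // Split after sentence-ending punctuation followed by whitespace.
    var sentences = Regex.Split(chunkText.Trim(), @"(?<=[.!?])\s+")
        .Where(s => s.Length > 0)
        .ToList();
    var result = new List<(TimeSpan Start, TimeSpan End, string Text)>();
    double totalChars = sentences.Sum(s => s.Length);
    double elapsedChars = 0;
    foreach (var sentence in sentences)
    {
        var start = chunkOffset + chunkDuration * (elapsedChars / totalChars);
        elapsedChars += sentence.Length;
        var end = chunkOffset + chunkDuration * (elapsedChars / totalChars);
        result.Add((start, end, sentence));
    }
    return result;
}

// Second 5-minute chunk (offset 00:05:00), pretending it lasts 10 seconds.
var estimated = EstimateSegments("Hello there. How are you?", TimeSpan.FromMinutes(5), TimeSpan.FromSeconds(10));
foreach (var s in estimated)
    Console.WriteLine($"{s.Start} - {s.End}: {s.Text}");
```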

🐛 Troubleshooting

"Transcription returned empty result"

  • Ensure FFmpeg is installed and in PATH
  • Try a different Whisper model: foundry model download whisper-medium
  • Check if audio file is valid: ffprobe yourfile.mp3
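The "installed and in PATH" check can be automated with a small PATH lookup. A sketch only; `FindOnPath` is a hypothetical helper. On Windows it tries each extension listed in `PATHEXT` (falling back to `.EXE` if the variable is unset), and it checks that the file exists but not that it is executable.

```csharp
using System;
using System.IO;

// Hypothetical helper: full path of an executable found in a PATH-style string, or null.
static string? FindOnPath(string tool, string? pathVariable)
{
    // Windows resolves bare names through PATHEXT; other platforms use the name as-is.
    string[] extensions = OperatingSystem.IsWindows()
        ? (Environment.GetEnvironmentVariable("PATHEXT") ?? ".EXE")
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
        : new[] { "" };
    foreach (var dir in (pathVariable ?? "").Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries))
    {
        foreach (var ext in extensions)
        {
            var candidate = Path.Combine(dir, tool + ext);
            if (File.Exists(candidate)) return candidate;
        }
    }
    return null;
}

var ffmpeg = FindOnPath("ffmpeg", Environment.GetEnvironmentVariable("PATH"));
Console.WriteLine(ffmpeg ?? "ffmpeg not found on PATH");
```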

"Cannot detect audio stream format"

  • Convert audio to WAV manually: ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav

Copilot formatting not working

  • Verify: copilot auth status
  • The app will fall back to raw transcript if Copilot is unavailable
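The fallback can be sketched as running the formatter as a child process and returning the raw transcript whenever it cannot start, exits with an error, times out, or prints nothing. The command and arguments are parameters because the exact Copilot CLI invocation the app uses is not shown; treat `TryFormat` and the command name in the example as hypothetical.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper: run an external formatter; fall back to the raw text on any failure.
static string TryFormat(string raw, string command, string[] args, TimeSpan timeout)
{
    var psi = new ProcessStartInfo(command)
    {
        RedirectStandardOutput = true,
        UseShellExecute = false,
    };
    foreach (var a in args) psi.ArgumentList.Add(a);
    try
    {
        using var process = Process.Start(psi);
        if (process is null) return raw;
        // Read stdout concurrently so a full pipe cannot block the child process.
        var outputTask = process.StandardOutput.ReadToEndAsync();
        if (!process.WaitForExit((int)timeout.TotalMilliseconds))
        {
            process.Kill(entireProcessTree: true);
            return raw; // formatter hung: keep the unformatted transcript
        }
        var formatted = outputTask.Result;
        return process.ExitCode == 0 && !string.IsNullOrWhiteSpace(formatted) ? formatted : raw;
    }
    catch (Win32Exception)
    {
        return raw; // command not installed or not on PATH
    }
}

// With a command that does not exist, the raw transcript comes back unchanged.
var result = TryFormat("raw transcript text", "hypothetical-formatter-cmd",
    Array.Empty<string>(), TimeSpan.FromSeconds(60));
Console.WriteLine(result);
```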


📄 License

MIT License - See LICENSE file for details.

👤 Author

Bruno Capuano


Happy transcribing 🎙️

About

Audio transcription CLI using Azure AI Foundry Local and GitHub Copilot
