- Meeting Agent: Does exactly what Granola does – captures everything you hear, no bots needed. Transforms messy meeting notes into clean, actionable, shareable action items and meeting minutes (MoMs).
- Sales Agent: Real-time contextual intel like Cluely. It will literally tell you "your prospect mentioned budget concerns at the 15th minute – you missed probing that," and all the phrasing is backed by SPIN and MEDDIC selling techniques. Not generic advice – specific missed opportunities called out in real time. Saves you from that post-meeting "damn, should have said this" moment.
- Transparent and Invisible: It won't show up in screen recordings or screen shares, and it stays persistent across multiple macOS desktops and apps.
- 🚀 Real-time transcription from microphone input
- 📊 Comprehensive benchmarking of different models and configurations
- 🎯 3-second chunk processing for near real-time results
- 🔧 Multiple model sizes (tiny, base, small, medium, large)
- 💻 CPU and GPU support with optimized compute types
- 🎤 VAD (Voice Activity Detection) filtering for better accuracy
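The 3-second chunking can be sketched in a few lines. The 16 kHz sample rate and the helper below are illustrative assumptions, not the script's actual internals:

```python
# Sketch of fixed-duration chunking (assumes 16 kHz mono, Whisper's expected rate).
SAMPLE_RATE = 16000       # samples per second
CHUNK_DURATION = 3.0      # seconds per chunk
CHUNK_SIZE = int(SAMPLE_RATE * CHUNK_DURATION)  # 48000 samples

def split_into_chunks(samples):
    """Yield full chunks of CHUNK_SIZE samples; a trailing partial chunk is dropped."""
    for start in range(0, len(samples) - CHUNK_SIZE + 1, CHUNK_SIZE):
        yield samples[start:start + CHUNK_SIZE]
```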
```bash
# Install Python dependencies
pip install -r requirements.txt

# On macOS, you might need to install portaudio for audio capture
brew install portaudio
```

```bash
python realtime_transcription_test.py
```

The script will:
- Let you choose model size (tiny, base, small, medium, large)
- Ask if you want to use GPU (if available)
- Start capturing audio from your microphone
- Transcribe 3-second chunks in real-time
- Display results with processing time
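A sketch of what "results with processing time" might look like per chunk – the exact output format is the script's choice, this is only illustrative:

```python
def format_result(text, processing_seconds, chunk_seconds=3.0):
    """Render one transcribed chunk with its processing time and real-time factor."""
    rtf = processing_seconds / chunk_seconds
    return f"[{processing_seconds:.2f}s | RTF {rtf:.2f}] {text}"
```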
```bash
python benchmark_test.py
```

This will test all model configurations and show:
- Load times for each model
- Transcription speed (Real-time Factor)
- Best configurations for speed vs accuracy
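The Real-time Factor is simply processing time divided by audio duration; values below 1.0 are faster than real time. A minimal way to measure it (function and parameter names here are illustrative):

```python
import time

def real_time_factor(transcribe_fn, audio_path, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    start = time.perf_counter()
    transcribe_fn(audio_path)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds
```

An RTF of 0.3 means 10 seconds of audio transcribes in about 3 seconds.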
| Model Size | Speed | Accuracy | Memory | Best For |
|---|---|---|---|---|
| tiny | ⚡⚡⚡⚡⚡ | ⭐⭐ | 39MB | Fastest real-time |
| base | ⚡⚡⚡⚡ | ⭐⭐⭐ | 74MB | Good balance |
| small | ⚡⚡⚡ | ⭐⭐⭐⭐ | 244MB | Better accuracy |
| medium | ⚡⚡ | ⭐⭐⭐⭐⭐ | 769MB | High accuracy |
| large | ⚡ | ⭐⭐⭐⭐⭐ | 1550MB | Highest accuracy |
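One way to use the table above: pick the largest model whose weights fit your memory budget. The helper and its size map are my own illustration of the table's numbers:

```python
# Approximate model weight sizes in MB, from the comparison table above.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def largest_model_within(budget_mb):
    """Return the biggest model whose weights fit in budget_mb, defaulting to tiny."""
    fitting = [name for name, mb in MODEL_SIZES_MB.items() if mb <= budget_mb]
    return max(fitting, key=MODEL_SIZES_MB.get) if fitting else "tiny"
```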
CPU (approximate real-time factor; lower is faster):

- tiny model: ~0.3x real-time (very fast)
- base model: ~0.8x real-time (near real-time)
- small model: ~2.0x real-time (slower)
- medium model: ~5.0x real-time (much slower)

GPU (CUDA):

- tiny model: ~0.1x real-time (extremely fast)
- base model: ~0.3x real-time (very fast)
- small model: ~0.8x real-time (near real-time)
- medium model: ~1.5x real-time (slightly slower than real-time)
- tiny: Fastest, lowest accuracy, good for real-time
- base: Good balance of speed and accuracy
- small: Better accuracy, still reasonably fast
- medium: High accuracy, slower
- large: Highest accuracy, slowest
- int8: Quantized, fastest, works on CPU
- float16: Half precision, good for GPU
- float32: Full precision, slowest
- cpu: Works everywhere, slower
- cuda: GPU acceleration, much faster (requires NVIDIA GPU)
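A common convention (not a requirement of faster-whisper) is to pair int8 with CPU and float16 with CUDA; a small helper to that effect:

```python
def pick_config(cuda_available):
    """Return a (device, compute_type) pair following the common CPU/GPU pairing."""
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")
```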
```python
from realtime_transcription_test import RealtimeTranscriber

# Initialize with base model on CPU
transcriber = RealtimeTranscriber(
    model_size="base",
    device="cpu",
    compute_type="int8"
)

# Start transcription
transcriber.start_transcription()
```

```python
# Use GPU for faster processing
transcriber = RealtimeTranscriber(
    model_size="small",
    device="cuda",
    compute_type="float16"
)
```

```python
# Modify the chunk duration in the class
transcriber.chunk_duration = 5.0  # 5 seconds instead of 3
transcriber.chunk_size = int(transcriber.sample_rate * transcriber.chunk_duration)
```

Minimum:

- Python 3.8+
- 4GB RAM
- Microphone access
Recommended:

- Python 3.9+
- 8GB+ RAM
- NVIDIA GPU (for CUDA acceleration)
- SSD storage
```bash
# Install audio dependencies
brew install portaudio

# If you get audio permission errors, grant microphone access in System Preferences
```

```
# Install Visual C++ Build Tools if you get compilation errors
# Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/
```

- macOS: Go to System Preferences > Security & Privacy > Microphone
- Windows: Check microphone permissions in Settings > Privacy > Microphone
- Linux: Ensure your user is in the `audio` group
```bash
# Check if CUDA is available
python -c "import torch; print(torch.cuda.is_available())"

# If not available, install CUDA toolkit or use CPU
```

- Use smaller models (tiny, base) for limited RAM
- Close other applications
- Use `int8` quantization on CPU
- Use GPU if available
- Use smaller models
- Reduce chunk duration
- Use `int8` quantization
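The slow-transcription tips can be applied programmatically. The attribute names mirror the RealtimeTranscriber fields used earlier, but treat this as a sketch rather than the script's API:

```python
def tune_for_speed(transcriber, gpu_available):
    """Apply the speed tips: smaller model, quantized compute, shorter chunks."""
    transcriber.model_size = "base" if gpu_available else "tiny"        # smaller model
    transcriber.device = "cuda" if gpu_available else "cpu"             # GPU if available
    transcriber.compute_type = "float16" if gpu_available else "int8"   # quantization
    transcriber.chunk_duration = 2.0                                    # shorter chunks
```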
```python
# Modify the audio_callback method to use different audio sources
def custom_audio_callback(self, indata, frames, time, status):
    # Process audio from different sources
    # e.g., system audio, specific applications, etc.
    pass
```

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Process multiple audio files
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
```

```python
segments, _ = model.transcribe("audio.mp3", word_timestamps=True)
for segment in segments:
    for word in segment.words:
        print(f"[{word.start:.2f}s -> {word.end:.2f}s] {word.word}")
```

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- faster-whisper by SYSTRAN
- OpenAI Whisper for the original model
- CTranslate2 for fast inference