We've successfully implemented comprehensive system audio transcription that can capture and transcribe:
- 🎤 Microphone audio (traditional voice input)
- 🔊 System audio (YouTube, Zoom, music, notifications, any app audio)
- 🎧 Mixed mode (both microphone AND system audio simultaneously)
Purpose: Main service for capturing audio from different sources
Key Features:
- Microphone capture using `getUserMedia()`
- System audio capture using `getDisplayMedia()` with audio
- Mixed mode combining both streams
- Permission management and error handling
- Audio stream merging and processing
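As a rough illustration of how the three sources might map onto the two browser APIs, here is a minimal TypeScript sketch. The names `AudioSource`, `CapturePlan`, `planCapture`, `MediaDevicesLike`, and `acquireStreams` are hypothetical and not taken from the actual service; the device object is passed in so the logic can be exercised without a browser.

```typescript
// Hypothetical names throughout; this is not the service's real API.
type AudioSource = "microphone" | "system" | "mixed";

interface CapturePlan {
  useGetUserMedia: boolean;    // microphone via getUserMedia()
  useGetDisplayMedia: boolean; // system audio via getDisplayMedia()
  needsMerge: boolean;         // true only when both streams are captured
}

// Decide which capture calls a given source needs.
function planCapture(source: AudioSource): CapturePlan {
  const mic = source === "microphone" || source === "mixed";
  const system = source === "system" || source === "mixed";
  return { useGetUserMedia: mic, useGetDisplayMedia: system, needsMerge: mic && system };
}

// Minimal structural type; navigator.mediaDevices is assumed to fit it.
interface MediaDevicesLike<S> {
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
  getDisplayMedia(constraints: { audio: boolean }): Promise<S>;
}

// Acquire one stream per required source, microphone first.
async function acquireStreams<S>(devices: MediaDevicesLike<S>, source: AudioSource): Promise<S[]> {
  const plan = planCapture(source);
  const streams: S[] = [];
  if (plan.useGetUserMedia) streams.push(await devices.getUserMedia({ audio: true }));
  if (plan.useGetDisplayMedia) streams.push(await devices.getDisplayMedia({ audio: true }));
  return streams;
}
```

In a renderer process, `acquireStreams(navigator.mediaDevices, "mixed")` would be the likely call site, assuming the host environment allows display capture with audio.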
Purpose: React hook for managing system audio transcription state
Key Features:
- State management for capture, transcription, and permissions
- Integration with IPC transcription API
- Error handling and status tracking
- Real-time transcription results processing
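The state the hook tracks could be modelled as a small reducer. The sketch below is only an assumption about its shape: `CaptureState`, `CaptureEvent`, `initialState`, and `reduce` are hypothetical names, and the real hook may organise its state differently.

```typescript
// Hypothetical state shape; the real hook may differ.
interface CaptureState {
  isCapturing: boolean;
  isTranscribing: boolean;
  error: string | null;
  transcript: string[];
}

type CaptureEvent =
  | { type: "captureStarted" }
  | { type: "captureStopped" }
  | { type: "transcriptionStarted" }
  | { type: "transcriptionResult"; text: string }
  | { type: "failed"; message: string };

const initialState: CaptureState = {
  isCapturing: false,
  isTranscribing: false,
  error: null,
  transcript: [],
};

// Pure transition function: returns a new state and never mutates the old one.
function reduce(state: CaptureState, event: CaptureEvent): CaptureState {
  switch (event.type) {
    case "captureStarted":
      return { ...state, isCapturing: true, error: null };
    case "captureStopped":
      return { ...state, isCapturing: false, isTranscribing: false };
    case "transcriptionStarted":
      return { ...state, isTranscribing: true };
    case "transcriptionResult":
      return { ...state, isTranscribing: false, transcript: [...state.transcript, event.text] };
    case "failed":
      return { ...state, isCapturing: false, isTranscribing: false, error: event.message };
  }
}
```

A function of this shape can be handed to React's `useReducer`, which keeps capture, transcription, and error status in one place.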
Purpose: Complete UI for system audio transcription
Key Features:
- Audio source selection (mic, system, mixed)
- Permission status indicators
- Real-time status display (capturing, transcribing)
- Help text and usage instructions
- Integration with AccumulativeTranscriptDisplay
Purpose: Comprehensive testing of system audio functionality
Key Features:
- Permission testing
- Audio capture testing
- Transcription pipeline testing
- Error scenario testing
Purpose: Main transcription interface with system audio support
Key Features:
- Mode switching between IPC and System Audio transcription
- UI toggle for different transcription modes
- Integrated help and status display
- Permission Request: Request microphone and/or screen sharing permissions
- Stream Acquisition:
  - Microphone: `navigator.mediaDevices.getUserMedia({ audio: true })`
  - System Audio: `navigator.mediaDevices.getDisplayMedia({ audio: true })`
- Stream Processing: Merge streams using Web Audio API if in mixed mode
- Audio Worklet: Process audio data using existing `wave-loopback.js`
- Transcription: Send audio data to IPC transcription API
- Display: Show results in AccumulativeTranscriptDisplay
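The stream processing step for mixed mode can be sketched as follows. This is a minimal illustration, not the project's actual merge code: `mergeStreams`, `NodeLike`, and `AudioContextLike` are hypothetical names, and the structural types are there so the sketch does not depend on DOM typings. A real `AudioContext` is assumed to satisfy them. The sketch relies on the Web Audio behaviour that several sources connected to one destination node are summed.

```typescript
// Minimal structural types (hypothetical); a browser AudioContext is assumed to fit.
interface NodeLike {
  connect(destination: unknown): unknown;
}

interface AudioContextLike<S> {
  createMediaStreamSource(stream: S): NodeLike;
  createMediaStreamDestination(): { stream: S };
}

// Merge any number of streams into one by routing each into a shared destination node.
function mergeStreams<S>(ctx: AudioContextLike<S>, streams: S[]): S {
  if (streams.length === 0) {
    throw new Error("no streams to merge");
  }
  if (streams.length === 1) {
    return streams[0]; // single source: nothing to merge
  }
  const destination = ctx.createMediaStreamDestination();
  for (const stream of streams) {
    ctx.createMediaStreamSource(stream).connect(destination);
  }
  return destination.stream;
}
```

The merged stream would then be fed to the audio worklet in the same way as a single-source stream.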
- Automatic Detection: Check existing permissions on component mount
- Request Flow: Guide users through granting required permissions
- Status Indicators: Visual feedback on permission status
- Error Handling: Graceful fallbacks and user-friendly error messages
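User-friendly error messages could be produced by mapping the error names that `getUserMedia()` and `getDisplayMedia()` reject with onto short explanations. The function name `describeCaptureError` and the message wording below are illustrative assumptions; only the error names (`NotAllowedError`, `NotFoundError`, `NotReadableError`) come from the media capture APIs.

```typescript
// Hypothetical helper: turn a capture error name into a message for the UI.
function describeCaptureError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Permission was denied. Allow access when prompted or in system settings, then try again.";
    case "NotFoundError":
      return "No matching audio source was found. Check that a microphone is connected.";
    case "NotReadableError":
      return "The audio source is unavailable. Another application may be using it.";
    default:
      return "Audio capture failed. Please try again.";
  }
}
```

In a `catch` block the name would typically come from the rejected error, for example `describeCaptureError(err instanceof Error ? err.name : "")`.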
- IPC API: Uses existing `window.transcriptionAPI.triggerTranscription()`
- State Management: Integrates with `useTranscriptStore`
- UI Components: Reuses `AccumulativeTranscriptDisplay`
- Electron: Works within existing Electron app architecture
- Select "System Audio" mode
- Grant screen sharing permission
- Play any YouTube video
- Get real-time transcription of video audio
- Select "System Audio" or "Mixed" mode
- Capture meeting audio and/or your microphone
- Real-time meeting transcription
- Works with Spotify, Apple Music, podcasts
- Transcribe lyrics or spoken content from any audio app
- Capture and transcribe system alerts, notifications
- Accessibility feature for hearing-impaired users
- Simultaneously capture system audio AND microphone
- Perfect for content creation, interviews, tutorials
- `navigator.mediaDevices.getUserMedia()` - Microphone access
- `navigator.mediaDevices.getDisplayMedia()` - System audio via screen sharing
- Web Audio API - Stream processing and merging
- `AudioWorklet` - Real-time audio processing
- Requires user permission for both microphone and screen sharing
- Permissions are clearly explained to users
- Graceful degradation if permissions denied
- No persistent storage of audio data
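One way graceful degradation might work is to fall back to whichever source was actually granted. The summary does not describe the fallback rules, so the sketch below is an assumption; `resolveSource` and `Granted` are hypothetical names.

```typescript
// Hypothetical fallback logic; the real degradation rules are not documented here.
type AudioSource = "microphone" | "system" | "mixed";

interface Granted {
  microphone: boolean;
  system: boolean;
}

// Return the best source that is both requested and permitted, or null if none is.
function resolveSource(requested: AudioSource, granted: Granted): AudioSource | null {
  const mic = requested !== "system" && granted.microphone;
  const system = requested !== "microphone" && granted.system;
  if (mic && system) return "mixed";
  if (mic) return "microphone";
  if (system) return "system";
  return null; // nothing usable: the UI should show an error instead of capturing
}
```

For example, a mixed-mode request with screen sharing denied would continue as microphone-only capture, while a system-only request with screen sharing denied would yield `null`.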
- Reuses existing audio worklet (`wave-loopback.js`)
- Efficient stream merging using Web Audio API
- Real-time processing in the audio worklet, avoiding extra buffering delays
- Minimal UI re-renders with proper state management
- 🟢 Green: Permission granted and working
- 🔴 Red: Permission denied or unavailable
- Clear labels for microphone and system audio status
- 🎤 Microphone: Traditional voice transcription
- 🔊 System Audio: Transcribe any app's audio output
- 🎧 Mixed Mode: Both microphone and system audio
- Capturing indicator with pulse animation
- Transcribing status with processing feedback
- Audio level visualization
- Error messages with helpful suggestions
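The audio level visualization needs a single number per block of samples. A common choice is the root mean square (RMS) of the block; the sketch below assumes that approach, which may not be what the component actually uses. `rmsLevel` and `levelToPercent` are hypothetical names.

```typescript
// Root mean square of a block of samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Clamp an RMS value to [0, 1] and convert it to a whole percentage for a level meter.
function levelToPercent(rms: number): number {
  const clamped = Math.min(1, Math.max(0, rms));
  return Math.round(clamped * 100);
}
```

A block alternating between 0.5 and -0.5 has an RMS of 0.5, which `levelToPercent` reports as 50.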
- Expandable help section explaining each mode
- Permission requirements clearly stated
- Step-by-step usage instructions
- Troubleshooting guidance
System audio transcription is now implemented and ready for testing. To try it:
- Refresh the assistant window to load the new components
- Switch to "System Audio Transcription" mode in TranscriptsPage
- Grant permissions when prompted
- Start transcribing audio from any application!
- ✅ Real-time transcription of YouTube videos
- ✅ Zoom/Teams meeting transcription
- ✅ Music and podcast transcription
- ✅ System notification transcription
- ✅ Mixed mode for simultaneous mic + system audio
- ✅ Integration with existing transcript storage and display