feat: add NVIDIA Parakeet TDT v3 as alternative ASR engine #51

Merged
pasrom merged 3 commits into main from feat/parakeet-asr-engine on Mar 21, 2026

pasrom (Owner) commented Mar 20, 2026

Summary

  • Add TranscribingEngine protocol to abstract ASR backends (WhisperKit, Parakeet)
  • Add ParakeetEngine wrapping FluidAudio's CoreML-based Parakeet TDT v3 (~50 MB model, ~10× faster than Whisper Large v3, 25 European languages)
  • Add engine picker in Settings to switch between WhisperKit and Parakeet
  • Normalize all recorded audio to 16 kHz at capture time, removing redundant resampling in the pipeline
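The engine abstraction and picker described in the list above could look roughly like the following. This is a minimal sketch, not the PR's actual code: only the names `TranscribingEngine` and `TranscriptionEngineSetting` come from the PR; the method signatures, the enum cases, `TranscriptSegment`, `StubEngine`, `EngineError`, and `makeEngine()` are hypothetical, and the API is kept synchronous for brevity.

```swift
import Foundation

// Hypothetical segment type; the PR does not show the real one.
struct TranscriptSegment: Equatable {
    var start: TimeInterval
    var end: TimeInterval
    var text: String
}

enum EngineError: Error { case modelNotLoaded }

// Assumed shape of the abstraction: every backend exposes the same
// load/transcribe surface, so callers never name a concrete engine.
protocol TranscribingEngine: AnyObject {
    var displayName: String { get }
    var isModelLoaded: Bool { get }
    func loadModel() throws
    func transcribe(samples: [Float]) throws -> [TranscriptSegment]
}

// Stand-in backend used for both engines in this sketch. A real
// implementation would wrap WhisperKit or FluidAudio here instead.
final class StubEngine: TranscribingEngine {
    let displayName: String
    private(set) var isModelLoaded = false

    init(displayName: String) { self.displayName = displayName }

    func loadModel() { isModelLoaded = true }

    func transcribe(samples: [Float]) throws -> [TranscriptSegment] {
        guard isModelLoaded else { throw EngineError.modelNotLoaded }
        // Audio is assumed to be 16 kHz mono already.
        let duration = Double(samples.count) / 16_000
        return [TranscriptSegment(start: 0, end: duration,
                                  text: "[\(displayName)] \(samples.count) samples")]
    }
}

// The setting persisted by the picker; the case names are assumptions.
enum TranscriptionEngineSetting: String, CaseIterable {
    case whisperKit
    case parakeet

    func makeEngine() -> TranscribingEngine {
        switch self {
        case .whisperKit: return StubEngine(displayName: "WhisperKit")
        case .parakeet: return StubEngine(displayName: "Parakeet")
        }
    }
}
```

With this shape, switching engines in Settings amounts to storing a different `TranscriptionEngineSetting` value and asking it for an engine; the rest of the pipeline only sees `TranscribingEngine`.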

Test plan

  • 597 tests passing (559 → 597), 0 failures
  • New ParakeetEngineTests (13 unit tests: state, conformance, merge, dedup)
  • New TranscribingEngineTests (14 tests: mergeDualSourceSegments protocol extension)
  • New ParakeetE2ETests (6 tests: model load, transcription, German content)
  • AppStateTests +5 engine switching tests
  • Manual: switch engine in Settings, verify model downloads and transcription works
  • Manual: verify dual-source recording works with both engines

Closes #47

github-actions bot added the enhancement label (New feature or request) on Mar 20, 2026
pasrom force-pushed the feat/parakeet-asr-engine branch 6 times, most recently from 4e3399e to 6a6e8b6, on March 20, 2026 20:45
pasrom added 3 commits March 20, 2026 22:26
…gine

Introduce `TranscribingEngine` protocol to abstract ASR backends, allowing
users to switch between WhisperKit and Parakeet in Settings. Parakeet runs
via FluidAudio CoreML (~50 MB model, ~10× faster than Whisper Large v3,
25 European languages).

- Add `TranscribingEngine` protocol with `mergeDualSourceSegments` default impl
- Add `ParakeetEngine` wrapping FluidAudio's `AsrManager`/`AsrModels`
- Conform `WhisperKitEngine` to protocol, move merge logic to extension
- Add `TranscriptionEngineSetting` enum + engine picker in SettingsView
- Wire `activeTranscriptionEngine` through AppState → PipelineQueue
- Auto-load selected engine model on app launch
- Update 559 tests (0 failures)

Closes #47
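
The `mergeDualSourceSegments` default implementation mentioned in this commit could be sketched as a protocol extension, so that every conforming engine inherits the same merge behavior. The parameter names, the `Segment` type, the "Me"/"Remote" labels, and the idea that the delay applies to the microphone track are all assumptions for illustration; the PR only names the method and says the tests cover labels, sorting, delay, and empty inputs.

```swift
import Foundation

// Hypothetical segment type with a source label filled in by the merge.
struct Segment: Equatable {
    var start: TimeInterval
    var end: TimeInterval
    var text: String
    var source: String = ""
}

// Requirements omitted; only the shared default behavior is shown.
protocol TranscribingEngine {}

extension TranscribingEngine {
    // Label both tracks, shift the microphone track by `micDelay`
    // seconds, and interleave everything by start time.
    func mergeDualSourceSegments(
        mic: [Segment],
        system: [Segment],
        micDelay: TimeInterval = 0,
        micLabel: String = "Me",
        systemLabel: String = "Remote"
    ) -> [Segment] {
        let labeledMic = mic.map { segment -> Segment in
            var shifted = segment
            shifted.start += micDelay
            shifted.end += micDelay
            shifted.source = micLabel
            return shifted
        }
        let labeledSystem = system.map { segment -> Segment in
            var labeled = segment
            labeled.source = systemLabel
            return labeled
        }
        // Order by start time; ties are broken by label so the result
        // is deterministic.
        return (labeledMic + labeledSystem).sorted {
            ($0.start, $0.source) < ($1.start, $1.source)
        }
    }
}

// Any conforming type picks up the merge for free.
struct DemoEngine: TranscribingEngine {}
```

Putting the merge in an extension rather than in each engine is what lets `WhisperKitEngine` and `ParakeetEngine` share one tested code path.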
- ParakeetEngineTests (13): initial state, protocol conformance, merge, dedup
- TranscribingEngineTests (14): mergeDualSourceSegments labels, sorting, delay, empty inputs
- ParakeetE2ETests (6): model load, transcription with fixture, German content
- AppStateTests +5: engine switching (activeTranscriptionEngine, pipeline wiring)

Total: 559 → 597 tests, 0 failures
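
The PR does not say what the "dedup" tests exercise. One plausible reading is that repeated text from overlapping audio chunks is collapsed; the sketch below illustrates that reading only. The `Segment` type, the `deduplicated` function, and the 0.5-second tolerance are hypothetical.

```swift
import Foundation

// Hypothetical segment type for illustration.
struct Segment: Equatable {
    var start: TimeInterval
    var end: TimeInterval
    var text: String
}

// Compare text case-insensitively and ignore surrounding whitespace.
private func normalized(_ text: String) -> String {
    text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

// Collapse a segment into the previously kept one when the text matches
// and it starts before the kept segment's end plus `tolerance` seconds.
func deduplicated(_ segments: [Segment], tolerance: TimeInterval = 0.5) -> [Segment] {
    var result: [Segment] = []
    for segment in segments.sorted(by: { $0.start < $1.start }) {
        if let last = result.last,
           normalized(last.text) == normalized(segment.text),
           segment.start <= last.end + tolerance {
            // Treat as a repeat: keep the first copy, extend its end time.
            result[result.count - 1].end = max(last.end, segment.end)
            continue
        }
        result.append(segment)
    }
    return result
}
```

Under this assumption, the same word spoken again several seconds later is kept, because it falls outside the tolerance window.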
- Add TranscribingEngine protocol and ParakeetEngine to project structure
- Update pipeline diagrams to show [WhisperKit | Parakeet] engine selection
- Add engine comparison table (languages, model size, speed)
- Update recording pipeline to reflect 16 kHz capture-time normalization
- Update test counts (597)
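
To illustrate what capture-time normalization to 16 kHz means, here is a deliberately simplified resampler using linear interpolation. It is not the PR's implementation: the function name and signature are hypothetical, and linear interpolation applies no anti-alias filter, so production code would more likely rely on a platform audio converter.

```swift
import Foundation

// Resample mono samples to `targetRate` (default 16 kHz) by linear
// interpolation. Illustrative only: no low-pass filtering is applied.
func resampleLinear(_ input: [Float],
                    from sourceRate: Double,
                    to targetRate: Double = 16_000) -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    if sourceRate == targetRate { return input }

    let outputCount = Int((Double(input.count) * targetRate / sourceRate).rounded())
    guard outputCount > 0 else { return [] }

    // Distance, in source samples, between consecutive output samples.
    let step = sourceRate / targetRate
    return (0..<outputCount).map { index -> Float in
        let position = Double(index) * step
        let lower = min(Int(position), input.count - 1)
        let upper = min(lower + 1, input.count - 1)
        let fraction = Float(position - Double(lower))
        return input[lower] * (1 - fraction) + input[upper] * fraction
    }
}
```

Doing this once at capture time means every downstream stage can assume a single sample rate, which is the redundancy the PR says it removes from the pipeline.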
pasrom force-pushed the feat/parakeet-asr-engine branch from 6a6e8b6 to 22aa1e1 on March 20, 2026 21:30
@pasrom pasrom merged commit 233f881 into main Mar 21, 2026
6 checks passed
@pasrom pasrom deleted the feat/parakeet-asr-engine branch March 21, 2026 05:03

Development

Successfully merging this pull request may close these issues.

Please integrate NVIDIA's Parakeet v3 model.
