
feat(app): add custom vocabulary support for Parakeet CTC boosting#77

Merged
pasrom merged 2 commits into main from feat/custom-vocabulary
Apr 1, 2026

@pasrom pasrom commented Apr 1, 2026

Summary

  • Enable domain-specific vocabulary boosting for Parakeet ASR via FluidAudio's CTC keyword spotting
  • After TDT transcription, the CTC model runs constrained decoding to detect and correct vocabulary terms
  • Settings: file picker for custom vocabulary (one term per line)
  • Wired through AppSettings → AppState → ParakeetEngine
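
As an illustration of the "one term per line" file format mentioned above, here is a minimal sketch of how such a file might be read. The function names `parseVocabulary` and `loadVocabulary` are hypothetical and do not come from this PR; trimming whitespace and skipping blank lines are assumptions about the desired behavior, as is returning an empty list for an empty or missing path.

```swift
import Foundation

/// Hypothetical helper (not from the PR): parses vocabulary text with one term per line.
/// Assumption: surrounding whitespace is trimmed and blank lines are skipped.
func parseVocabulary(_ text: String) -> [String] {
    var terms: [String] = []
    for line in text.components(separatedBy: .newlines) {
        let term = line.trimmingCharacters(in: .whitespaces)
        if !term.isEmpty {
            terms.append(term)
        }
    }
    return terms
}

/// Hypothetical loader: returns an empty list for a nil, empty, or unreadable path,
/// so callers can treat "no vocabulary" and "missing file" the same way.
func loadVocabulary(atPath path: String?) -> [String] {
    guard let path, !path.isEmpty,
          let text = try? String(contentsOfFile: path, encoding: .utf8) else {
        return []
    }
    return parseVocabulary(text)
}
```

With this sketch, a file containing `FluidAudio`, a blank line, and `  Parakeet ` would yield two terms, and an unset path would yield none.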

Inspired by @execsumo's work in #70.

Stacked on #76 (VAD)

Test plan

  • 7 unit tests (defaults, persistence, engine wiring, empty/missing path handling)
  • Build passes
  • Lint clean (0 violations)
  • Manual: create vocab file, select in Settings, transcribe with Parakeet

pasrom added 2 commits April 1, 2026 09:49
Voice Activity Detection removes silence before transcription, improving
accuracy and speed for recordings with significant pauses. Uses FluidAudio's
Silero VAD v6 model (CoreML/ANE) to detect speech regions, then:
- Extracts speech-only audio for transcription
- Remaps timestamps back to original timeline

New types: SpeechRegion, VadSegmentMap (pure, testable), FluidVAD (wrapper).
Settings: vadEnabled (default off), vadThreshold (0.3–0.9 slider).
Pipeline: single-source path preprocesses with VAD when enabled.
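
The timestamp remapping step can be sketched as follows. This is an illustrative guess at the idea, not the PR's `VadSegmentMap` implementation: the field names on `SpeechRegion` and the free function `remapToOriginal` are assumptions, and the sketch assumes speech regions are sorted, non-overlapping, and concatenated in order to form the speech-only audio.

```swift
import Foundation

/// A detected speech span in the original recording, in seconds.
/// Field names are assumptions; the PR only names the type.
struct SpeechRegion {
    let start: Double
    let end: Double
    var duration: Double { end - start }
}

/// Maps a time on the concatenated speech-only timeline back to the original timeline.
/// Assumes `regions` is sorted, non-overlapping, and concatenated in order.
func remapToOriginal(_ t: Double, regions: [SpeechRegion]) -> Double {
    var offset = 0.0  // where the current region starts on the speech-only timeline
    for region in regions {
        if t <= offset + region.duration {
            // t falls inside this region: shift it by the region's original start.
            return region.start + (t - offset)
        }
        offset += region.duration
    }
    // Past the end of all speech: clamp to the end of the last region,
    // or return t unchanged when there are no regions at all.
    return regions.last?.end ?? t
}
```

For example, with speech at 1.0–3.0 s and 5.0–6.0 s, a transcript timestamp of 2.5 s on the speech-only audio falls 0.5 s into the second region and maps back to 5.5 s in the original recording.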

Inspired by @execsumo's work in #70.

Enable domain-specific vocabulary boosting for the Parakeet ASR engine
using FluidAudio's CTC keyword spotting pipeline. After TDT transcription,
the CTC model runs constrained decoding to detect and correct vocabulary
terms (e.g. company names, product names) that TDT may have misrecognized.

- Add customVocabularyPath to AppSettings (persisted via UserDefaults)
- Add configureVocabulary() to ParakeetEngine that loads vocab file,
  downloads CTC models, and initializes VocabularyRescorer
- Apply CTC rescoring post-transcription in transcribeSegments()
- Add vocabulary file picker in SettingsView (Parakeet section)
- Wire customVocabularyPath from AppSettings to ParakeetEngine in AppState
- Add CustomVocabularyTests (7 tests: defaults, persistence, engine wiring)
- Update Package.resolved to match main branch (FluidAudio 0.13.4)
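
The settings persistence described in the list above might look roughly like this. It is a sketch under stated assumptions, not the PR's `AppSettings` code: the class `VocabularySettings`, the UserDefaults key string, and the choice to treat an empty string as "no vocabulary selected" are all hypothetical. Only `UserDefaults` itself is a real Foundation API.

```swift
import Foundation

/// Hypothetical settings wrapper; the class shape and key name are assumptions.
final class VocabularySettings {
    private static let key = "customVocabularyPath"
    private let defaults: UserDefaults

    /// Injecting UserDefaults keeps the type testable with a throwaway suite.
    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    /// Path to the vocabulary file, or nil when none is selected.
    var customVocabularyPath: String? {
        get {
            let value = defaults.string(forKey: Self.key)
            return (value?.isEmpty ?? true) ? nil : value
        }
        set {
            if let newValue, !newValue.isEmpty {
                defaults.set(newValue, forKey: Self.key)
            } else {
                // Assumption: clearing the setting removes the stored key entirely.
                defaults.removeObject(forKey: Self.key)
            }
        }
    }
}
```

Normalizing empty strings to nil in one place means downstream code (for example, whatever configures the engine) only has to handle a single "no vocabulary" case.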

Inspired by @execsumo's work in #70.
@github-actions github-actions bot added the enhancement New feature or request label Apr 1, 2026
Base automatically changed from feat/vad-preprocessing to main April 1, 2026 08:00
@pasrom pasrom merged commit 54e330b into main Apr 1, 2026
4 checks passed
@pasrom pasrom deleted the feat/custom-vocabulary branch April 1, 2026 08:01