Releases: FluidInference/FluidAudio
v0.13.6
What's New in v0.13.6
Features
- Add Japanese ASR support with JSUT and Common Voice datasets (#478)
- Add opt-in embedding skip strategy for offline diarization pipeline (#480)
- Add configurable computeUnits for Kokoro models (#482)
  - Adds workaround for iOS 26 ANE compiler regressions
  - Allows bypassing ANE with .cpuAndGPU when needed
  - Backwards compatible (default remains .all)
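As a rough illustration of what the computeUnits option controls, the sketch below uses only Core ML's own `MLModelConfiguration`. How FluidAudio exposes this setting for Kokoro is not shown in these notes, so the model path and the direct `MLModel` load are placeholders, not the library's API.

```swift
import CoreML
import Foundation

// Build a Core ML configuration that keeps inference off the Neural Engine.
let configuration = MLModelConfiguration()
configuration.computeUnits = .cpuAndGPU  // Core ML's default is .all

// Hypothetical location of a compiled model; replace with a real .mlmodelc path.
let modelURL = URL(fileURLWithPath: "/path/to/Model.mlmodelc")

do {
    // Load the model with the chosen compute units.
    let model = try MLModel(contentsOf: modelURL, configuration: configuration)
    print(model.modelDescription)
} catch {
    print("Failed to load model: \(error)")
}
```

Leaving `computeUnits` at `.all` lets Core ML schedule work on the ANE where it can, which matches the backwards-compatible default described above.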
Improvements
- Skip error recovery on intentional cancellation (#481)
Documentation
- Add Parakeet EOU ultra-low latency demo video to README
Full Changelog: v0.13.5...v0.13.6
v0.13.5
What's New in v0.13.5
Features
- Add experimental CTC zh-CN Mandarin ASR (8.23% CER on THCHS-30) (#476)
- Add PocketTTS sessions with persistent KV-cache (#471)
- Add PunctuationCommitLayer for punctuation-aware streaming ASR (#466)
Improvements
- Refactor TDT decoder: Extract reusable components (#474)
- ASR architecture cleanup: naming, dead code, file organization (#468)
- Clarify custom vocabulary model compatibility and approach selection (#469)
Bug Fixes
- Fix Swift 6 concurrency errors in SlidingWindowAsrManager (#472, #476)
- Fix use-after-free when mic and system transcription run concurrently (#473)
- Fix fatal error in levenshteinDistance with empty arrays (#476)
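The empty-array crash mentioned above is a common pitfall in Swift edit-distance code, because a range such as `1...0` traps at runtime. The following is a minimal sketch of an edit distance that guards the empty cases first. It is not FluidAudio's implementation, and the name `editDistance` and its generic signature are illustrative assumptions.

```swift
/// Illustrative Levenshtein distance over any Equatable elements.
/// Hypothetical helper, not the library's levenshteinDistance.
func editDistance<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    // Guard the empty cases: the distance is the length of the other array.
    // This also keeps the 1...count ranges below from trapping.
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }

    // previous[j] is the distance between the first i-1 items of a
    // and the first j items of b.
    var previous = Array(0...b.count)
    var current = [Int](repeating: 0, count: b.count + 1)

    for i in 1...a.count {
        current[0] = i
        for j in 1...b.count {
            let substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)
            let deletion = previous[j] + 1
            let insertion = current[j - 1] + 1
            current[j] = min(substitution, deletion, insertion)
        }
        swap(&previous, &current)
    }
    // After the final swap, previous holds the last completed row.
    return previous[b.count]
}
```

With this guard in place, comparing a two-element array against an empty one returns 2 instead of crashing.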
Documentation
- Fix stale references in ASR documentation (#462)
- Update Documentation index, remove espeak-ng licenses (#461)
- Clean up CI workflows (#463, #464)
Full Changelog: v0.13.4...v0.13.5
Note: CTC zh-CN is experimental. API may change in future releases.
v0.13.4
v0.13.3
What's Changed
Documentation
- Added comprehensive ASR directory structure documentation explaining old vs new layout, SlidingWindow vs Streaming distinction, and design decisions
- Added main branch baseline benchmarks for PR #440 regression testing (6 models tested on M2 16GB)
- Added PR branch benchmark results showing no regressions across all 6 models:
  - TDT v3: 2.6% WER (maintained)
  - TDT v2: 3.8% WER (maintained)
  - CTC-TDT 110M: 3.6% WER (maintained)
  - EOU 320ms: 7.11% WER (maintained)
  - Nemotron 1120ms: 1.99% WER (maintained)
  - CTC Earnings: 16.51% WER (within noise of 16.54%)
Refactoring
- Deduplicated decoder projection normalization in TdtDecoderV3 by merging prepareDecoderProjection and populatePreparedDecoderProjection into a single normalizeDecoderProjection method (no behavioral change, 2.6% WER maintained)
Full Changelog: v0.13.2.5...v0.13.2.6
v0.13.2.5
What's Changed
Directory Structure Refactoring
- Reorganized ASR directory by model family (Parakeet/, Qwen3/)
- Split Streaming/ into EOU/ and Nemotron/ subdirectories
- Removed Parakeet/Shared/ subdirectory, moved files to Parakeet/ root
- Added StreamingAsrEngine protocol for unified streaming interface
Bug Fixes
- Fixed EOU shape mismatch for short audio by padding mel to expected frame count (#444)
- Fixed EOU chunk sample counts to match computeFlat frame formula
- Fixed race condition in chunk size initialization
- Fixed Kokoro v2 source_noise dtype and distribution (#447)
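The short-audio fix above (#444) pads the mel features up to an expected frame count. The sketch below shows that idea in isolation. The function name, the `[frame][bin]` layout, and zero as the padding value are assumptions made for illustration, not details taken from the release.

```swift
/// Pads a mel spectrogram with constant frames until it has `expectedFrames` frames.
/// Inputs that are already long enough are returned unchanged.
/// Hypothetical helper for illustration only.
func padFrames(
    _ mel: [[Float]],
    toFrameCount expectedFrames: Int,
    binCount: Int,
    padValue: Float = 0
) -> [[Float]] {
    guard mel.count < expectedFrames else { return mel }
    let padding = [[Float]](
        repeating: [Float](repeating: padValue, count: binCount),
        count: expectedFrames - mel.count
    )
    return mel + padding
}
```

A one-frame input padded to three frames keeps its original frame first and gains two frames of `padValue`, so a model with a fixed input shape sees the frame count it expects.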
Dependency Updates
- Updated swift-transformers from 1.2.0 to 1.3.0 (reduced dependencies from 28 to 11) (#439)
Removed
- Removed unsupported Nemotron 80ms/160ms streaming variants
- Marked KittenTTS and Qwen3-TTS as not supported (#437)
Documentation
- Updated file paths in Documentation to match new ASR structure
- Added MimicScribe to showcase (#446)
Full Changelog: v0.13.2...v0.13.2.5
v0.13.2: TDT-CTC-110M
ASR
- Parakeet-TDT-CTC-110M hybrid model (#433) - Fused preprocessor+encoder, CLI: --model-version tdt-ctc-110m. Closes #383
- CTC decoder with ARPA language model (#436) - Greedy/beam search decoding, 9.4% WER with domain LM. Closes #384
TTS
- Fix Kokoro TTS Archive build failures (#426) - Replace Float16.bitPattern with vImage conversion. Closes #423
Full Changelog: v0.13.1...v0.13.2
v0.13.1: Nemotron Streaming
ASR
- Nemotron Speech Streaming 0.6B (#432) - Streaming ASR with vDSP optimization, 2.12% WER, 6.4x RTFx. Closes #389
Diarization
- Timeline synchronization and LS-EEND finalization (#421) - Unified offline/streaming finalization, Sortformer flush behavior improvements
Full Changelog: v0.13.0...v0.13.1
v0.13.0: new models + code reorg preparation
Full Changelog: v0.12.6...v0.13.0
v0.12.6 - Swift 6 Concurrency Fixes
What's Changed
Swift 6 Concurrency Safety 🔒
- Convert AsrManager to actor (#419) for proper Swift 6 concurrency safety
- Remove nonisolated(unsafe) workarounds from StreamingAsrManager
- Add proper await at all AsrManager call sites
- Fixes data race warnings with Xcode 16.4 RC's stricter concurrency enforcement
Performance Improvements ⚡
- Qwen3 ASR ANE Optimization (#410): Audio encoder fp16 conversion for Apple Neural Engine
- Kokoro TTS ANE Optimization (#411): fp16 conversion for better Neural Engine performance
Bug Fixes 🐛
- Fix KokoroTtsManager.initialize() hang on iOS (#418): Resolves initialization deadlock on iOS devices
- Fix missing source_noise input in Kokoro TTS (#412): Correct model input configuration
Breaking Changes ⚠️
External calls to AsrManager methods now require await:
// Before
let manager = AsrManager()
manager.cleanup()
// After
let manager = AsrManager()
await manager.cleanup()
Testing
✅ All CI tests pass (13 tests, 0 failures)
Full Changelog: v0.12.5...v0.12.6
v0.12.5
What's New
- LS-EEND Diarizer (#376): End-to-end streaming diarization with up to 10 speakers
  - Five variants (AMI, CALLHOME, DIHARD2/3, VoxConverse) optimized for different scenarios
  - 100ms frame updates with 900ms tentative preview
  - Unified DiarizerTimeline API shared with Sortformer
  - CLI commands: lseend and lseend-benchmark
  - Full documentation in Documentation/Diarization/LSEEND.md
- Parakeet EOU 1280ms (#388): Added support for 1280ms streaming chunk size
API Improvements
- DiarizerTimeline (#402): Make speakers publicly mutable for custom speaker management
- TDT Decoder (#382): Populate tokenDurations for accurate word endTime
Fixes
- G2P Multilingual (#400): Fix multilingual path resolution
- EmbeddingExtractor (#398): Clamp numMasksInChunk to prevent heap-buffer-overflow
- Swift Transformers (#378): Bump minimum to 1.2.0 (trailing comma fix)
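The EmbeddingExtractor fix above (#398) clamps a count before it is used to address a buffer. A minimal sketch of that pattern follows. The helper name and the plain Swift array standing in for the real buffer are assumptions for illustration.

```swift
/// Clamps a requested element count into the range 0...capacity.
/// Hypothetical helper, not the library's code.
func clampedCount(_ requested: Int, capacity: Int) -> Int {
    return min(max(requested, 0), max(capacity, 0))
}

// A small stand-in buffer and a request that exceeds its size.
let masks: [Float] = [0.1, 0.9, 0.4]
let requestedCount = 5

// Only read as many elements as the buffer actually holds.
let safeCount = clampedCount(requestedCount, capacity: masks.count)
let usableMasks = Array(masks.prefix(safeCount))
```

The clamp matters most when the count is later used with unsafe pointers or raw model buffers, where an out-of-range index is not bounds-checked.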
Docs
- LS-EEND vs Sortformer (#397): Add enrollment feedback from integration testing
- Model Conversion Guide (#391, #392): Add guide with existing benchmark datasets
- PocketTTS Architecture (#380): Add pipeline architecture comments
- Showcase Updates: Add Audite (#396), Hitoku Draft (#385), update to OpenOats (#399)
- Hugging Face Badge: Update to 800k+ downloads