Releases · FluidInference/FluidAudio

04 Apr 17:48

Alex-Wengg

v0.13.6

57551cd

v0.13.6 Latest

What's New in v0.13.6

Features

Add Japanese ASR support with JSUT and Common Voice datasets (#478)
Add opt-in embedding skip strategy for offline diarization pipeline (#480)
Add configurable computeUnits for Kokoro models (#482)
- Adds workaround for iOS 26 ANE compiler regressions
- Allows bypassing ANE with .cpuAndGPU when needed
- Backwards compatible (default remains .all)
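
The notes do not show how the Kokoro API exposes this setting, so the fragment below is only a sketch of the underlying Core ML mechanism. `makeModelConfiguration(bypassANE:)` is a hypothetical helper, not a FluidAudio function; `MLModelConfiguration` and `MLComputeUnits` are the standard Core ML types.

```swift
#if canImport(CoreML)
import CoreML

/// Hypothetical helper (not part of FluidAudio): builds a Core ML
/// configuration that either allows every compute unit or skips the ANE.
func makeModelConfiguration(bypassANE: Bool) -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    // .all lets Core ML choose among CPU, GPU, and Neural Engine;
    // .cpuAndGPU keeps the model off the Neural Engine.
    configuration.computeUnits = bypassANE ? .cpuAndGPU : .all
    return configuration
}
#endif
```

A configuration built this way would be passed wherever the model is loaded. Keeping `.all` as the fallback mirrors the backwards-compatible default described above.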

Improvements

Skip error recovery on intentional cancellation (#481)

Documentation

Add Parakeet EOU ultra-low latency demo video to README

Full Changelog: v0.13.5...v0.13.6

03 Apr 03:25

Alex-Wengg

v0.13.5

6c40eca

v0.13.5

What's New in v0.13.5

Features

Add experimental CTC zh-CN Mandarin ASR (8.23% CER on THCHS-30) (#476)
Add PocketTTS sessions with persistent KV-cache (#471)
Add PunctuationCommitLayer for punctuation-aware streaming ASR (#466)

Improvements

Refactor TDT decoder: Extract reusable components (#474)
ASR architecture cleanup: naming, dead code, file organization (#468)
Clarify custom vocabulary model compatibility and approach selection (#469)

Bug Fixes

Fix Swift 6 concurrency errors in SlidingWindowAsrManager (#472, #476)
Fix use-after-free when mic and system transcription run concurrently (#473)
Fix fatal error in levenshteinDistance with empty arrays (#476)
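
The notes do not say what triggered the crash. As an illustration only, here is a minimal edit-distance sketch that is safe for empty inputs. It assumes unit costs and is not FluidAudio's implementation; `editDistance` is a hypothetical name. One common pitfall in Swift is that a closed range such as `1...0` traps at runtime, so the empty cases are handled before any loop.

```swift
/// Edit distance between two sequences, with insertions, deletions,
/// and substitutions each costing 1. Hypothetical illustration.
func editDistance<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    // Handle empty inputs first: 1...0 would trap in the loops below.
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }

    // previous[j] holds the distance between a[0..<i-1] and b[0..<j].
    var previous = Array(0...b.count)
    var current = [Int](repeating: 0, count: b.count + 1)

    for i in 1...a.count {
        current[0] = i
        for j in 1...b.count {
            let cost = a[i - 1] == b[j - 1] ? 0 : 1
            current[j] = min(
                previous[j] + 1,        // deletion
                current[j - 1] + 1,     // insertion
                previous[j - 1] + cost  // substitution or match
            )
        }
        swap(&previous, &current)
    }
    // After the final swap, the last completed row is in `previous`.
    return previous[b.count]
}
```

Dividing this distance by the reference length gives WER for word arrays or CER for character arrays.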

Documentation

Fix stale references in ASR documentation (#462)
Update Documentation index, remove espeak-ng licenses (#461)
Clean up CI workflows (#463, #464)

Full Changelog: v0.13.4...v0.13.5

Note: CTC zh-CN is experimental. API may change in future releases.

29 Mar 03:45

Alex-Wengg

v0.13.4

d9eef86

v0.13.4

Changes since v0.13.3

Add standalone CTC head for custom vocabulary (#435, #450)
Make parakeetTdtCtc110m folderName consistent with other Parakeet models (#453)
Replace swift-transformers with minimal BPE tokenizer (#449)
Add RTFx tracking and validation to all benchmark workflows (#458)
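
For readers unfamiliar with the metric, RTFx is commonly defined as audio duration divided by processing time, so values above 1 mean faster than real time. The sketch below assumes that definition. The function names and the threshold check are hypothetical and do not describe the actual benchmark workflows.

```swift
/// Inverse real-time factor: seconds of audio processed per second of
/// wall-clock time. Returns nil for non-positive inputs.
func rtfx(audioSeconds: Double, processingSeconds: Double) -> Double? {
    guard audioSeconds > 0, processingSeconds > 0 else { return nil }
    return audioSeconds / processingSeconds
}

/// Hypothetical validation step: a run passes only if RTFx could be
/// computed and meets a caller-chosen minimum.
func passesRtfxCheck(audioSeconds: Double, processingSeconds: Double, minimum: Double) -> Bool {
    guard let value = rtfx(audioSeconds: audioSeconds, processingSeconds: processingSeconds) else {
        return false
    }
    return value >= minimum
}
```

Treating a missing or zero measurement as a failure, rather than skipping it, keeps a broken benchmark run from passing silently.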

28 Mar 06:02

Alex-Wengg

v0.13.2.6

f3dba78

v0.13.3

What's Changed

Documentation

Added comprehensive ASR directory structure documentation explaining old vs new layout, SlidingWindow vs Streaming distinction, and design decisions
Added main branch baseline benchmarks for PR #440 regression testing (6 models tested on M2 16GB)
Added PR branch benchmark results showing no regressions across all 6 models:
- TDT v3: 2.6% WER (maintained)
- TDT v2: 3.8% WER (maintained)
- CTC-TDT 110M: 3.6% WER (maintained)
- EOU 320ms: 7.11% WER (maintained)
- Nemotron 1120ms: 1.99% WER (maintained)
- CTC Earnings: 16.51% WER (within noise of 16.54%)

Refactoring

Deduplicated decoder projection normalization in TdtDecoderV3 by merging prepareDecoderProjection and populatePreparedDecoderProjection into a single normalizeDecoderProjection method (no behavioral change, 2.6% WER maintained)

Full Changelog: v0.13.2.5...v0.13.2.6

28 Mar 03:40

Alex-Wengg

v0.13.2.5

9b49377

v0.13.2.5

What's Changed

Directory Structure Refactoring

Reorganized ASR directory by model family (Parakeet/, Qwen3/)
Split Streaming/ into EOU/ and Nemotron/ subdirectories
Removed Parakeet/Shared/ subdirectory, moved files to Parakeet/ root
Added StreamingAsrEngine protocol for unified streaming interface

Bug Fixes

Fixed EOU shape mismatch for short audio by padding mel to expected frame count (#444)
Fixed EOU chunk sample counts to match computeFlat frame formula
Fixed race condition in chunk size initialization
Fixed Kokoro v2 source_noise dtype and distribution (#447)
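
The short-audio fix pads the mel spectrogram up to the frame count the model expects. The sketch below shows that general idea on a plain `[[Float]]` laid out as mel bins by frames. The layout, the zero pad value, and the name `padFrames` are assumptions made for illustration; the actual fix in #444 may differ.

```swift
/// Pads each mel-bin row on the right until it has `expected` frames.
/// Rows that are already long enough are returned unchanged.
func padFrames(_ mel: [[Float]], toFrameCount expected: Int, padValue: Float = 0) -> [[Float]] {
    mel.map { row in
        guard row.count < expected else { return row }
        // Append only the missing frames so the row ends at exactly `expected`.
        return row + [Float](repeating: padValue, count: expected - row.count)
    }
}
```

Padding before inference gives the model a fixed input shape, which is what a shape-mismatch error for short clips usually calls for.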

Dependency Updates

Updated swift-transformers from 1.2.0 to 1.3.0 (reduced dependencies from 28 to 11) (#439)

Removed

Removed unsupported Nemotron 80ms/160ms streaming variants
Marked KittenTTS and Qwen3-TTS as not supported (#437)

Documentation

Updated file paths in Documentation to match new ASR structure
Added MimicScribe to showcase (#446)

Full Changelog: v0.13.2...v0.13.2.5

26 Mar 21:40

Alex-Wengg

v0.13.2

716f1c9

v0.13.2: TDT-CTC-110M

ASR

Parakeet-TDT-CTC-110M hybrid model (#433) - Fused preprocessor+encoder, CLI: --model-version tdt-ctc-110m. Closes #383
CTC decoder with ARPA language model (#436) - Greedy/beam search decoding, 9.4% WER with domain LM. Closes #384

TTS

Fix Kokoro TTS Archive build failures (#426) - Replace Float16.bitPattern with vImage conversion. Closes #423

Full Changelog: v0.13.1...v0.13.2

26 Mar 19:39

Alex-Wengg

v0.13.1

88527fc

v0.13.1: Nemotron Streaming

ASR

Nemotron Speech Streaming 0.6B (#432) - Streaming ASR with vDSP optimization, 2.12% WER, 6.4x RTFx. Closes #389

Diarization

Timeline synchronization and LS-EEND finalization (#421) - Unified offline/streaming finalization, Sortformer flush behavior improvements

Full Changelog: v0.13.0...v0.13.1

26 Mar 19:35

Alex-Wengg

v0.13.0

bcfe5a5

v0.13.0: new models + code reorg preparation

Full Changelog: v0.12.6...v0.13.0

24 Mar 21:31

Alex-Wengg

v0.12.6

aa800cb

v0.12.6 - Swift 6 Concurrency Fixes

What's Changed

Swift 6 Concurrency Safety 🔒

Convert AsrManager to actor (#419) for proper Swift 6 concurrency safety
Remove nonisolated(unsafe) workarounds from StreamingAsrManager
Add proper await at all AsrManager call sites
Fixes data race warnings with Xcode 16.4 RC's stricter concurrency enforcement

Performance Improvements ⚡

Qwen3 ASR ANE Optimization (#410): Audio encoder fp16 conversion for Apple Neural Engine
Kokoro TTS ANE Optimization (#411): fp16 conversion for better Neural Engine performance

Bug Fixes 🐛

Fix KokoroTtsManager.initialize() hang on iOS (#418): Resolves initialization deadlock on iOS devices
Fix missing source_noise input in Kokoro TTS (#412): Correct model input configuration

Breaking Changes ⚠️

External calls to AsrManager methods now require await:

// Before
let manager = AsrManager()
manager.cleanup()

// After
let manager = AsrManager()
await manager.cleanup()
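
The `await` requirement follows from Swift's actor isolation: state inside an actor can only be reached from outside through asynchronous calls. The self-contained sketch below uses a hypothetical stand-in actor, `TranscriptStore`, rather than the real `AsrManager`, whose full API is not shown in these notes.

```swift
/// Hypothetical stand-in for an actor-based manager.
actor TranscriptStore {
    private var segments: [String] = []

    func append(_ segment: String) { segments.append(segment) }
    func cleanup() { segments.removeAll() }
    var count: Int { segments.count }
}

/// Every access from outside the actor needs `await`,
/// including reads of computed properties.
func demoCleanup() async -> Int {
    let store = TranscriptStore()
    await store.append("hello")
    await store.cleanup()
    return await store.count
}
```

From synchronous code, such calls can be wrapped in `Task { await ... }`, at the cost of no longer knowing when the call has finished unless the task is awaited.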

Testing

✅ All CI tests pass (13 tests, 0 failures)

Full Changelog: v0.12.5...v0.12.6

21 Mar 01:29

Alex-Wengg

v0.12.5

2d29794

v0.12.5

What's New

LS-EEND Diarizer (#376): End-to-end streaming diarization with up to 10 speakers
- Five variants (AMI, CALLHOME, DIHARD2/3, VoxConverse) optimized for different scenarios
- 100ms frame updates with 900ms tentative preview
- Unified DiarizerTimeline API shared with Sortformer
- CLI commands: lseend and lseend-benchmark
- Full documentation in Documentation/Diarization/LSEEND.md
Parakeet EOU 1280ms (#388): Added support for 1280ms streaming chunk size

API Improvements

DiarizerTimeline (#402): Make speakers publicly mutable for custom speaker management
TDT Decoder (#382): Populate tokenDurations for accurate word endTime

Fixes

G2P Multilingual (#400): Fix multilingual path resolution
EmbeddingExtractor (#398): Clamp numMasksInChunk to prevent heap-buffer-overflow
Swift Transformers (#378): Bump minimum to 1.2.0 (trailing comma fix)
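
The EmbeddingExtractor fix clamps a count before it is used to index memory. The notes do not state the bounds involved, so the sketch below only shows the general pattern with a hypothetical helper, `clampedCount`, and a caller-supplied capacity.

```swift
/// Hypothetical helper: limits a requested element count to the range
/// the destination buffer can actually hold.
func clampedCount(_ requested: Int, capacity: Int) -> Int {
    // A negative request or capacity collapses to zero instead of underflowing.
    min(max(requested, 0), max(capacity, 0))
}
```

Applying the clamp at the point where the count is derived, rather than at each use, leaves one place to audit when the buffer size changes.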

Docs

LS-EEND vs Sortformer (#397): Add enrollment feedback from integration testing
Model Conversion Guide (#391, #392): Add guide with existing benchmark datasets
PocketTTS Architecture (#380): Add pipeline architecture comments
Showcase Updates: Add Audite (#396), Hitoku Draft (#385), update to OpenOats (#399)
Hugging Face Badge: Update to 800k+ downloads
