Releases: FluidInference/FluidAudio

v0.13.6

04 Apr 17:48
57551cd

What's New in v0.13.6

Features

  • Add Japanese ASR support with JSUT and Common Voice datasets (#478)
  • Add opt-in embedding skip strategy for offline diarization pipeline (#480)
  • Add configurable computeUnits for Kokoro models (#482)
    • Adds workaround for iOS 26 ANE compiler regressions
    • Allows bypassing ANE with .cpuAndGPU when needed
    • Backwards compatible (default remains .all)
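
As a rough illustration of what bypassing the ANE means at the Core ML level, the sketch below sets `computeUnits` on an `MLModelConfiguration`. `MLModelConfiguration` and `MLComputeUnits` are standard Core ML API; the helper `makeKokoroConfiguration` and its `bypassANE` flag are hypothetical names used only for this example, and the way FluidAudio actually exposes the option is not shown in these notes.

```swift
#if canImport(CoreML)
import CoreML

// Hypothetical helper, not part of FluidAudio's API.
// Returns a Core ML configuration that either lets Core ML pick any
// compute unit (.all, the default) or restricts it to CPU and GPU,
// which keeps the model off the Neural Engine.
func makeKokoroConfiguration(bypassANE: Bool) -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = bypassANE ? .cpuAndGPU : .all
    return configuration
}
#endif
```

Keeping `.all` as the default matches the backwards-compatibility note above: callers that never set the option see no change in behavior.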

Improvements

  • Skip error recovery on intentional cancellation (#481)
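
One plausible shape for this behavior, sketched with Swift's structured concurrency: treat `CancellationError` as an expected outcome and run recovery only for other errors. `shouldAttemptRecovery` and `runWithRecovery` are hypothetical names for illustration; the actual change in #481 may detect cancellation differently.

```swift
// Hypothetical helper: decides whether an error should trigger recovery.
// CancellationError is the error thrown by Task.checkCancellation(),
// so it signals an intentional stop rather than a failure.
func shouldAttemptRecovery(after error: Error) -> Bool {
    return !(error is CancellationError)
}

// Hypothetical wrapper: runs an operation and calls `recover`
// only when the operation failed for a reason other than cancellation.
func runWithRecovery(
    operation: () async throws -> Void,
    recover: () async -> Void
) async {
    do {
        try await operation()
    } catch {
        if shouldAttemptRecovery(after: error) {
            await recover()
        }
    }
}
```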

Documentation

  • Add Parakeet EOU ultra-low latency demo video to README

Full Changelog: v0.13.5...v0.13.6

v0.13.5

03 Apr 03:25
6c40eca

What's New in v0.13.5

Features

  • Add experimental CTC zh-CN Mandarin ASR (8.23% CER on THCHS-30) (#476)
  • Add PocketTTS sessions with persistent KV-cache (#471)
  • Add PunctuationCommitLayer for punctuation-aware streaming ASR (#466)

Improvements

  • Refactor TDT decoder: Extract reusable components (#474)
  • ASR architecture cleanup: naming, dead code, file organization (#468)
  • Clarify custom vocabulary model compatibility and approach selection (#469)

Bug Fixes

  • Fix Swift 6 concurrency errors in SlidingWindowAsrManager (#472, #476)
  • Fix use-after-free when mic and system transcription run concurrently (#473)
  • Fix fatal error in levenshteinDistance with empty arrays (#476)
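
The notes do not say what caused the fatal error in `levenshteinDistance`. A common cause in Swift is a closed range such as `1...0`, which traps at runtime when an input is empty. The sketch below is an independent two-row implementation that guards against that case; it is not the library's code.

```swift
// Edit distance between two sequences, using two rolling rows.
// The early returns keep the loops below from forming the range 1...0,
// which would trap when either input is empty.
func levenshteinDistance<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }

    var previous = Array(0...b.count)                  // distances for a[0..<i-1]
    var current = [Int](repeating: 0, count: b.count + 1)

    for i in 1...a.count {
        current[0] = i                                 // delete the first i items of a
        for j in 1...b.count {
            let substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)
            let deletion = previous[j] + 1
            let insertion = current[j - 1] + 1
            current[j] = min(substitution, deletion, insertion)
        }
        swap(&previous, &current)
    }
    return previous[b.count]
}
```

Dividing this distance by the reference length gives an error rate such as CER (over characters) or WER (over words).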

Documentation

  • Fix stale references in ASR documentation (#462)
  • Update Documentation index, remove espeak-ng licenses (#461)
  • Clean up CI workflows (#463, #464)

Full Changelog: v0.13.4...v0.13.5

Note: CTC zh-CN is experimental. Its API may change in future releases.

v0.13.4

29 Mar 03:45
d9eef86

Changes since v0.13.3

  • Add standalone CTC head for custom vocabulary (#435, #450)
  • Make parakeetTdtCtc110m folderName consistent with other Parakeet models (#453)
  • Replace swift-transformers with minimal BPE tokenizer (#449)
  • Add RTFx tracking and validation to all benchmark workflows (#458)
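
RTFx is the inverse real-time factor: seconds of audio processed per second of wall-clock time, so higher is faster. A minimal sketch of computing and checking it follows. The function names and the idea of a minimum threshold are assumptions for illustration; the notes do not describe what the benchmark workflows validate.

```swift
// Inverse real-time factor: audio duration divided by processing time.
// Returns nil for non-positive inputs, where the ratio is meaningless.
func computeRTFx(audioSeconds: Double, processingSeconds: Double) -> Double? {
    guard audioSeconds > 0, processingSeconds > 0 else { return nil }
    return audioSeconds / processingSeconds
}

// Hypothetical validation: a run passes only if RTFx could be computed
// and is at least `minimum`.
func meetsMinimumRTFx(_ value: Double?, minimum: Double) -> Bool {
    guard let value = value else { return false }
    return value >= minimum
}
```

For example, 60 seconds of audio transcribed in 10 seconds gives an RTFx of 6.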

v0.13.3

28 Mar 06:02
f3dba78

What's Changed

Documentation

  • Added comprehensive ASR directory structure documentation explaining old vs new layout, SlidingWindow vs Streaming distinction, and design decisions
  • Added main branch baseline benchmarks for PR #440 regression testing (6 models tested on M2 16GB)
  • Added PR branch benchmark results showing no regressions across all 6 models:
    • TDT v3: 2.6% WER (maintained)
    • TDT v2: 3.8% WER (maintained)
    • CTC-TDT 110M: 3.6% WER (maintained)
    • EOU 320ms: 7.11% WER (maintained)
    • Nemotron 1120ms: 1.99% WER (maintained)
    • CTC Earnings: 16.51% WER (within noise of 16.54%)

Refactoring

  • Deduplicated decoder projection normalization in TdtDecoderV3 by merging prepareDecoderProjection and populatePreparedDecoderProjection into a single normalizeDecoderProjection method (no behavioral change, 2.6% WER maintained)

Full Changelog: v0.13.2.5...v0.13.2.6

v0.13.2.5

28 Mar 03:40

What's Changed

Directory Structure Refactoring

  • Reorganized ASR directory by model family (Parakeet/, Qwen3/)
  • Split Streaming/ into EOU/ and Nemotron/ subdirectories
  • Removed Parakeet/Shared/ subdirectory, moved files to Parakeet/ root
  • Added StreamingAsrEngine protocol for unified streaming interface

Bug Fixes

  • Fixed EOU shape mismatch for short audio by padding mel to expected frame count (#444)
  • Fixed EOU chunk sample counts to match computeFlat frame formula
  • Fixed race condition in chunk size initialization
  • Fixed Kokoro v2 source_noise dtype and distribution (#447)
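
The EOU short-audio fix pads the mel spectrogram up to the frame count the model expects. The sketch below shows that idea on a plain nested array. The `[melBin][frame]` layout, the zero pad value, and the name `padMel` are assumptions for illustration; the real fix in #444 may use a different layout or pad value.

```swift
// Pads every mel row with `padValue` until it has `expected` frames.
// Rows that already have enough frames are returned unchanged.
// Assumes `mel` is laid out as [melBin][frame].
func padMel(_ mel: [[Float]], toFrameCount expected: Int, padValue: Float = 0) -> [[Float]] {
    return mel.map { row in
        if row.count >= expected {
            return row
        }
        let padding = [Float](repeating: padValue, count: expected - row.count)
        return row + padding
    }
}
```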

Dependency Updates

  • Updated swift-transformers from 1.2.0 to 1.3.0 (reduced dependencies from 28 to 11) (#439)

Removed

  • Removed unsupported Nemotron 80ms/160ms streaming variants
  • Marked KittenTTS and Qwen3-TTS as not supported (#437)

Documentation

  • Updated file paths in Documentation to match new ASR structure
  • Added MimicScribe to showcase (#446)

Full Changelog: v0.13.2...v0.13.2.5

v0.13.2: TDT-CTC-110M

26 Mar 21:40
716f1c9

ASR

  • Parakeet-TDT-CTC-110M hybrid model (#433) - Fused preprocessor+encoder, CLI: --model-version tdt-ctc-110m. Closes #383
  • CTC decoder with ARPA language model (#436) - Greedy/beam search decoding, 9.4% WER with domain LM. Closes #384
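
For readers unfamiliar with greedy CTC decoding, the standard procedure is: take the highest-scoring token at each frame, collapse consecutive repeats, then drop the blank token. The sketch below implements only that step; it does not cover beam search or ARPA language model scoring, and the function name and input layout are assumptions, not FluidAudio's API.

```swift
// Greedy CTC decoding over per-frame scores laid out as [frame][tokenId].
// A token is emitted when it differs from the previous frame's best token
// and is not the blank token.
func ctcGreedyDecode(logits: [[Float]], blankId: Int) -> [Int] {
    var tokens: [Int] = []
    var previous: Int? = nil
    for frame in logits {
        // Index of the highest score in this frame; skip empty frames.
        guard let best = frame.indices.max(by: { frame[$0] < frame[$1] }) else {
            continue
        }
        if best != previous && best != blankId {
            tokens.append(best)
        }
        previous = best
    }
    return tokens
}
```

A blank between two identical tokens is what allows a genuine repeat to be emitted twice.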

TTS

  • Fix Kokoro TTS Archive build failures (#426) - Replace Float16.bitPattern with vImage conversion. Closes #423

Full Changelog: v0.13.1...v0.13.2

v0.13.1: Nemotron Streaming

26 Mar 19:39
88527fc

ASR

  • Nemotron Speech Streaming 0.6B (#432) - Streaming ASR with vDSP optimization, 2.12% WER, 6.4x RTFx. Closes #389

Diarization

  • Timeline synchronization and LS-EEND finalization (#421) - Unified offline/streaming finalization, Sortformer flush behavior improvements

Full Changelog: v0.13.0...v0.13.1

v0.13.0: new models + code reorg preparation

26 Mar 19:35
bcfe5a5

v0.12.6 - Swift 6 Concurrency Fixes

24 Mar 21:31
aa800cb

What's Changed

Swift 6 Concurrency Safety 🔒

  • Convert AsrManager to actor (#419) for proper Swift 6 concurrency safety
  • Remove nonisolated(unsafe) workarounds from StreamingAsrManager
  • Add proper await at all AsrManager call sites
  • Fixes data race warnings with Xcode 16.4 RC's stricter concurrency enforcement

Performance Improvements ⚡

  • Qwen3 ASR ANE Optimization (#410): Audio encoder fp16 conversion for Apple Neural Engine
  • Kokoro TTS ANE Optimization (#411): fp16 conversion for better Neural Engine performance

Bug Fixes 🐛

  • Fix KokoroTtsManager.initialize() hang on iOS (#418): Resolves initialization deadlock on iOS devices
  • Fix missing source_noise input in Kokoro TTS (#412): Correct model input configuration

Breaking Changes ⚠️

External calls to AsrManager methods now require await:

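// Before: AsrManager was a class, so calls were synchronous
let manager = AsrManager()
manager.cleanup()

// After: AsrManager is an actor, so calls from outside it need await
// and must be made from an async context
let manager = AsrManager()
await manager.cleanup()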
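
Call sites that are not already async need a way to reach the actor. The sketch below uses a stand-in actor, `DemoAsrManager`, to show both patterns; it is a hypothetical type for illustration and does not reflect the real `AsrManager` interface beyond the `cleanup()` call shown above.

```swift
// Stand-in actor for illustration only.
actor DemoAsrManager {
    private var isLoaded = false

    func load() { isLoaded = true }
    func cleanup() { isLoaded = false }
    func loaded() -> Bool { return isLoaded }
}

// From an async function, await the actor call directly.
func shutDown(_ manager: DemoAsrManager) async {
    await manager.cleanup()
}

// From synchronous code, start a Task to hop into an async context.
// The cleanup then runs asynchronously and may finish after this returns.
func shutDownFromSyncCode(_ manager: DemoAsrManager) {
    Task {
        await manager.cleanup()
    }
}
```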

Testing

✅ All CI tests pass (13 tests, 0 failures)

Full Changelog: v0.12.5...v0.12.6

v0.12.5

21 Mar 01:29
2d29794

What's New

  • LS-EEND Diarizer (#376): End-to-end streaming diarization with up to 10 speakers

    • Five variants (AMI, CALLHOME, DIHARD2/3, VoxConverse) optimized for different scenarios
    • 100ms frame updates with 900ms tentative preview
    • Unified DiarizerTimeline API shared with Sortformer
    • CLI commands: lseend and lseend-benchmark
    • Full documentation in Documentation/Diarization/LSEEND.md
  • Parakeet EOU 1280ms (#388): Added support for 1280ms streaming chunk size

API Improvements

  • DiarizerTimeline (#402): Make speakers publicly mutable for custom speaker management
  • TDT Decoder (#382): Populate tokenDurations for accurate word endTime
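
A TDT decoder predicts a duration alongside each token, which is what makes a word end time computable. One way to picture it: the word ends where its last token's start frame plus duration lands, converted to seconds. `TimedToken`, `wordEndTime`, and the `secondsPerFrame` parameter are hypothetical names, and the frame duration is left to the caller because the notes do not state it.

```swift
// Hypothetical token record: start frame plus predicted duration in frames.
struct TimedToken {
    let startFrame: Int
    let durationFrames: Int
}

// End time of a word in seconds, taken from its last token.
// Returns nil for a word with no tokens.
func wordEndTime(tokens: [TimedToken], secondsPerFrame: Double) -> Double? {
    guard let last = tokens.last else { return nil }
    return Double(last.startFrame + last.durationFrames) * secondsPerFrame
}
```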

Fixes

  • G2P Multilingual (#400): Fix multilingual path resolution
  • EmbeddingExtractor (#398): Clamp numMasksInChunk to prevent heap-buffer-overflow
  • Swift Transformers (#378): Bump minimum to 1.2.0 (trailing comma fix)
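
The EmbeddingExtractor fix clamps a count before it is used to index a buffer. A minimal sketch of that kind of guard is below. The function name and the assumption that the bound is the number of masks actually available are illustrative; the notes do not say what `numMasksInChunk` is clamped against.

```swift
// Clamps a requested mask count into 0...available so that later loops
// never read past the end of the underlying buffer.
func clampedMaskCount(requested: Int, available: Int) -> Int {
    let upperBound = max(available, 0)
    return min(max(requested, 0), upperBound)
}
```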

Docs

  • LS-EEND vs Sortformer (#397): Add enrollment feedback from integration testing
  • Model Conversion Guide (#391, #392): Add guide with existing benchmark datasets
  • PocketTTS Architecture (#380): Add pipeline architecture comments
  • Showcase Updates: Add Audite (#396), Hitoku Draft (#385), update to OpenOats (#399)
  • Hugging Face Badge: Update to 800k+ downloads