CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

FluidAudio is a Swift framework for local, low-latency audio processing on Apple platforms. It provides speaker diarization, automatic speech recognition (ASR), and voice activity detection (VAD) through open-source models converted to Core ML.

Critical Development Rules

NEVER USE `@unchecked Sendable`

Always implement thread-safe code with proper synchronization
Use actors, @MainActor, or proper locking mechanisms instead
If you encounter Sendable conformance issues, fix them properly
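
As an illustration of the rule above, here is a minimal sketch of the actor-based alternative. `TranscriptBuffer` and `collect` are hypothetical names made up for this example and are not part of FluidAudio; the point is only that an actor is `Sendable` by construction, so no `@unchecked` annotation is needed.

```swift
// Hypothetical example type; not part of FluidAudio.
// An actor serializes access to its mutable state, so it is Sendable
// without any `@unchecked` escape hatch.
actor TranscriptBuffer {
    private var segments: [String] = []

    /// Number of segments stored so far.
    var count: Int { segments.count }

    /// Appends one transcribed segment.
    func append(_ segment: String) {
        segments.append(segment)
    }

    /// Returns all segments joined into a single transcript.
    func transcript() -> String {
        segments.joined(separator: " ")
    }
}

/// Appends every part from a concurrent child task and returns how many
/// segments were stored. The actor keeps the array consistent.
func collect(_ parts: [String]) async -> Int {
    let buffer = TranscriptBuffer()
    await withTaskGroup(of: Void.self) { group in
        for part in parts {
            group.addTask { await buffer.append(part) }
        }
    }
    return await buffer.count
}
```

Because the child tasks run concurrently, the order of segments in `transcript()` is not guaranteed in this sketch; only the absence of data races is.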

NEVER CREATE DUMMY MODELS OR SYNTHETIC DATA

Do not create dummy, mock, or fake models for testing or development
Do not generate synthetic audio data for testing
Always use the actual models required by the code
If model authentication is required, inform the user rather than creating dummy versions

NEVER UPLOAD TO HUGGINGFACE

Do not upload models, datasets, or any files to HuggingFace
Do not create HuggingFace repos
Prepare files locally and let the user handle all HF uploads themselves

MODEL OPERATIONS - CONSULT BEFORE IMPLEMENTING

When asked to merge, convert, or modify models:
- If it seems impossible or there are significant objections, consult the user first
- If they say proceed, do it immediately without further objections
Do not create placeholder models or implement alternatives without asking

User Preferences

Never start responses with positive re-affirming text ("You're absolutely right!", "Good change!", etc.)
Get straight to the point with technical facts
For debugging, use print statements and delete them at the end when instructed
Never create fallbacks or simplified solutions that don't actually solve the problem
When asked to implement something specific, do it first before explaining why it might not be optimal
Don't do more than what was asked

Development Guidelines

Follow Instructions: Implementation first, explanation second
Testing Policy: Add unit tests when writing new code.
Git Operations: Never run git push unless explicitly requested.
- No Co-Author Tags: Do not add Co-Authored-By lines for Claude, Copilot, or any AI assistant in commit messages.
- No GitHub comments: Never post comments, reviews, or reactions on issues or PRs via gh. Reading issues, PRs, and comments is fine. Creating PRs and editing PR titles/bodies is fine.
Code Formatting: All code must pass swift-format checks before merge
Avoid Deprecated Code: Do not add support for deprecated models or features unless explicitly requested
Performance: Keep RTFx above 1.0x (faster than real time) so processing stays real-time capable
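
The RTFx threshold can be checked with a small helper. This sketch assumes RTFx means the inverse real-time factor (seconds of audio divided by seconds of processing time), which this file does not define explicitly; the function names are hypothetical.

```swift
/// RTFx under the assumed definition: seconds of audio processed per
/// second of wall-clock processing time.
func rtfx(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

/// True when processing keeps up with real time (RTFx > 1.0).
func isRealTimeCapable(audioSeconds: Double, processingSeconds: Double) -> Bool {
    rtfx(audioSeconds: audioSeconds, processingSeconds: processingSeconds) > 1.0
}
```

For example, transcribing 60 seconds of audio in 0.5 seconds gives an RTFx of 120 under this definition, while taking 20 seconds to process 10 seconds of audio gives 0.5 and fails the check.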

Code Style

Swift Format: Enforced via .swift-format config, CI checked
Local formatting: swift format --in-place --recursive --configuration .swift-format Sources/ Tests/
Line length: 120 characters
Indentation: 4 spaces
Import order: Alphabetical preferred, but OrderedImports rule is disabled due to Swift 6.1 (GitHub Actions CI) vs 6.3 (local) formatter incompatibility. Swift 6.3 is unavailable in GitHub Actions runners.
Naming: lowerCamelCase for variables/functions, UpperCamelCase for types
Error handling: Proper Swift error handling, no force unwrapping in production. Per-module error enums conforming to Error, LocalizedError (e.g. ASRError, VadError, OfflineDiarizationError, Qwen3AsrError)
Logging: Use AppLogger(category:) from Shared/AppLogger.swift — not print() in production code. One logger per component (e.g. AppLogger(category: "VadManager"))
Documentation: Triple-slash comments (///) for public APIs
Control flow: Prefer guard statements and early returns over nested if statements
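
The error-handling and control-flow conventions above might look like the following sketch. `ExampleAudioError` and `validate` are hypothetical and do not exist in the codebase, and the 16 kHz sample-rate check is an assumption chosen only to give the guard something to test.

```swift
import Foundation

/// Hypothetical per-module error enum following the pattern described above.
enum ExampleAudioError: Error, LocalizedError {
    case emptyInput
    case unsupportedSampleRate(Int)

    var errorDescription: String? {
        switch self {
        case .emptyInput:
            return "Audio buffer is empty."
        case .unsupportedSampleRate(let rate):
            return "Unsupported sample rate: \(rate) Hz."
        }
    }
}

/// Validates input with guard statements and early exits instead of nested
/// if statements, and returns the number of samples on success.
func validate(samples: [Float], sampleRate: Int) throws -> Int {
    guard !samples.isEmpty else { throw ExampleAudioError.emptyInput }
    guard sampleRate == 16_000 else {  // assumed rate, for illustration only
        throw ExampleAudioError.unsupportedSampleRate(sampleRate)
    }
    return samples.count
}
```

Callers get a typed error with a readable description and never need a force unwrap.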

Build Commands

# Build
swift build                            # Debug build
swift build -c release                 # Release build (recommended for benchmarks)

# Test
swift test                             # Run all tests
swift test --filter CITests            # Run CI-specific tests only
swift test --filter AsrManagerTests    # Run specific test class

# Format
swift format --in-place --recursive --configuration .swift-format Sources/ Tests/
swift format lint --recursive --configuration .swift-format Sources/ Tests/

# Package management
swift package update
swift package resolve
swift package clean

CLI Commands

# Transcription
swift run fluidaudiocli transcribe audio.wav
swift run fluidaudiocli transcribe audio.wav --low-latency
swift run fluidaudiocli qwen3-transcribe audio.wav
swift run fluidaudiocli multi-stream audio1.wav audio2.wav

# TTS
swift run fluidaudiocli tts "Hello world" --output hello.wav

# Diarization
swift run fluidaudiocli process meeting.wav --output results.json --threshold 0.6
swift run fluidaudiocli sortformer audio.wav
swift run fluidaudiocli parakeet-eou --input audio.wav

# Benchmarks
swift run fluidaudiocli asr-benchmark --subset test-clean --max-files 100
swift run fluidaudiocli diarization-benchmark --auto-download
swift run fluidaudiocli vad-benchmark --num-files 40 --threshold 0.5
swift run fluidaudiocli fleurs-benchmark --languages en_us,fr_fr --samples 10
swift run fluidaudiocli sortformer-benchmark
swift run fluidaudiocli qwen3-benchmark
swift run fluidaudiocli ctc-earnings-benchmark
swift run fluidaudiocli g2p-benchmark

# Dataset downloads
swift run fluidaudiocli download --dataset ami-sdm
swift run fluidaudiocli download --dataset librispeech-test-clean

Project Structure

FluidAudio/
├── Sources/
│   ├── FluidAudio/           # Main library (single product)
│   │   ├── ASR/             # Automatic Speech Recognition
│   │   │   ├── Parakeet/    # Parakeet TDT (Decoder/, SlidingWindow/, Streaming/)
│   │   │   └── Qwen3/       # Qwen3 ASR
│   │   ├── Diarizer/        # Speaker diarization (segmentation, embedding, clustering)
│   │   ├── TTS/             # Text-to-speech (Kokoro, PocketTTS)
│   │   ├── VAD/             # Voice Activity Detection (Silero VAD)
│   │   └── Shared/          # Common utilities (audio conversion, model downloading)
│   └── FluidAudioCLI/       # Command-line interface (macOS only)
├── Tests/                   # Test suite
├── Scripts/                 # Python utilities (benchmarks, evaluation tools)
├── mobius/                  # Research submodule: model conversions, trials, and known issues
├── Documentation/           # Reference documentation
├── Frameworks/              # Vendored frameworks
└── ThirdPartyLicenses/      # Third-party license files

Architecture Overview

Core Components

AsrManager (ASR/Parakeet/): Speech-to-text via TDT (Token-and-Duration Transducer) decoding. Stateless per-chunk processing with automatic decoder state reset.
SlidingWindowAsrManager (ASR/Parakeet/SlidingWindow/): Real-time ASR with sliding window processing and cancellation support.
StreamingAsrEngine (ASR/Parakeet/Streaming/): Protocol for true streaming ASR engines (EOU, Nemotron) with cache-aware encoders.
Qwen3AsrManager (ASR/Qwen3/): Qwen3-based ASR with Whisper mel spectrogram frontend.
OfflineDiarizerManager (Diarizer/): Speaker separation via segmentation, embedding extraction, and VBx clustering. 17.7% DER on AMI dataset.
VadManager (VAD/): Voice activity detection with Core ML models.
KokoroSynthesizer (TTS/Kokoro/): Kokoro text-to-speech synthesis.
PocketTtsSynthesizer (TTS/PocketTTS/): PocketTTS streaming text-to-speech synthesis.

Key Patterns

Actor-based concurrency: Thread-safe processing, no @unchecked Sendable
Stateless ASR: Each chunk transcribed independently (~14.96s chunks, 2.0s overlap)
Auto-recovery: Detects corrupt Core ML models and re-downloads them from HuggingFace
Model management: Models auto-download from HuggingFace on first use. Can be pre-fetched via swift run fluidaudiocli download.
Cross-platform: macOS 14.0+, iOS 17.0+ (library), CLI macOS-only
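
To make the chunk arithmetic in the stateless ASR pattern concrete, the sketch below computes overlapping chunk boundaries. It is not FluidAudio's implementation: `chunkRanges` is a hypothetical helper, and both the fixed hop (chunk length minus overlap) and the shorter final chunk are assumptions.

```swift
/// Returns half-open sample ranges covering `totalSamples`. Each range is at
/// most `chunkSamples` long, and consecutive ranges share `overlapSamples`.
func chunkRanges(totalSamples: Int, chunkSamples: Int, overlapSamples: Int) -> [Range<Int>] {
    precondition(overlapSamples >= 0 && chunkSamples > overlapSamples)
    guard totalSamples > 0 else { return [] }

    let hop = chunkSamples - overlapSamples  // distance between chunk starts
    var ranges: [Range<Int>] = []
    var start = 0
    while true {
        let end = min(start + chunkSamples, totalSamples)
        ranges.append(start..<end)
        if end == totalSamples { break }  // final (possibly shorter) chunk
        start += hop
    }
    return ranges
}
```

Assuming 16 kHz audio, a 14.96 s chunk is 239,360 samples and a 2.0 s overlap is 32,000 samples, so 30 s of audio (480,000 samples) is split into three chunks starting at samples 0, 207,360, and 414,720.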

Platform Requirements

Swift: 5.10+ (Swift 6+ for swift-format)
C++17: Required for FastClusterWrapper (set via cxxLanguageStandard: .cxx17 in Package.swift)
Platforms: macOS 14.0+, iOS 17.0+
Hardware: Apple Silicon recommended

CI/CD

GitHub Actions workflows:

swift-format.yml: Code formatting compliance
tests.yml: Build and test execution
asr-benchmark.yml: ASR performance validation
diarizer-benchmark.yml: Diarization benchmarks
vad-benchmark.yml: VAD validation

Model Sources

Diarization: pyannote/speaker-diarization-3.1
VAD CoreML: FluidInference/silero-vad-coreml
ASR Models: FluidInference/parakeet-tdt-0.6b-v3-coreml
Test Data: alexwengg/musan_mini* variants
