MicRestartPolicy.swift # Pure decision logic for mic engine restart on device change
+SampleRateQuery.swift # Pure functions for sample rate detection and cross-validation
+Tests/
+MicRestartPolicyTests.swift
+SampleRateQueryTests.swift
 tools/meeting-simulator/ # Meeting simulator tool for testing
 Package.swift
 Sources/main.swift
@@ -170,7 +175,7 @@ Use the `/git-workflow` skill. Commit proactively after every logical unit of wo
 **Transcription engines:**
 - `TranscribingEngine` protocol abstracts ASR backends. Three implementations: `WhisperKitEngine` (99+ languages, ~1 GB model), `ParakeetEngine` (25 EU languages, ~50 MB model, ~10× faster), and `Qwen3AsrEngine` (30 languages, ~1.75 GB model, macOS 15+).
 - `AppSettings.transcriptionEngine` enum (`.whisperKit` / `.parakeet` / `.qwen3`) selects the engine. Settings UI shows engine picker; engine-specific options hidden when not selected. `availableCases` filters by macOS version.
-- Parakeet auto-detects language (no parameter). WhisperKit and Qwen3 support explicit language selection.
+- Parakeet auto-detects language (no parameter) and supports custom vocabulary via CTC boosting (`ParakeetEngine.customVocabularyPath`). WhisperKit and Qwen3 support explicit language selection.
 - `Qwen3AsrEngine` requires macOS 15+ (`@available`). Returns plain text (no timestamps) — emits single `TimestampedSegment`. Chunks audio into <=30s windows (`Qwen3AsrConfig.maxAudioSeconds`). Type-erased in AppState via `_qwen3Engine: AnyObject?` for macOS <15 compatibility.
 - `AppState.activeTranscriptionEngine` returns the selected engine, used by `PipelineQueue`.
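The engine-selection rules in this hunk (three engines, Qwen3 limited to macOS 15+, Parakeet without an explicit language parameter) can be sketched in plain Swift. This is a minimal illustration, not the project's code: `TranscriptionEngineChoice`, `minimumMacOSMajor`, `supportsExplicitLanguage`, and the `availableCases(macOSMajor:)` signature are hypothetical names, and the OS version is passed in as a number so the logic stays testable without `#available`.

```swift
import Foundation

/// Hypothetical mirror of the `AppSettings.transcriptionEngine` enum described in the notes.
enum TranscriptionEngineChoice: String, CaseIterable {
    case whisperKit, parakeet, qwen3

    /// Minimum macOS major version per engine.
    /// Only the Qwen3 requirement (macOS 15+) is stated in the notes.
    var minimumMacOSMajor: Int {
        switch self {
        case .qwen3: return 15
        case .whisperKit, .parakeet: return 0  // assumption: no extra OS requirement
        }
    }

    /// Parakeet auto-detects the language; the other two accept an explicit choice.
    var supportsExplicitLanguage: Bool { self != .parakeet }

    /// Engines that may be offered in the picker on the given macOS major version.
    static func availableCases(macOSMajor: Int) -> [TranscriptionEngineChoice] {
        allCases.filter { $0.minimumMacOSMajor <= macOSMajor }
    }
}

let onMacOS14 = TranscriptionEngineChoice.availableCases(macOSMajor: 14)
let onMacOS15 = TranscriptionEngineChoice.availableCases(macOSMajor: 15)
print(onMacOS14.map(\.rawValue))
print(onMacOS15.map(\.rawValue))
```

Keeping the filter a pure function of the version number means a settings UI can hide unavailable engines and a unit test can cover both OS cases on any machine.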
@@ -199,11 +204,15 @@ Use the `/git-workflow` skill. Commit proactively after every logical unit of wo
 - `MeetingDetector` counts each pattern once per poll — prevents over-counting when multiple windows match the same app.

 **Diarization:**
-- `FluidDiarizer` uses FluidAudio (CoreML/ANE) for on-device speaker diarization — no HuggingFace token needed.
+- `FluidDiarizer` uses FluidAudio (CoreML/ANE) for on-device speaker diarization — no HuggingFace token needed. Two modes: `.offlineDiarizer` (default) and `.sortformer` (overlap-aware, via `SortformerDiarizer`). Selected via `AppSettings.diarizerMode`.
 - **Dual-track diarization:** App and mic tracks are diarized separately. Speaker IDs are prefixed (`R_` for remote/app, `M_` for mic/local), merged, and assigned via `assignSpeakersDualTrack`. Single-source recordings fall back to diarizing the mix with `assignSpeakers`.
 - `SpeakerMatcher` stores speaker embeddings in `speakers.json` and matches via cosine similarity (multi-embedding, max 5 per speaker, confidence margin 0.10).
 - `DiarizationProvider` protocol enables mock injection in tests.

+**VAD preprocessing:**
+- `FluidVAD` wraps FluidAudio Silero v6 for voice activity detection. When enabled (`AppSettings.vadEnabled`), silence is trimmed before transcription and timestamps are remapped back to the original timeline via `VadSegmentMap`.
+- `PipelineQueue` holds a cached `FluidVAD` instance (reused across jobs). Pass `vadConfig: nil` to disable.
+
 **Protocol generation:**
 - `ProtocolGenerating` protocol with two implementations: `ClaudeCLIProtocolGenerator` and `OpenAIProtocolGenerator`.
 - `AppSettings.protocolProvider` enum (`.claudeCLI` / `.openAICompatible` / `.none`) selects the provider. `.none` skips LLM generation and saves the transcript only.
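The VAD notes in this hunk say that timestamps produced on silence-trimmed audio are remapped to the original timeline. One way such a mapping could work is sketched below. It is an assumption about the idea only: `SpeechSegment` and `TrimmedTimelineMap` are hypothetical stand-ins, and the real `VadSegmentMap` may be structured differently.

```swift
import Foundation

/// One stretch of detected speech on the original timeline, in seconds.
struct SpeechSegment {
    let start: Double
    let end: Double
}

/// Hypothetical stand-in for `VadSegmentMap`: converts times measured in the
/// silence-trimmed audio back to times in the original recording.
struct TrimmedTimelineMap {
    /// Each kept segment paired with the offset where it begins in the trimmed audio.
    private let entries: [(trimmedStart: Double, segment: SpeechSegment)]

    init(speech: [SpeechSegment]) {
        var offset = 0.0
        var built: [(trimmedStart: Double, segment: SpeechSegment)] = []
        for segment in speech.sorted(by: { $0.start < $1.start }) {
            built.append((trimmedStart: offset, segment: segment))
            offset += segment.end - segment.start  // trimmed audio is the segments back to back
        }
        entries = built
    }

    /// Maps a timestamp in the trimmed audio to the original timeline.
    /// A time exactly on a segment boundary is attributed to the later segment.
    func originalTime(forTrimmed t: Double) -> Double {
        guard var current = entries.first else { return t }  // nothing trimmed: identity
        for entry in entries where entry.trimmedStart <= t {
            current = entry  // last segment that starts at or before t
        }
        let delta = max(0, t - current.trimmedStart)
        return min(current.segment.start + delta, current.segment.end)
    }
}

// Speech at 1.0-3.0 s and 10.0-12.5 s; the 7 s of silence in between is dropped.
let timeline = TrimmedTimelineMap(speech: [
    SpeechSegment(start: 1.0, end: 3.0),
    SpeechSegment(start: 10.0, end: 12.5),
])
print(timeline.originalTime(forTrimmed: 0.5))
print(timeline.originalTime(forTrimmed: 2.5))
```

In this sketch a transcript timestamp of 2.5 s in the trimmed audio falls half a second into the second speech segment, so it maps to 10.5 s in the original recording.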
@@ -233,7 +242,7 @@ Two build variants controlled by compile-time flag `APPSTORE` (`-Xswiftc -DAPPST
 |**OpenAI API**| Yes | Yes (only LLM option) |
 |**Entitlements**| Mic only | Sandbox + mic + network + file picker |
-- **On-device speaker diarization** — [FluidAudio](https://github.com/FluidInference/FluidAudio) via CoreML/ANE — no HuggingFace token needed
+- **On-device speaker diarization** — [FluidAudio](https://github.com/FluidInference/FluidAudio) via CoreML/ANE — no HuggingFace token needed; two modes: standard (`OfflineDiarizer`) and overlap-aware (`Sortformer`)
 - **Dual-track diarization** — App and mic tracks diarized separately for clean speaker separation without echo interference
 - **Speaker recognition** — Voice embeddings stored across meetings, matched via cosine similarity
-- **AI protocol generation** — Structured Markdown via [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) or OpenAI-compatible APIs (Ollama, LM Studio, etc.)
+- **VAD preprocessing** — Optional silence trimming via FluidAudio Silero v6 before transcription, with automatic timestamp remapping
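The speaker-recognition bullet above mentions cosine-similarity matching of stored voice embeddings, and the developer notes in the same change add "multi-embedding, max 5 per speaker, confidence margin 0.10". The sketch below shows one plausible reading of those parameters. `KnownSpeaker`, `matchSpeaker`, and the `minimumSimilarity` threshold are hypothetical, and treating the margin as the required lead of the best speaker over the runner-up is an assumption, not something the source confirms.

```swift
import Foundation

/// Cosine similarity of two equal-length embedding vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have equal length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical record for one known speaker with several stored embeddings.
struct KnownSpeaker {
    let name: String
    var embeddings: [[Float]]

    /// Stores a new embedding and drops the oldest ones beyond `limit` (5 per the notes).
    mutating func add(_ embedding: [Float], limit: Int = 5) {
        embeddings.append(embedding)
        if embeddings.count > limit {
            embeddings.removeFirst(embeddings.count - limit)
        }
    }

    /// Best similarity between the probe and any stored embedding.
    func score(_ probe: [Float]) -> Float {
        embeddings.map { cosineSimilarity($0, probe) }.max() ?? -1
    }
}

/// Returns the best-matching speaker name, or nil when the match is weak or ambiguous.
/// `minimumSimilarity` is an illustrative assumption; `margin` follows the 0.10 in the notes.
func matchSpeaker(_ probe: [Float], in speakers: [KnownSpeaker],
                  margin: Float = 0.10, minimumSimilarity: Float = 0.5) -> String? {
    let ranked = speakers
        .map { (name: $0.name, score: $0.score(probe)) }
        .sorted { $0.score > $1.score }
    guard let best = ranked.first, best.score >= minimumSimilarity else { return nil }
    if ranked.count > 1, best.score - ranked[1].score < margin { return nil }
    return best.name
}

let known = [
    KnownSpeaker(name: "alice", embeddings: [[1, 0]]),
    KnownSpeaker(name: "bob", embeddings: [[0, 1]]),
]
print(matchSpeaker([0.9, 0.1], in: known) ?? "no confident match")
print(matchSpeaker([1, 1], in: known) ?? "no confident match")
```

With these toy two-dimensional embeddings, the first probe is clearly closest to "alice", while the second is equally close to both speakers and is rejected by the margin check.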
-└─ Metadata: micDelay, actualSampleRate via AudioCaptureResult
+└─ Metadata: micDelay, actualSampleRate, actualChannels via AudioCaptureResult
 ```

 **Key:** CATapDescription requires NO Screen Recording permission (purple dot indicator only). Handles output device changes by recreating tap automatically.

 ### Processing (DualSourceRecorder.stop())

 ```
-Raw float32 stereo → mono (channel average)
+Raw float32 (mono or stereo, actual channel count from AudioCaptureResult) → mono
 → Resample to 16kHz
 → Save app.wav (16kHz mono)
 → Load mic.wav (already 16kHz from MicCaptureHandler)
@@ -185,7 +186,11 @@ All recordings are normalized to 16kHz at capture time — no resampling needed

 On-device speaker diarization using FluidAudio (CoreML/ANE). No HuggingFace token or Python subprocess needed. Models downloaded automatically on first run (~50 MB).

-Flow: `FluidDiarizer.run(audioPath, numSpeakers)` → `OfflineDiarizerManager` → `DiarizationResult` with segments, speaking times, and speaker embeddings.
+Two modes selected via `AppSettings.diarizerMode`:
+- **`.offlineDiarizer`** (default) — `OfflineDiarizerManager`, standard speaker segmentation