|
| 1 | +# Parakeet Swift Integration - Implementation Summary |
| 2 | + |
| 3 | +## 🎯 Overview |
| 4 | + |
| 5 | +Replaced the 123 MB Python/MLX Parakeet sidecar with a 1.2 MB Swift/FluidAudio implementation that provides native macOS transcription on the Apple Neural Engine. |
| 6 | + |
| 7 | +**⚠️ Platform Support**: This integration is **macOS-only**. Windows and Linux will continue to use Whisper models exclusively. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## ✅ What Was Implemented |
| 12 | + |
| 13 | +### 1. Swift Sidecar (`/sidecar/parakeet-swift/`) |
| 14 | + |
| 15 | +**Files Created:** |
| 16 | +- `Sources/main.swift` - Main sidecar logic with FluidAudio integration |
| 17 | +- `Package.swift` - Swift package configuration |
| 18 | +- `build.sh` - Automated build script with proper target triple naming |
| 19 | +- `README.md` - Comprehensive documentation |
| 20 | +- `.gitignore` - Git ignore rules for build artifacts |
| 21 | + |
| 22 | +**Features:** |
| 23 | +- ✅ JSON-based communication protocol (stdin/stdout) |
| 24 | +- ✅ Commands: `load_model`, `transcribe`, `delete_model`, `unload_model`, `status` |
| 25 | +- ✅ FluidAudio SDK integration (v0.5.2) |
| 26 | +- ✅ Apple Neural Engine acceleration |
| 27 | +- ✅ Proper error handling and status responses |
| 28 | +- ✅ Model caching managed by FluidAudio |
| 29 | + |
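The command set above maps naturally onto an enum that serializes to one JSON object per line. The following is a minimal, dependency-free sketch rather than the project's code: `SidecarCommand`, `json_escape`, and `to_json_line` are hypothetical names, the `model_id` and `audio_path` fields follow the messages shown in the Data Flow section, and the exact shape of `unload_model` and `status` is an assumption. The real `ParakeetCommand` in `messages.rs` may differ (for example, by using a serialization crate).

```rust
/// Commands sent to the sidecar (hypothetical mirror of the documented set).
#[derive(Debug, Clone, PartialEq)]
pub enum SidecarCommand {
    LoadModel { model_id: String },
    Transcribe { audio_path: String },
    DeleteModel,
    UnloadModel,
    Status,
}

/// Escape a string so it can be embedded in a JSON string literal.
pub fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl SidecarCommand {
    /// Serialize to a single newline-terminated JSON line.
    pub fn to_json_line(&self) -> String {
        let body = match self {
            SidecarCommand::LoadModel { model_id } => format!(
                r#"{{"type":"load_model","model_id":"{}"}}"#,
                json_escape(model_id)
            ),
            SidecarCommand::Transcribe { audio_path } => format!(
                r#"{{"type":"transcribe","audio_path":"{}"}}"#,
                json_escape(audio_path)
            ),
            SidecarCommand::DeleteModel => r#"{"type":"delete_model"}"#.to_string(),
            // Assumed shapes: the document does not show these two messages.
            SidecarCommand::UnloadModel => r#"{"type":"unload_model"}"#.to_string(),
            SidecarCommand::Status => r#"{"type":"status"}"#.to_string(),
        };
        format!("{}\n", body)
    }
}
```

Keeping each message on exactly one line is what lets the receiving side frame messages by reading up to the next newline, which is why embedded newlines in paths are escaped rather than passed through.
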
| 30 | +### 2. Rust Backend Integration |
| 31 | + |
| 32 | +**Modified Files:** |
| 33 | +- `src-tauri/src/parakeet/manager.rs` - Delegates to Swift sidecar |
| 34 | +- `src-tauri/src/parakeet/messages.rs` - Added `DeleteModel` command |
| 35 | +- `src-tauri/src/commands/reset.rs` - Clears FluidAudio cache |
| 36 | +- `src-tauri/src/commands/model.rs` - Unified model management |
| 37 | +- `src-tauri/build.rs` - Automatically builds Swift sidecar |
| 38 | +- `src-tauri/tauri.conf.json` - Updated externalBin path |
| 39 | + |
| 40 | +**Key Improvements:** |
| 41 | +- ✅ Proper model availability checking via FluidAudio cache |
| 42 | +- ✅ Health check function for sidecar verification |
| 43 | +- ✅ Download/delete operations delegate to Swift |
| 44 | +- ✅ Reset App Data clears all FluidAudio cached files |
| 45 | + |
| 46 | +### 3. Build System |
| 47 | + |
| 48 | +**Automated Build Process:** |
| 49 | +1. `pnpm tauri dev` or `pnpm tauri build` |
| 50 | +2. → Triggers `src-tauri/build.rs` |
| 51 | +3. → Runs `sidecar/parakeet-swift/build.sh` |
| 52 | +4. → Produces `dist/parakeet-sidecar-aarch64-apple-darwin` |
| 53 | +5. → Tauri bundles it automatically |
| 54 | + |
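The build chain above hinges on `build.rs` invoking `build.sh` without breaking the whole build when the Swift toolchain is missing. Below is a hedged sketch of that step, not the contents of the real `src-tauri/build.rs`: `should_build_sidecar`, `run_sidecar_build`, and `build_step` are hypothetical names, and invoking the script through `bash` is an assumption. `CARGO_CFG_TARGET_OS`, `cargo:rerun-if-changed`, and `cargo:warning` are standard Cargo build-script facilities.

```rust
use std::path::Path;
use std::process::Command;

/// The Swift sidecar is only built when the *target* OS is macOS.
/// In a real build.rs this value would come from the `CARGO_CFG_TARGET_OS`
/// environment variable, because `cfg!(target_os)` describes the host there.
pub fn should_build_sidecar(target_os: &str) -> bool {
    target_os == "macos"
}

/// Run the build script and report failure as a message instead of panicking.
pub fn run_sidecar_build(script: &Path) -> Result<(), String> {
    let status = Command::new("bash")
        .arg(script)
        .status()
        .map_err(|e| format!("could not start {}: {}", script.display(), e))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{} exited with {}", script.display(), status))
    }
}

/// What the body of a build.rs `main` might look like.
pub fn build_step(target_os: &str, script: &Path) {
    // Re-run the build script only when the sidecar build script changes.
    println!("cargo:rerun-if-changed={}", script.display());
    if !should_build_sidecar(target_os) {
        return;
    }
    if let Err(msg) = run_sidecar_build(script) {
        // Surface the problem as a warning so the Rust build can continue.
        println!("cargo:warning=Swift sidecar build failed: {}", msg);
    }
}
```

Emitting a warning rather than panicking is one way to get the graceful failure handling mentioned later in this document; a stricter setup could fail the build for release profiles.
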
| 55 | +**Target Triple Handling:** |
| 56 | +- macOS ARM64: `aarch64-apple-darwin` |
| 57 | +- macOS Intel: `x86_64-apple-darwin` |
| 58 | +- Future: Linux/Windows targets configurable |
| 59 | + |
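As an illustration of the naming rule, the sketch below maps an OS/architecture pair to the two triples listed above and derives the suffixed file name. `target_triple` and `sidecar_file_name` are hypothetical helpers, and the `"macos"`, `"aarch64"`, and `"x86_64"` strings assume the values reported by `std::env::consts::OS` and `std::env::consts::ARCH`.

```rust
/// Map an OS/architecture pair to a supported target triple.
/// Returns `None` for platforms where the Swift sidecar is not built.
pub fn target_triple(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("macos", "aarch64") => Some("aarch64-apple-darwin"),
        ("macos", "x86_64") => Some("x86_64-apple-darwin"),
        _ => None,
    }
}

/// Build the on-disk file name in `binary-name-$TARGET_TRIPLE` form.
pub fn sidecar_file_name(base: &str, triple: &str) -> String {
    format!("{}-{}", base, triple)
}
```

For example, `sidecar_file_name("parakeet-sidecar", "aarch64-apple-darwin")` yields the file name produced in step 4 of the build process.
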
| 60 | +--- |
| 61 | + |
| 62 | +## 📊 Benefits |
| 63 | + |
| 64 | +| Metric | Old (Python/MLX) | New (Swift/FluidAudio) | Improvement | |
| 65 | +|--------|------------------|------------------------|-------------| |
| 66 | +| Binary Size | 123 MB | 1.2 MB | **99% smaller** | |
| 67 | +| Download Size | 123 MB + 500 MB models | 1.2 MB + 500 MB models | Same models, tiny binary | |
| 68 | +| Performance | MLX (CPU/GPU) | Apple Neural Engine | **Native acceleration** | |
| 69 | +| User Control | Auto-download | User clicks Download | **Better UX** | |
| 70 | +| macOS Integration | Python runtime | Native Swift | **Fully native** | |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## 🔄 Data Flow |
| 75 | + |
| 76 | +### Download Flow |
| 77 | +``` |
| 78 | +1. User clicks "Download" in Settings |
| 79 | +2. Frontend → Rust: download_model(model_name) |
| 80 | +3. Rust → Swift: {"type": "load_model", "model_id": "..."} |
| 81 | +4. Swift → FluidAudio: AsrModels.downloadAndLoad() |
| 82 | +5. FluidAudio downloads the CoreML model files to ~/Library/Application Support/ |
| 83 | +6. Swift → Rust: {"type": "status", "loaded_model": "..."} |
| 84 | +7. Rust → Frontend: model-downloaded event |
| 85 | +``` |
| 86 | + |
| 87 | +### Transcription Flow |
| 88 | +``` |
| 89 | +1. User records audio |
| 90 | +2. Frontend → Rust: transcribe(audio_path) |
| 91 | +3. Rust → Swift: {"type": "transcribe", "audio_path": "..."} |
| 92 | +4. Swift → FluidAudio: asrManager.transcribe(fileURL) |
| 93 | +5. FluidAudio uses Apple Neural Engine |
| 94 | +6. Swift → Rust: {"type": "transcription", "text": "..."} |
| 95 | +7. Rust → Frontend: Insert text at cursor |
| 96 | +``` |
| 97 | + |
| 98 | +### Delete Flow |
| 99 | +``` |
| 100 | +1. User clicks "Remove" in Settings |
| 101 | +2. Frontend → Rust: delete_model(model_name) |
| 102 | +3. Rust → Swift: {"type": "delete_model"} |
| 103 | +4. Swift deletes: |
| 104 | + - ~/Library/Application Support/FluidAudio/ |
| 105 | + - ~/Library/Application Support/parakeet-tdt-0.6b-v3-coreml/ |
| 106 | + - ~/Library/Caches/FluidAudio/ |
| 107 | +5. Swift → Rust: {"type": "status", "loaded_model": null} |
| 108 | +6. Rust → Frontend: model-deleted event |
| 109 | +``` |
| 110 | + |
| 111 | +### Reset App Data Flow |
| 112 | +``` |
| 113 | +1. User clicks "Reset App Data" |
| 114 | +2. Frontend → Rust: reset_app_data() |
| 115 | +3. Rust clears: |
| 116 | + - FluidAudio cache directories |
| 117 | + - Old Parakeet tracking dirs |
| 118 | + - Tauri stores (settings, transcriptions) |
| 119 | + - Secure store (API keys) |
| 120 | + - System preferences |
| 121 | +4. Rust → Frontend: reset-complete event |
| 122 | +``` |
| 123 | + |
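The reset flow has the Rust side clearing the FluidAudio cache directories. A minimal sketch of that cleanup follows, assuming the three paths listed in the Delete Flow are the complete set. `fluid_audio_dirs` and `clear_fluid_audio_cache` are hypothetical names, not functions taken from `reset.rs`, and the home directory is passed in so the logic can be exercised against a temporary directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directories that hold FluidAudio model data, relative to the user's home.
/// Assumption: this list mirrors the paths named in the Delete Flow.
pub fn fluid_audio_dirs(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join("Library/Application Support/FluidAudio"),
        home.join("Library/Application Support/parakeet-tdt-0.6b-v3-coreml"),
        home.join("Library/Caches/FluidAudio"),
    ]
}

/// Remove every cache directory that exists and return how many were removed.
/// A directory that is already missing is not treated as an error.
pub fn clear_fluid_audio_cache(home: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for dir in fluid_audio_dirs(home) {
        match fs::remove_dir_all(&dir) {
            Ok(()) => removed += 1,
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
    }
    Ok(removed)
}
```

Treating "not found" as success keeps the operation idempotent, so running a reset twice, or after a manual cleanup, does not surface a spurious error.
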
| 124 | +--- |
| 125 | + |
| 126 | +## 🧪 Testing Checklist |
| 127 | + |
| 128 | +### Manual Testing Required |
| 129 | + |
| 130 | +- [ ] **Build Test**: `pnpm tauri dev` compiles Swift sidecar |
| 131 | +- [ ] **Health Check**: App starts without sidecar errors |
| 132 | +- [ ] **Download**: Click Download, verify the ~500 MB CoreML model downloads |
| 133 | +- [ ] **Status Check**: Downloaded model shows as available |
| 134 | +- [ ] **Transcription**: Record audio, verify transcription works |
| 135 | +- [ ] **Quality**: Check transcription accuracy vs Whisper |
| 136 | +- [ ] **Delete**: Click Remove, verify files deleted from disk |
| 137 | +- [ ] **Re-download**: Download again after delete |
| 138 | +- [ ] **Reset App Data**: Verify all Parakeet data cleared |
| 139 | +- [ ] **Persistence**: Model selection survives app restart |
| 140 | + |
| 141 | +### Automated Tests Needed (Future) |
| 142 | + |
| 143 | +```rust |
| 144 | +// Integration test idea |
| 145 | +#[tokio::test] |
| 146 | +async fn test_parakeet_sidecar_communication() { |
| 147 | + let app = test_app(); |
| 148 | + let manager = ParakeetManager::new(temp_dir()); |
| 149 | + |
| 150 | + // Health check |
| 151 | + assert!(manager.health_check(&app).await.is_ok()); |
| 152 | + |
| 153 | + // Status check |
| 154 | + let response = manager.client.send(&app, &ParakeetCommand::Status {}).await.unwrap(); |
| 155 | + assert!(matches!(response, ParakeetResponse::Status { .. })); |
| 156 | +} |
| 157 | +``` |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## 🚨 Known Limitations |
| 162 | + |
| 163 | +### Platform Limitations |
| 164 | + |
| 165 | +1. **macOS Only**: Swift/FluidAudio is macOS-exclusive (by design) |
| 166 | + - **Backend**: Returns empty Parakeet model list on Windows/Linux |
| 167 | + - **Frontend**: Dynamically detects engine from selected model |
| 168 | + - Windows/Linux: Only Whisper models appear in UI |
| 169 | + - Future: May add Windows-specific native models if available |
| 170 | + |
| 171 | +2. **Model Availability Heuristic**: |
| 172 | + - Currently checks if FluidAudio cache directories exist |
| 173 | + - Can report a stale result if the user deletes model files manually |
| 174 | + - **Improvement**: Query sidecar status on app startup |
| 175 | + |
| 176 | +3. **No Progress for Model Download**: |
| 177 | + - FluidAudio doesn't expose download progress |
| 178 | + - UI shows indeterminate spinner |
| 179 | + - User must wait ~2-5 minutes for the ~500 MB download |
| 180 | + |
| 181 | +4. **Single Model Support**: |
| 182 | + - Only Parakeet TDT 0.6B v3 currently available |
| 183 | + - FluidAudio may support more models in future |
| 184 | + |
| 185 | +### Future Improvements |
| 186 | + |
| 187 | +- [ ] Expose FluidAudio download progress (if SDK adds support) |
| 188 | +- [ ] Add proper model availability query on startup |
| 189 | +- [ ] Support multiple Parakeet model variants |
| 190 | +- [ ] Add offline mode detection (warn if no internet for download) |
| 191 | +- [ ] Implement model update mechanism |
| 192 | + |
| 193 | +--- |
| 194 | + |
| 195 | +## 📝 Files Modified |
| 196 | + |
| 197 | +### New Files |
| 198 | +``` |
| 199 | +sidecar/parakeet-swift/Sources/main.swift |
| 200 | +sidecar/parakeet-swift/Package.swift |
| 201 | +sidecar/parakeet-swift/build.sh |
| 202 | +sidecar/parakeet-swift/README.md |
| 203 | +sidecar/parakeet-swift/.gitignore |
| 204 | +PARAKEET_SWIFT_INTEGRATION.md (this file) |
| 205 | +``` |
| 206 | + |
| 207 | +### Modified Files |
| 208 | +``` |
| 209 | +src-tauri/build.rs |
| 210 | +src-tauri/tauri.conf.json |
| 211 | +src-tauri/src/parakeet/manager.rs (macOS-only logic added) |
| 212 | +src-tauri/src/parakeet/models.rs (removed V2, macOS-only) |
| 213 | +src-tauri/src/parakeet/messages.rs |
| 214 | +src-tauri/src/commands/reset.rs |
| 215 | +src-tauri/src/commands/model.rs |
| 216 | +src/components/onboarding/OnboardingDesktop.tsx (dynamic engine detection) |
| 217 | +``` |
| 218 | + |
| 219 | +### Unchanged (Already Configured) |
| 220 | +``` |
| 221 | +src-tauri/src/parakeet/sidecar.rs (communication logic) |
| 222 | +src-tauri/capabilities/macos.json (sidecar permissions) |
| 223 | +src-tauri/capabilities/default.json (sidecar permissions) |
| 224 | +``` |
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## 🎓 Lessons Learned |
| 229 | + |
| 230 | +### Tauri v2 Sidecar Best Practices |
| 231 | + |
| 232 | +1. **Binary Naming**: Must follow `binary-name-$TARGET_TRIPLE` format |
| 233 | + - Example: `parakeet-sidecar-aarch64-apple-darwin` |
| 234 | + - Tauri matches the target-triple suffix at build time; application code refers to the sidecar by its base name |
| 235 | + |
| 236 | +2. **externalBin Path**: Points to base name WITHOUT target triple |
| 237 | + - ✅ Correct: `"../sidecar/parakeet-swift/dist/parakeet-sidecar"` |
| 238 | + - ❌ Wrong: `"../sidecar/parakeet-swift/dist/parakeet-sidecar-aarch64-apple-darwin"` |
| 239 | + |
| 240 | +3. **Build Integration**: Use `build.rs` for automated compilation |
| 241 | + - Runs before Tauri build |
| 242 | + - Gracefully handles build failures |
| 243 | + - Supports incremental builds |
| 244 | + |
| 245 | +4. **Permissions**: Configure in `capabilities/*.json` |
| 246 | + - `shell:allow-spawn` for launching sidecar |
| 247 | + - `shell:allow-stdin-write` for sending commands |
| 248 | + |
| 249 | +5. **Communication**: JSON over stdin/stdout is reliable |
| 250 | + - Use line-delimited JSON |
| 251 | + - Always flush stdout after writing |
| 252 | + - Handle stderr for debugging |
| 253 | + |
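The communication points above (line-delimited JSON, flushing after every write) can be sketched with plain `std::io` traits. This is an illustrative framing layer under stated assumptions, not the code in `sidecar.rs`: `send_message` and `read_message` are hypothetical names, and the functions are generic over `Write` and `BufRead` so they work with a child process's pipes as well as in-memory buffers.

```rust
use std::io::{self, BufRead, Write};

/// Write one JSON message as a single line and flush immediately,
/// so the peer never waits on data sitting in a buffer.
pub fn send_message<W: Write>(writer: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') {
        // A raw newline would split the message into two frames.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "message must not contain a raw newline",
        ));
    }
    writer.write_all(json.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()
}

/// Read the next message line. Returns `Ok(None)` when the stream has ended.
pub fn read_message<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    let bytes_read = reader.read_line(&mut line)?;
    if bytes_read == 0 {
        return Ok(None);
    }
    let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
    Ok(Some(trimmed.to_string()))
}
```

Returning `Ok(None)` at end of stream gives the caller a clear signal that the sidecar has exited, which is distinct from a malformed or empty message.
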
| 254 | +### Swift/FluidAudio Specifics |
| 255 | + |
| 256 | +1. **Package Management**: Swift Package Manager is straightforward |
| 257 | + - Dependencies resolve automatically |
| 258 | + - Release builds are optimized and small |
| 259 | + |
| 260 | +2. **FluidAudio SDK**: v0.5.2 is stable |
| 261 | + - Requires macOS 13.0+ |
| 262 | + - Handles model caching automatically |
| 263 | + - Returns simple `ASRResult` struct |
| 264 | + |
| 265 | +3. **JSON Serialization**: Swift Codable is powerful |
| 266 | + - Use `CodingKeys` enum for snake_case conversion |
| 267 | + - Property default values are not used as fallbacks for missing JSON keys with synthesized `Codable` (write a custom `init(from:)` instead) |
| 268 | + |
| 269 | +--- |
| 270 | + |
| 271 | +## 🚀 Next Steps |
| 272 | + |
| 273 | +### Immediate (Before Release) |
| 274 | + |
| 275 | +1. **Test End-to-End Flow** |
| 276 | + ```bash |
| 277 | + pnpm tauri dev |
| 278 | + # → Test: Download → Transcribe → Remove → Reset |
| 279 | + ``` |
| 280 | + |
| 281 | +2. **Verify Build Process** |
| 282 | + ```bash |
| 283 | + pnpm tauri build |
| 284 | + # → Ensure sidecar is bundled in .app |
| 285 | + ``` |
| 286 | + |
| 287 | +3. **Check Binary Signing** (for distribution) |
| 288 | + - Swift binary must be code-signed |
| 289 | + - Include in notarization process |
| 290 | + |
| 291 | +### Future Enhancements |
| 292 | + |
| 293 | +1. **Universal Binary**: Build for both ARM64 and Intel |
| 294 | + ```bash |
| 295 | + # In build.sh, build a universal binary by passing both architectures |
| 296 | + swift build -c release --arch arm64 --arch x86_64 |
| 297 | + ``` |
| 298 | + |
| 299 | +2. **Model Selection**: Add UI for multiple Parakeet models |
| 300 | + - Query FluidAudio for available models |
| 301 | + - Let user choose between speed/accuracy tradeoffs |
| 302 | + |
| 303 | +3. **Offline Support**: Detect network issues |
| 304 | + - Show clear error if download fails |
| 305 | + - Suggest downloading when connected |
| 306 | + |
| 307 | +4. **Performance Monitoring**: Track transcription metrics |
| 308 | + - Time to transcribe |
| 309 | + - Model load time |
| 310 | + - Memory usage |
| 311 | + |
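For the timing metrics in the list above, a small wrapper around `std::time::Instant` is usually enough to get started. The sketch below is one possible shape, with hypothetical names `timed` and `real_time_factor`; the real-time factor (processing time divided by audio duration) is a common way to report speech-recognition speed and is an addition here, not something the current implementation reports. Memory usage is not covered.

```rust
use std::time::{Duration, Instant};

/// Run a closure and return its result together with the elapsed wall time.
pub fn timed<T, F: FnOnce() -> T>(work: F) -> (T, Duration) {
    let start = Instant::now();
    let value = work();
    (value, start.elapsed())
}

/// Processing time divided by audio duration; values below 1.0 mean
/// transcription runs faster than real time. Returns `None` for empty audio.
pub fn real_time_factor(audio: Duration, processing: Duration) -> Option<f64> {
    if audio.as_secs_f64() == 0.0 {
        return None;
    }
    Some(processing.as_secs_f64() / audio.as_secs_f64())
}
```

Wrapping the call that sends a `transcribe` command in `timed` would yield the time-to-transcribe figure, and the same wrapper around a `load_model` round trip would yield model load time.
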
| 312 | +--- |
| 313 | + |
| 314 | +## 📚 References |
| 315 | + |
| 316 | +- [Tauri v2 Sidecar Documentation](https://v2.tauri.app/develop/sidecar/) |
| 317 | +- [FluidAudio SDK](https://github.com/FluidInference/FluidAudio) |
| 318 | +- [Swift Package Manager Guide](https://swift.org/package-manager/) |
| 319 | +- [Core ML (Apple Neural Engine)](https://developer.apple.com/machine-learning/core-ml/) |
| 320 | + |
| 321 | +--- |
| 322 | + |
| 323 | +## ✨ Credits |
| 324 | + |
| 325 | +- **FluidAudio Team**: For excellent CoreML speech-to-text SDK |
| 326 | +- **Tauri Team**: For robust sidecar support in v2 |
| 327 | +- **VoiceTypr Community**: For testing and feedback |
| 328 | + |
| 329 | +--- |
| 330 | + |
| 331 | +**Status**: ✅ Implementation Complete | 🧪 Testing Required | 📦 Ready for Integration |