Skip to content

Latest commit

 

History

History
331 lines (255 loc) · 10.3 KB

File metadata and controls

331 lines (255 loc) · 10.3 KB

Parakeet Swift Integration - Implementation Summary

🎯 Overview

Successfully replaced the 123MB Python/MLX Parakeet sidecar with a 1.2MB Swift/FluidAudio implementation that provides native macOS transcription using Apple Neural Engine.

⚠️ Platform Support: This integration is macOS-only. Windows and Linux will continue to use Whisper models exclusively.


✅ What Was Implemented

1. Swift Sidecar (/sidecar/parakeet-swift/)

Files Created:

  • Sources/main.swift - Main sidecar logic with FluidAudio integration
  • Package.swift - Swift package configuration
  • build.sh - Automated build script with proper target triple naming
  • README.md - Comprehensive documentation
  • .gitignore - Git ignore rules for build artifacts

Features:

  • ✅ JSON-based communication protocol (stdin/stdout)
  • ✅ Commands: load_model, transcribe, delete_model, unload_model, status
  • ✅ FluidAudio SDK integration (v0.5.2)
  • ✅ Apple Neural Engine acceleration
  • ✅ Proper error handling and status responses
  • ✅ Model caching managed by FluidAudio

2. Rust Backend Integration

Modified Files:

  • src-tauri/src/parakeet/manager.rs - Delegates to Swift sidecar
  • src-tauri/src/parakeet/messages.rs - Added DeleteModel command
  • src-tauri/src/commands/reset.rs - Clears FluidAudio cache
  • src-tauri/src/commands/model.rs - Unified model management
  • src-tauri/build.rs - Automatically builds Swift sidecar
  • src-tauri/tauri.conf.json - Updated externalBin path

Key Improvements:

  • ✅ Proper model availability checking via FluidAudio cache
  • ✅ Health check function for sidecar verification
  • ✅ Download/delete operations delegate to Swift
  • ✅ Reset App Data clears all FluidAudio cached files

3. Build System

Automated Build Process:

  1. pnpm tauri dev or pnpm tauri build
  2. → Triggers src-tauri/build.rs
  3. → Runs sidecar/parakeet-swift/build.sh
  4. → Produces dist/parakeet-sidecar-aarch64-apple-darwin
  5. → Tauri bundles it automatically

Target Triple Handling:

  • macOS ARM64: aarch64-apple-darwin
  • macOS Intel: x86_64-apple-darwin
  • Future: Linux/Windows targets configurable

📊 Benefits

Metric Old (Python/MLX) New (Swift/FluidAudio) Improvement
Binary Size 123 MB 1.2 MB 99% smaller
Download Size 123 MB + 500 MB models 1.2 MB + 500 MB models Same models, tiny binary
Performance MLX (CPU/GPU) Apple Neural Engine Native acceleration
User Control Auto-download User clicks Download Better UX
macOS Integration Python runtime Native Swift Fully native

🔄 Data Flow

Download Flow

1. User clicks "Download" in Settings
2. Frontend → Rust: download_model(model_name)
3. Rust → Swift: {"type": "load_model", "model_id": "..."}
4. Swift → FluidAudio: AsrModels.downloadAndLoad()
5. FluidAudio downloads CoreML to ~/Library/Application Support/
6. Swift → Rust: {"type": "status", "loaded_model": "..."}
7. Rust → Frontend: model-downloaded event

Transcription Flow

1. User records audio
2. Frontend → Rust: transcribe(audio_path)
3. Rust → Swift: {"type": "transcribe", "audio_path": "..."}
4. Swift → FluidAudio: asrManager.transcribe(fileURL)
5. FluidAudio uses Apple Neural Engine
6. Swift → Rust: {"type": "transcription", "text": "..."}
7. Rust → Frontend: Insert text at cursor

Delete Flow

1. User clicks "Remove" in Settings
2. Frontend → Rust: delete_model(model_name)
3. Rust → Swift: {"type": "delete_model"}
4. Swift deletes:
   - ~/Library/Application Support/FluidAudio/
   - ~/Library/Application Support/parakeet-tdt-0.6b-v3-coreml/
   - ~/Library/Caches/FluidAudio/
5. Swift → Rust: {"type": "status", "loaded_model": null}
6. Rust → Frontend: model-deleted event

Reset App Data Flow

1. User clicks "Reset App Data"
2. Frontend → Rust: reset_app_data()
3. Rust clears:
   - FluidAudio cache directories
   - Old Parakeet tracking dirs
   - Tauri stores (settings, transcriptions)
   - Secure store (API keys)
   - System preferences
4. Rust → Frontend: reset-complete event

🧪 Testing Checklist

Manual Testing Required

  • Build Test: pnpm tauri dev compiles Swift sidecar
  • Health Check: App starts without sidecar errors
  • Download: Click Download, verify ~500MB CoreML downloads
  • Status Check: Downloaded model shows as available
  • Transcription: Record audio, verify transcription works
  • Quality: Check transcription accuracy vs Whisper
  • Delete: Click Remove, verify files deleted from disk
  • Re-download: Download again after delete
  • Reset App Data: Verify all Parakeet data cleared
  • Persistence: Model selection survives app restart

Automated Tests Needed (Future)

// Integration test idea
#[tokio::test]
async fn test_parakeet_sidecar_communication() {
    let app = test_app();
    let manager = ParakeetManager::new(temp_dir());
    
    // Health check
    assert!(manager.health_check(&app).await.is_ok());
    
    // Status check
    let response = manager.client.send(&app, &ParakeetCommand::Status {}).await.unwrap();
    assert!(matches!(response, ParakeetResponse::Status { .. }));
}

🚨 Known Limitations

Platform Limitations

  1. macOS Only: Swift/FluidAudio is macOS-exclusive (by design)

    • Backend: Returns empty Parakeet model list on Windows/Linux
    • Frontend: Dynamically detects engine from selected model
    • Windows/Linux: Only Whisper models appear in UI
    • Future: May add Windows-specific native models if available
  2. Model Availability Heuristic:

    • Currently checks if FluidAudio cache directories exist
    • Not 100% accurate if user manually deletes files
    • Improvement: Query sidecar status on app startup
  3. No Progress for Model Download:

    • FluidAudio doesn't expose download progress
    • UI shows indeterminate spinner
    • User must wait ~2-5 minutes for 500MB download
  4. Single Model Support:

    • Only Parakeet TDT 0.6B v3 currently available
    • FluidAudio may support more models in future

Future Improvements

  • Expose FluidAudio download progress (if SDK adds support)
  • Add proper model availability query on startup
  • Support multiple Parakeet model variants
  • Add offline mode detection (warn if no internet for download)
  • Implement model update mechanism

📝 Files Modified

New Files

sidecar/parakeet-swift/Sources/main.swift
sidecar/parakeet-swift/Package.swift
sidecar/parakeet-swift/build.sh
sidecar/parakeet-swift/README.md
sidecar/parakeet-swift/.gitignore
PARAKEET_SWIFT_INTEGRATION.md (this file)

Modified Files

src-tauri/build.rs
src-tauri/tauri.conf.json
src-tauri/src/parakeet/manager.rs (macOS-only logic added)
src-tauri/src/parakeet/models.rs (removed V2, macOS-only)
src-tauri/src/parakeet/messages.rs
src-tauri/src/commands/reset.rs
src-tauri/src/commands/model.rs
src/components/onboarding/OnboardingDesktop.tsx (dynamic engine detection)

Unchanged (Already Configured)

src-tauri/src/parakeet/sidecar.rs (communication logic)
src-tauri/capabilities/macos.json (sidecar permissions)
src-tauri/capabilities/default.json (sidecar permissions)

🎓 Lessons Learned

Tauri v2 Sidecar Best Practices

  1. Binary Naming: Must follow binary-name-$TARGET_TRIPLE format

    • Example: parakeet-sidecar-aarch64-apple-darwin
    • Tauri automatically appends target triple when spawning
  2. externalBin Path: Points to base name WITHOUT target triple

    • ✅ Correct: "../sidecar/parakeet-swift/dist/parakeet-sidecar"
    • ❌ Wrong: "../sidecar/parakeet-swift/dist/parakeet-sidecar-aarch64-apple-darwin"
  3. Build Integration: Use build.rs for automated compilation

    • Runs before Tauri build
    • Gracefully handles build failures
    • Supports incremental builds
  4. Permissions: Configure in capabilities/*.json

    • shell:allow-spawn for launching sidecar
    • shell:allow-stdin-write for sending commands
  5. Communication: JSON over stdin/stdout is reliable

    • Use line-delimited JSON
    • Always flush stdout after writing
    • Handle stderr for debugging

Swift/FluidAudio Specifics

  1. Package Management: Swift Package Manager is straightforward

    • Dependencies resolve automatically
    • Release builds are optimized and small
  2. FluidAudio SDK: v0.5.2 is stable

    • Requires macOS 13.0+
    • Handles model caching automatically
    • Returns simple ASRResult struct
  3. JSON Serialization: Swift Codable is powerful

    • Use CodingKeys enum for snake_case conversion
    • Default values in structs don't decode (use initializers)

🚀 Next Steps

Immediate (Before Release)

  1. Test End-to-End Flow

    pnpm tauri dev
    # → Test: Download → Transcribe → Remove → Reset
  2. Verify Build Process

    pnpm tauri build
    # → Ensure sidecar is bundled in .app
  3. Check Binary Signing (for distribution)

    • Swift binary must be code-signed
    • Include in notarization process

Future Enhancements

  1. Universal Binary: Build for both ARM64 and Intel

    # In build.sh, support lipo for universal binaries
    swift build -c release --arch arm64 --arch x86_64
  2. Model Selection: Add UI for multiple Parakeet models

    • Query FluidAudio for available models
    • Let user choose between speed/accuracy tradeoffs
  3. Offline Support: Detect network issues

    • Show clear error if download fails
    • Suggest downloading when connected
  4. Performance Monitoring: Track transcription metrics

    • Time to transcribe
    • Model load time
    • Memory usage

📚 References


✨ Credits

  • FluidAudio Team: For excellent CoreML speech-to-text SDK
  • Tauri Team: For robust sidecar support in v2
  • VoiceTypr Community: For testing and feedback

Status: ✅ Implementation Complete | 🧪 Testing Required | 📦 Ready for Integration