feat: multi-format audio/video file support #18
Merged
The method already supports all AVAudioFile formats (MP3, M4A, AIFF, FLAC, CAF), not just WAV. Rename reflects actual capability and prepares for multi-format file picker support.
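As a rough illustration of why one code path covers all of these formats, here is a minimal sketch of reading a file through AVAudioFile. The function name loadFirstChannel is hypothetical and is not the PR's implementation; the sketch assumes that only the first channel is needed and that the file fits in memory.

```swift
import AVFoundation

// Hypothetical helper, not the PR's code: read a whole audio file as Float32
// samples from its first channel. AVAudioFile decodes into `processingFormat`
// (deinterleaved Float32), so the same path serves WAV, MP3, M4A, AIFF,
// FLAC and CAF. The samples keep the file's own sample rate.
func loadFirstChannel(url: URL) throws -> [Float] {
    let file = try AVAudioFile(forReading: url)
    let frameCount = AVAudioFrameCount(file.length)
    guard frameCount > 0,
          let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: frameCount) else {
        return []
    }
    // Decode the entire file into the buffer in one call.
    try file.read(into: buffer)
    guard let channels = buffer.floatChannelData else { return [] }
    return Array(UnsafeBufferPointer(start: channels[0],
                                     count: Int(buffer.frameLength)))
}
```

Opening a missing or unsupported file throws from the AVAudioFile initializer, which is the point where a caller could switch to another loader.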
loadAudioAsFloat32() tries AVAudioFile first (WAV, MP3, M4A, AIFF, FLAC, CAF), then falls back to AVAsset for video containers (MP4, MOV). resampleFile() is now async to support the AVAsset path. Architecture allows adding a future ffmpeg fallback for MKV/WebM/OGG.
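The tiered fallback described above can be sketched independently of AVFoundation. The types LoaderTier and AudioLoadError and the function loadSamples are assumptions made for illustration; in the PR the first two tiers would be AVAudioFile and AVAsset, and a later ffmpeg tier would simply be appended to the list.

```swift
import Foundation

// Hypothetical error type for this sketch; the PR's real error cases may differ.
enum AudioLoadError: Error {
    case allTiersFailed(lastError: Error?)
}

// A tier pairs a label with a loader closure. The loaders are injected so the
// chain itself can run without any audio framework.
struct LoaderTier {
    let name: String
    let load: (URL) async throws -> [Float]
}

// Try each tier in order and return the first successful result.
func loadSamples(url: URL, tiers: [LoaderTier]) async throws -> (tier: String, samples: [Float]) {
    var lastError: Error?
    for tier in tiers {
        do {
            let samples = try await tier.load(url)
            return (tier.name, samples)
        } catch {
            // Remember the failure and fall through to the next tier.
            lastError = error
        }
    }
    throw AudioLoadError.allTiersFailed(lastError: lastError)
}
```

Because the chain is asynchronous, any caller has to be asynchronous too, which matches resampleFile() becoming async in this change.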
Accept MP3, M4A, AIFF, FLAC, MP4, MOV in addition to WAV. Panel title and menu label updated to "Audio/Video Files".
Tests for loadAudioAsFloat32 (WAV round-trip, nonexistent file error) and async resampleFile round-trip (48kHz → 16kHz).
Generated fixtures via ffmpeg: sine_440hz.mp3, sine_440hz.m4a, sine_440hz.mp4 (video+audio), video_no_audio.mp4 (video only). Tests cover: MP3/M4A via AVAudioFile, MP4 via AVAsset fallback, video without audio track throws noAudioTrack, and MP3 resample.
…arallelize resampling
- loadAudioAsFloat32: reuse the AVAudioFile instance via the readSamplesFromAudioFile helper instead of opening the file twice; log the error on fallback
- loadAudioFromAVAsset: pre-allocate the sample array from the asset duration
- PipelineQueue: resample app and mic tracks concurrently with async let
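The concurrent resampling of the two tracks might look roughly like the sketch below. The functions decimate and resampleBoth are hypothetical stand-ins, not PipelineQueue code, and the decimation is deliberately naive (no low-pass filter) so the example stays self-contained.

```swift
import Foundation

// Stand-in resampler (hypothetical): keeps every `factor`-th sample.
// A real implementation would filter before dropping samples.
func decimate(_ samples: [Float], by factor: Int) async -> [Float] {
    stride(from: 0, to: samples.count, by: factor).map { samples[$0] }
}

// Resample both tracks concurrently. Each `async let` starts a child task,
// and the `await` below suspends until both have finished.
func resampleBoth(app: [Float], mic: [Float], factor: Int) async -> (app: [Float], mic: [Float]) {
    async let appOut = decimate(app, by: factor)
    async let micOut = decimate(mic, by: factor)
    return await (appOut, micOut)
}
```

A factor of 3 corresponds to the 48kHz → 16kHz case mentioned in the tests.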
Full pipeline tests: load multi-format file → resampleFile (16kHz) → WhisperKit transcribe → verify transcript is non-empty and has >50 chars. Each format tested against WAV baseline to confirm the pipeline works end-to-end with compressed audio and video containers. Fixtures generated from two_speakers_de.wav via ffmpeg (64kbps). Skipped in CI (requires WhisperKit model download).
Consolidated 3 separate E2E tests into one that loads the model once and transcribes WAV, MP3, M4A, MP4 in sequence. Each transcript is checked for at least 3 of 5 expected German keywords from the TTS fixture (willkommen, Projekt, Status, Entwicklung, Zeitplan).
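The keyword check could be expressed along these lines. The helper names keywordHits and transcriptLooksValid are hypothetical, and the sketch assumes case-insensitive substring matching, which the commit message does not specify.

```swift
import Foundation

// Count how many expected keywords occur in the transcript, ignoring case.
func keywordHits(in transcript: String, keywords: [String]) -> Int {
    let haystack = transcript.lowercased()
    return keywords.filter { haystack.contains($0.lowercased()) }.count
}

// The five keywords named in the commit message.
let expectedKeywords = ["willkommen", "Projekt", "Status", "Entwicklung", "Zeitplan"]

// A transcript passes when at least `minimumHits` keywords are present.
func transcriptLooksValid(_ transcript: String, minimumHits: Int = 3) -> Bool {
    keywordHits(in: transcript, keywords: expectedKeywords) >= minimumHits
}
```

Requiring 3 of 5 rather than all 5 leaves room for the small transcription differences that lossy formats can introduce.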
9c6cc27 to 80cb67a
Summary
- loadAudioAsFloat32(url:) tries AVAudioFile (WAV, MP3, M4A, AIFF, FLAC, CAF), then falls back to AVAsset for video containers (MP4, MOV)
- PipelineQueue resamples app and mic tracks concurrently with async let
- Prepares for a future ffmpeg CLI fallback (MKV/WebM/OGG): the loadAudioAsFloat32 architecture is designed for a third fallback tier.

Test plan