feat: multi-format audio/video file support #18

Merged
pasrom merged 8 commits into main from feat/multi-format-audio-support
Mar 15, 2026

pasrom commented Mar 15, 2026

Summary

  • Multi-format loading: loadAudioAsFloat32(url:) tries AVAudioFile (WAV, MP3, M4A, AIFF, FLAC, CAF) then falls back to AVAsset for video containers (MP4, MOV)
  • File picker expanded: "Process Audio/Video Files..." now accepts MP3, M4A, AIFF, FLAC, MP4, MOV (was WAV-only)
  • Async resampleFile: supports the AVAsset path; dual-source resampling parallelized via async let
  • Performance: eliminated double file open in SpeakerNamingView, pre-allocated AVAsset buffer, zero-copy CMBlockBuffer reads
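
The fallback order in the first bullet can be sketched as a list of loader tiers tried in sequence. This is a minimal illustration, not the PR's code: `LoaderTier`, `loadSamples`, and `AudioLoadError.allTiersFailed` are hypothetical names, and each tier is a plain closure standing in for the AVAudioFile and AVAsset readers.

```swift
import Foundation

// Hypothetical error for this sketch; the PR itself only names `noAudioTrack`.
enum AudioLoadError: Error {
    case allTiersFailed(URL)
}

// One decoding tier: a label for logging plus a function that
// returns mono Float32 samples or throws.
struct LoaderTier: Sendable {
    let name: String
    let load: @Sendable (URL) async throws -> [Float]
}

// Try each tier in order and return the first success.
// A failing tier is logged and the next one is attempted.
func loadSamples(url: URL, tiers: [LoaderTier]) async throws -> [Float] {
    for tier in tiers {
        do {
            return try await tier.load(url)
        } catch {
            print("\(tier.name) failed for \(url.lastPathComponent): \(error)")
        }
    }
    throw AudioLoadError.allTiersFailed(url)
}
```

With this shape, supporting another decoder amounts to appending one more `LoaderTier` to the array passed in.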

Prepares for a future ffmpeg CLI fallback (MKV/WebM/OGG) — the loadAudioAsFloat32 architecture is designed for a third fallback tier.

Test plan

  • 32 AudioMixer tests (multi-format loading, AVAsset fallback, resampling from 44.1kHz fixtures)
  • E2E transcription tests for WAV, MP3, M4A, MP4 with keyword verification (skipped in CI — needs WhisperKit model)
  • Manual: "Process Audio/Video Files..." → select MP3/M4A/MP4 → verify transcription completes
  • Manual: select MP4 video → verify audio extraction + transcription works

pasrom added 8 commits March 15, 2026 08:47

The method already supports all AVAudioFile formats (MP3, M4A, AIFF,
FLAC, CAF), not just WAV. Rename reflects actual capability and
prepares for multi-format file picker support.

loadAudioAsFloat32() tries AVAudioFile first (WAV, MP3, M4A, AIFF,
FLAC, CAF), then falls back to AVAsset for video containers (MP4, MOV).
resampleFile() is now async to support the AVAsset path.
Architecture allows adding a future ffmpeg fallback for MKV/WebM/OGG.

Accept MP3, M4A, AIFF, FLAC, MP4, MOV in addition to WAV.
Panel title and menu label updated to "Audio/Video Files".

Tests for loadAudioAsFloat32 (WAV round-trip, nonexistent file error)
and async resampleFile round-trip (48kHz → 16kHz).

Generated fixtures via ffmpeg: sine_440hz.mp3, sine_440hz.m4a,
sine_440hz.mp4 (video+audio), video_no_audio.mp4 (video only).
Tests cover: MP3/M4A via AVAudioFile, MP4 via AVAsset fallback,
video without audio track throws noAudioTrack, and MP3 resample.

…arallelize resampling

- loadAudioAsFloat32: reuse AVAudioFile instance via readSamplesFromAudioFile
  helper instead of opening the file twice; log error on fallback
- loadAudioFromAVAsset: pre-allocate sample array from asset duration
- PipelineQueue: resample app and mic tracks concurrently with async let
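
The concurrent resampling of the app and mic tracks could look roughly like the following. `resampleBoth` and its `resample` parameter are hypothetical; the closure stands in for the PR's async `resampleFile`, whose exact signature is not shown here.

```swift
import Foundation

// Sketch only: run the same resampling step on two source files concurrently.
// `resample` is an assumed stand-in for the project's async resampleFile.
func resampleBoth<T: Sendable>(
    app: URL,
    mic: URL,
    using resample: @escaping @Sendable (URL) async throws -> T
) async throws -> (app: T, mic: T) {
    // Each `async let` starts a child task right away, so both files
    // are processed in parallel rather than one after the other.
    async let appResult = resample(app)
    async let micResult = resample(mic)

    // Awaiting collects the results. If the first await throws, the other
    // child task is cancelled and awaited when the scope exits.
    let resampledApp = try await appResult
    let resampledMic = try await micResult
    return (app: resampledApp, mic: resampledMic)
}
```

The benefit over two sequential `await` calls is that total wall-clock time approaches that of the slower track rather than the sum of both.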
Full pipeline tests: load multi-format file → resampleFile (16kHz) →
WhisperKit transcribe → verify transcript is non-empty and has >50 chars.
Each format tested against WAV baseline to confirm the pipeline works
end-to-end with compressed audio and video containers.

Fixtures generated from two_speakers_de.wav via ffmpeg (64kbps).
Skipped in CI (requires WhisperKit model download).

Consolidated 3 separate E2E tests into one that loads the model once
and transcribes WAV, MP3, M4A, MP4 in sequence. Each transcript is
checked for at least 3 of 5 expected German keywords from the TTS
fixture (willkommen, Projekt, Status, Entwicklung, Zeitplan).
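
The "at least 3 of 5 keywords" check can be expressed as a small helper. This is a sketch under assumptions: `matchedKeywords` and `passesKeywordCheck` are hypothetical names, and the case-insensitive substring match is a guess at how the test tolerates transcription variance, not something the commit message states.

```swift
import Foundation

// Keywords listed in the commit message for the German TTS fixture.
let expectedKeywords = ["willkommen", "Projekt", "Status", "Entwicklung", "Zeitplan"]

// Return the keywords that occur in the transcript.
// Assumption: matching is case-insensitive and substring-based.
func matchedKeywords(in transcript: String, keywords: [String]) -> [String] {
    let haystack = transcript.lowercased()
    return keywords.filter { haystack.contains($0.lowercased()) }
}

// A transcript passes when at least `minimum` keywords are found.
func passesKeywordCheck(
    _ transcript: String,
    keywords: [String] = expectedKeywords,
    minimum: Int = 3
) -> Bool {
    matchedKeywords(in: transcript, keywords: keywords).count >= minimum
}
```

A threshold below the full keyword count leaves room for the small differences expected between lossless WAV input and the 64kbps compressed fixtures.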
pasrom force-pushed the feat/multi-format-audio-support branch from 9c6cc27 to 80cb67a on March 15, 2026 07:49
pasrom merged commit 6646eb5 into main on Mar 15, 2026
2 checks passed
pasrom deleted the feat/multi-format-audio-support branch on March 15, 2026 07:52
pasrom added the enhancement label on Mar 15, 2026