feat: multi-format audio/video file support by pasrom · Pull Request #18 · pasrom/meeting-transcriber

pasrom · 2026-03-15T07:44:11Z

Summary

Multi-format loading: loadAudioAsFloat32(url:) tries AVAudioFile (WAV, MP3, M4A, AIFF, FLAC, CAF) then falls back to AVAsset for video containers (MP4, MOV)
File picker expanded: "Process Audio/Video Files..." now accepts MP3, M4A, AIFF, FLAC, MP4, MOV (was WAV-only)
Async resampleFile: supports the AVAsset path; dual-source resampling parallelized via async let
Performance: eliminated double file open in SpeakerNamingView, pre-allocated AVAsset buffer, zero-copy CMBlockBuffer reads

Prepares for a future ffmpeg CLI fallback (MKV/WebM/OGG) — the loadAudioAsFloat32 architecture is designed for a third fallback tier.
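The tiered design described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `AudioDecoder`, `AllDecodersFailed`, and `decodeWithFallback` are hypothetical names, and the decoders are synchronous closures for brevity, whereas the PR's AVAsset path is async.

```swift
import Foundation

/// Hypothetical decoder signature: turn a file URL into Float32 samples.
typealias AudioDecoder = (URL) throws -> [Float]

/// Thrown when every tier has failed; keeps each tier's error for logging.
struct AllDecodersFailed: Error {
    let failures: [(tier: String, error: Error)]
}

/// Try each tier in order and return the first successful result.
/// Adding a third tier (e.g. an ffmpeg CLI wrapper) means appending one entry.
func decodeWithFallback(
    _ url: URL,
    tiers: [(name: String, decode: AudioDecoder)]
) throws -> [Float] {
    var failures: [(tier: String, error: Error)] = []
    for tier in tiers {
        do {
            return try tier.decode(url)
        } catch {
            // Remember why this tier failed, then fall through to the next one.
            failures.append((tier: tier.name, error: error))
        }
    }
    throw AllDecodersFailed(failures: failures)
}
```

With this shape the call site lists the tiers in priority order, so a later ffmpeg tier would not require changes to the first two.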

Test plan

32 AudioMixer tests (multi-format loading, AVAsset fallback, resampling from 44.1kHz fixtures)
E2E transcription tests for WAV, MP3, M4A, MP4 with keyword verification (skipped in CI — needs WhisperKit model)
Manual: "Process Audio/Video Files..." → select MP3/M4A/MP4 → verify transcription completes
Manual: select MP4 video → verify audio extraction + transcription works

loadAudioAsFloat32() tries AVAudioFile first (WAV, MP3, M4A, AIFF, FLAC, CAF), then falls back to AVAsset for video containers (MP4, MOV). resampleFile() is now async to support the AVAsset path. Architecture allows adding a future ffmpeg fallback for MKV/WebM/OGG.

Generated fixtures via ffmpeg: sine_440hz.mp3, sine_440hz.m4a, sine_440hz.mp4 (video+audio), video_no_audio.mp4 (video only). Tests cover: MP3/M4A via AVAudioFile, MP4 via AVAsset fallback, video without audio track throws noAudioTrack, and MP3 resample.
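For illustration, an AVAsset-based extraction that rejects video-only files might look like the sketch below. Only the `noAudioTrack` case name comes from the text above; `AudioLoadError`, `readerFailed`, and `loadAudioSamples(fromAsset:)` are assumed names. The sketch copies each block buffer for simplicity (the PR describes zero-copy reads), returns interleaved samples without downmixing, and assumes macOS 12 or later for `loadTracks(withMediaType:)`.

```swift
import AVFoundation

enum AudioLoadError: Error {
    case noAudioTrack           // case name taken from the PR description
    case readerFailed(Error?)   // hypothetical, for reader start/decode failures
}

/// Decode the first audio track of a container (e.g. MP4, MOV)
/// into interleaved Float32 samples.
func loadAudioSamples(fromAsset url: URL) async throws -> [Float] {
    let asset = AVURLAsset(url: url)
    guard let track = try await asset.loadTracks(withMediaType: .audio).first else {
        throw AudioLoadError.noAudioTrack
    }

    // Ask the reader to decode to 32-bit float, interleaved linear PCM.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false,
    ]
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(output)
    guard reader.startReading() else {
        throw AudioLoadError.readerFailed(reader.error)
    }

    var samples: [Float] = []
    while let sampleBuffer = output.copyNextSampleBuffer() {
        guard let block = CMSampleBufferGetDataBuffer(sampleBuffer) else { continue }
        let floatCount = CMBlockBufferGetDataLength(block) / MemoryLayout<Float>.size
        guard floatCount > 0 else { continue }
        var chunk = [Float](repeating: 0, count: floatCount)
        // Copy only whole Float values so the destination is never overrun.
        let status = chunk.withUnsafeMutableBytes { raw in
            CMBlockBufferCopyDataBytes(
                block,
                atOffset: 0,
                dataLength: floatCount * MemoryLayout<Float>.size,
                destination: raw.baseAddress!
            )
        }
        if status == kCMBlockBufferNoErr {
            samples.append(contentsOf: chunk)
        }
    }
    if reader.status == .failed {
        throw AudioLoadError.readerFailed(reader.error)
    }
    return samples
}
```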

…arallelize resampling
- loadAudioAsFloat32: reuse AVAudioFile instance via readSamplesFromAudioFile helper instead of opening the file twice; log error on fallback
- loadAudioFromAVAsset: pre-allocate sample array from asset duration
- PipelineQueue: resample app and mic tracks concurrently with async let
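The pre-allocation and `async let` points could be sketched as below. `resampleLinear` and `resampleBoth` are hypothetical names, and the naive linear interpolation (no anti-aliasing filter) is only a stand-in for whatever resampler the project uses; the point is the buffer reservation and the concurrent structure.

```swift
import Foundation

/// Stand-in resampler: naive linear interpolation, no low-pass filtering.
func resampleLinear(
    _ input: [Float],
    from sourceRate: Double,
    to targetRate: Double
) async -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    let outputCount = Int(Double(input.count) * targetRate / sourceRate)
    var output: [Float] = []
    // Reserve the final size up front to avoid repeated reallocation.
    output.reserveCapacity(outputCount)
    let step = sourceRate / targetRate
    for i in 0..<outputCount {
        let position = Double(i) * step
        let lower = min(Int(position), input.count - 1)
        let upper = min(lower + 1, input.count - 1)
        let fraction = Float(position - Double(lower))
        output.append(input[lower] + (input[upper] - input[lower]) * fraction)
    }
    return output
}

/// Resample both tracks concurrently; each `async let` starts a child task.
func resampleBoth(
    app: [Float],
    mic: [Float],
    from sourceRate: Double,
    to targetRate: Double
) async -> (app: [Float], mic: [Float]) {
    async let appResampled = resampleLinear(app, from: sourceRate, to: targetRate)
    async let micResampled = resampleLinear(mic, from: sourceRate, to: targetRate)
    return await (app: appResampled, mic: micResampled)
}
```

Because the two tracks are independent, neither child task waits on the other; the function suspends only at the final `await`.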

Full pipeline tests: load multi-format file → resampleFile (16kHz) → WhisperKit transcribe → verify transcript is non-empty and has >50 chars. Each format tested against WAV baseline to confirm the pipeline works end-to-end with compressed audio and video containers. Fixtures generated from two_speakers_de.wav via ffmpeg (64kbps). Skipped in CI (requires WhisperKit model download).

Consolidated 3 separate E2E tests into one that loads the model once and transcribes WAV, MP3, M4A, MP4 in sequence. Each transcript is checked for at least 3 of 5 expected German keywords from the TTS fixture (willkommen, Projekt, Status, Entwicklung, Zeitplan).
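The "at least 3 of 5 keywords" check could be expressed along these lines. The function name and the case-insensitive substring matching are assumptions; the actual test may match differently.

```swift
import Foundation

/// Keywords listed in the commit message for the German TTS fixture.
let expectedKeywords = ["willkommen", "Projekt", "Status", "Entwicklung", "Zeitplan"]

/// Returns true when at least `minimumHits` of `keywords` occur in
/// `transcript`, ignoring case. Matching is by substring, so compound
/// words such as "Projektstatus" count for both parts.
func transcriptContainsKeywords(
    _ transcript: String,
    keywords: [String],
    minimumHits: Int
) -> Bool {
    let hits = keywords.filter { keyword in
        transcript.range(of: keyword, options: .caseInsensitive) != nil
    }
    return hits.count >= minimumHits
}
```

A threshold below 5 leaves room for the small transcription differences that compressed formats can introduce.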

pasrom added 8 commits March 15, 2026 08:47

refactor(app): rename loadWAVAsFloat32 to loadAudioFileAsFloat32

30c17fa

The method already supports all AVAudioFile formats (MP3, M4A, AIFF, FLAC, CAF), not just WAV. Rename reflects actual capability and prepares for multi-format file picker support.

feat(app): expand file picker to support audio and video formats

c9783e6

Accept MP3, M4A, AIFF, FLAC, MP4, MOV in addition to WAV. Panel title and menu label updated to "Audio/Video Files".
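A file picker restricted to these formats might be configured as in the sketch below. `supportedMediaTypes` and `makeMediaOpenPanel` are hypothetical names, and the panel title string and multi-selection setting are assumptions based on the label quoted above.

```swift
import AppKit
import UniformTypeIdentifiers

/// Content types matching the formats listed in the commit message.
func supportedMediaTypes() -> [UTType] {
    var types: [UTType] = [
        .wav, .mp3, .mpeg4Audio, .aiff,   // audio
        .mpeg4Movie, .quickTimeMovie,     // video containers
    ]
    // FLAC has no predefined UTType constant, so look it up by extension.
    if let flac = UTType(filenameExtension: "flac") {
        types.append(flac)
    }
    return types
}

@MainActor
func makeMediaOpenPanel() -> NSOpenPanel {
    let panel = NSOpenPanel()
    panel.title = "Audio/Video Files"        // assumed title string
    panel.allowedContentTypes = supportedMediaTypes()
    panel.allowsMultipleSelection = true     // assumption: several files at once
    panel.canChooseDirectories = false
    return panel
}
```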

test(app): add tests for multi-format audio loading

8bf35fc

Tests for loadAudioAsFloat32 (WAV round-trip, nonexistent file error) and async resampleFile round-trip (48kHz → 16kHz).
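A WAV round-trip of the kind these tests describe can be sketched with AVAudioFile alone. `writeSine` and `readFirstChannel` are hypothetical helpers rather than the project's functions; the sketch reads only the first channel and does no resampling.

```swift
import AVFoundation

/// Hypothetical helper: write `frameCount` frames of a mono sine wave to `url`.
func writeSine(
    to url: URL,
    frequency: Double = 440,
    sampleRate: Double = 48_000,
    frameCount: Int = 4_800
) throws {
    guard let format = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1),
          let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(frameCount)),
          let channel = buffer.floatChannelData?[0] else {
        throw CocoaError(.fileWriteUnknown)
    }
    buffer.frameLength = AVAudioFrameCount(frameCount)
    for i in 0..<frameCount {
        channel[i] = Float(sin(2 * Double.pi * frequency * Double(i) / sampleRate))
    }
    // The container type is inferred from the URL's file extension.
    let file = try AVAudioFile(forWriting: url, settings: format.settings)
    try file.write(from: buffer)
}

/// Hypothetical helper: read a whole file and return its first channel as Float32.
func readFirstChannel(of url: URL) throws -> (samples: [Float], sampleRate: Double) {
    // Throws if the file is missing or AVAudioFile cannot decode it.
    let file = try AVAudioFile(forReading: url)
    let format = file.processingFormat   // Float32, deinterleaved by default
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(file.length)) else {
        throw CocoaError(.fileReadUnknown)
    }
    try file.read(into: buffer)
    guard let channels = buffer.floatChannelData else {
        throw CocoaError(.fileReadUnknown)
    }
    let samples = Array(UnsafeBufferPointer(start: channels[0],
                                            count: Int(buffer.frameLength)))
    return (samples, format.sampleRate)
}
```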

pasrom force-pushed the feat/multi-format-audio-support branch from 9c6cc27 to 80cb67a March 15, 2026 07:49

pasrom merged commit 6646eb5 into main Mar 15, 2026
2 checks passed

pasrom deleted the feat/multi-format-audio-support branch March 15, 2026 07:52

pasrom added the enhancement (New feature or request) label Mar 15, 2026

pasrom merged 8 commits into main from feat/multi-format-audio-support