# Releases

## v1.0.5
## v1.0.4

### Docker Base Image Migration

- Migrated from `runpod/base:0.6.2-cuda12.4.1` (Ubuntu 22.04) to `nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04`
- Python 3.12, FFmpeg 6.1, CUDA 12.8 — all matching the PyTorch cu128 wheels exactly
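The migration above can be sketched as a Dockerfile. This is an illustrative sketch, not the image's actual build file: the apt package set, the `--break-system-packages` flag, and the PyTorch index URL are assumptions, not taken from this release.

```dockerfile
# Sketch of the new base (hypothetical build steps)
FROM nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04

# Ubuntu 24.04 ships Python 3.12 and FFmpeg 6.x via apt
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# PyTorch wheels built against CUDA 12.8 (the cu128 wheel index)
RUN pip3 install --break-system-packages \
        torch --index-url https://download.pytorch.org/whl/cu128
```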
### Base64 Audio Input

- `audio_file` now accepts base64-encoded audio data in addition to URLs
- Supports raw base64 and data URI format (`data:audio/wav;base64,...`)
- No need to host files externally for small audio inputs
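A minimal sketch of building such a request payload. The `audio_file` field name comes from these notes; the helper function and the `{"input": ...}` wrapper are illustrative assumptions, not the worker's actual code.

```python
import base64


def make_audio_payload(audio_bytes: bytes, mime: str = "audio/wav",
                       as_data_uri: bool = False) -> str:
    """Encode raw audio bytes into an `audio_file` value (hypothetical helper)."""
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    # Either raw base64 or the data URI form — both are accepted per the notes.
    return f"data:{mime};base64,{b64}" if as_data_uri else b64


# Example request body (wrapper shape assumed):
payload = {"input": {"audio_file": make_audio_payload(b"RIFF...WAVE", as_data_uri=True)}}
```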
### `HF_TOKEN` Environment Variable Support

- Diarization now automatically uses the `HF_TOKEN` endpoint environment variable
- No need to pass `huggingface_access_token` in every request
- Per-request token still works as an override
### Lazy Model Loading

- Speaker verification and diarization models only load when needed
- Basic transcription works without `HF_TOKEN` or gated model access
- Faster cold starts for transcription-only workloads
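The pattern behind this is deferring model construction to first use. A minimal sketch with stand-in names (the real worker loads pyannote pipelines; here a counter and a dummy object substitute for them):

```python
from functools import lru_cache

# Tracks how many times the heavyweight loader actually runs (for illustration).
LOAD_CALLS = {"diarization": 0}


def _load_diarization_pipeline():
    """Stand-in for loading a gated pyannote pipeline (hypothetical)."""
    LOAD_CALLS["diarization"] += 1
    return object()


@lru_cache(maxsize=1)
def get_diarization_pipeline():
    # Runs only when a request actually asks for diarization, so plain
    # transcription never touches gated models or needs HF_TOKEN.
    return _load_diarization_pipeline()
```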
### Fixes

- Fixed pyannote.audio 4.x compatibility (`Inference` now requires a model loaded via `Model.from_pretrained`)
- Pinned `torchcodec>=0.6,<0.8` for PyTorch 2.8 compatibility
- Permanently upgraded the Lightning checkpoint during build to silence the startup warning