v1.0.5

Latest

kodxana released this 24 Feb 10:57

8297886

v1.0.4

Docker Base Image Migration

Migrated from runpod/base:0.6.2-cuda12.4.1 (Ubuntu 22.04) to nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04
Python 3.12, FFmpeg 6.1, and CUDA 12.8, all matching the PyTorch cu128 wheels exactly

Base64 Audio Input

audio_file now accepts base64-encoded audio data in addition to URLs
Supports raw base64 and data URI format (data:audio/wav;base64,...)
No need to host files externally for small audio inputs
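
The input handling described above could look roughly like the following sketch. It is not the worker's actual code: the function name classify_audio_input is hypothetical, and the only assumptions are that URLs use http or https and that anything else is raw base64 or a data URI.

```python
import base64
import binascii
from typing import Optional, Tuple
from urllib.parse import urlparse


def classify_audio_input(value: str) -> Tuple[str, Optional[bytes]]:
    """Hypothetical helper: return ("url", None) or ("base64", decoded_bytes)."""
    # Treat http(s) values as URLs to be downloaded elsewhere.
    if urlparse(value).scheme in ("http", "https"):
        return "url", None

    payload = value
    if value.startswith("data:"):
        # Data URI form: data:<mime>;base64,<payload>
        header, sep, payload = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URI")

    try:
        # validate=True rejects characters outside the base64 alphabet.
        return "base64", base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio_file is neither a URL nor valid base64") from exc
```

On the client side, a small file can be encoded with base64.b64encode(data).decode() and passed as audio_file, either on its own or prefixed with data:audio/wav;base64,.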

HF_TOKEN Environment Variable Support

Diarization now automatically reads the HF_TOKEN environment variable set on the endpoint
No need to pass huggingface_access_token in every request
Per-request token still works as an override
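
The precedence described above (per-request token first, HF_TOKEN as fallback) can be sketched as follows. The helper name resolve_hf_token is hypothetical; only the huggingface_access_token field and the HF_TOKEN variable come from these notes.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(job_input: Mapping[str, object],
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: prefer the per-request token, else HF_TOKEN."""
    # An empty or missing per-request token falls through to the env var.
    token = job_input.get("huggingface_access_token") or env.get("HF_TOKEN")
    return str(token) if token else None
```

Returning None when neither source is set lets the caller decide whether the request actually needs gated model access.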

Lazy Model Loading

Speaker verification and diarization models only load when needed
Basic transcription works without HF_TOKEN or gated model access
Faster cold starts for transcription-only workloads
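
One common way to get this behavior is to cache each model behind a loader that runs on first use. The sketch below illustrates the pattern only; get_model, handle, LOAD_LOG, and the diarization and speaker_verification flags are hypothetical names, and the dictionary stands in for a real, expensive model load.

```python
from functools import lru_cache

LOAD_LOG = []  # records which models were loaded, for illustration only


@lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load a model on first use and reuse the cached instance afterwards."""
    LOAD_LOG.append(name)
    return {"name": name}  # stand-in for an expensive model load


def handle(job_input: dict) -> list:
    """Return the names of the models a request needed."""
    used = [get_model("transcription")]  # always required
    if job_input.get("diarization"):  # hypothetical request flag
        used.append(get_model("diarization"))
    if job_input.get("speaker_verification"):  # hypothetical request flag
        used.append(get_model("speaker_verification"))
    return [m["name"] for m in used]
```

With this shape, a transcription-only request never touches the gated models, so it needs no token and pays no load cost for them.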

Fixes

Fixed pyannote.audio 4.x compatibility (Inference now requires Model.from_pretrained)
Pinned torchcodec>=0.6,<0.8 for PyTorch 2.8 compatibility
Upgraded the Lightning checkpoint permanently during the build to silence the startup warning
