v1.0.5

Latest

@kodxana kodxana released this 24 Feb 10:57

v1.0.4

Docker Base Image Migration

  • Migrated from runpod/base:0.6.2-cuda12.4.1 (Ubuntu 22.04) to
    nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04
  • Python 3.12, FFmpeg 6.1, and CUDA 12.8, all matching the PyTorch cu128 wheels exactly

Base64 Audio Input

  • audio_file now accepts base64-encoded audio data in addition to URLs
  • Supports raw base64 and data URI format (data:audio/wav;base64,...)
  • No need to host files externally for small audio inputs
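
As a rough illustration of the three accepted forms (URL, raw base64, data URI), here is a minimal Python sketch. The helper names (to_data_uri, classify_audio_input, decode_audio_input, build_payload) and the {"input": {...}} request envelope are assumptions made for this example; only the audio_file field and the data:audio/wav;base64,... format come from the notes above. The worker's actual detection logic may differ.

```python
import base64


def to_data_uri(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a data URI (data:audio/wav;base64,...)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def classify_audio_input(value: str) -> str:
    """Hypothetical helper: guess which form an audio_file value takes."""
    if value.startswith(("http://", "https://")):
        return "url"
    if value.startswith("data:"):
        return "data_uri"
    return "base64"


def decode_audio_input(value: str) -> bytes:
    """Decode a raw base64 string or a base64 data URI back to bytes."""
    kind = classify_audio_input(value)
    if kind == "url":
        raise ValueError("URL inputs must be downloaded, not decoded")
    if kind == "data_uri":
        # Split "data:audio/wav;base64" from the encoded body.
        header, _, value = value.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64-encoded data URIs are handled here")
    return base64.b64decode(value, validate=True)


def build_payload(audio_bytes: bytes) -> dict:
    """Assumed request shape; only the audio_file field is from the notes."""
    return {"input": {"audio_file": to_data_uri(audio_bytes)}}
```

Base64 adds roughly a third to the payload size, which is why the notes recommend this path for small audio inputs only.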

HF_TOKEN Environment Variable Support

  • Diarization now automatically uses the HF_TOKEN environment variable set on the endpoint
  • No need to pass huggingface_access_token in every request
  • Per-request token still works as an override
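
The precedence described above (per-request token first, HF_TOKEN as the fallback) could look something like the following sketch. The function name resolve_hf_token is hypothetical, and the worker's real lookup may be structured differently; only huggingface_access_token and HF_TOKEN are taken from the notes.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    job_input: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the per-request token if present, else HF_TOKEN, else None."""
    per_request = job_input.get("huggingface_access_token")
    if isinstance(per_request, str) and per_request.strip():
        # An explicit token in the request overrides the endpoint setting.
        return per_request.strip()
    from_env = env.get("HF_TOKEN", "").strip()
    return from_env or None
```

Returning None rather than raising leaves it to the caller to decide whether a token is needed at all, which fits transcription-only requests that never touch gated models.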

Lazy Model Loading

  • Speaker verification and diarization models only load when needed
  • Basic transcription works without HF_TOKEN or gated model access
  • Faster cold starts for transcription-only workloads
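
One common way to get this behaviour is to register loader callables and invoke each one only on first use. The sketch below shows that general pattern under assumed names (LazyModels, get, is_loaded); it is not the worker's actual code, and the loaders here are placeholders rather than real pyannote calls.

```python
from typing import Any, Callable, Dict


class LazyModels:
    """Hold loader callables and build each model only on first access."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self._loaded: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Load once, then reuse the cached instance on later requests.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded
```

With this shape, a transcription-only request never calls get("diarization"), so the gated model is never downloaded and no token is required.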

Fixes

  • Fixed pyannote.audio 4.x compatibility (Inference now requires a model loaded with Model.from_pretrained)
  • Pinned torchcodec>=0.6,<0.8 for PyTorch 2.8 compatibility
  • Upgraded the Lightning checkpoint permanently during the build to silence the startup warning