feat(vad): bundle optimized silero vad and deprecate the plugin#5800

Open: chenghao-mou wants to merge 1 commit into feat/AGT-2520-multimodal-EOU from chenghao/feat/inline-silero-vad

chenghao-mou (Member) commented May 21, 2026

Why

Silero VAD is the default endpointing implementation for voice agents, but lived behind a separate livekit-plugins-silero install step. That extra hop made the standard quickstart longer than it needed to be, and the plugin's onnxruntime-based loader paid the full model load cost in every job process (no fork-time sharing).

This PR moves Silero VAD into livekit-agents core, backed by livekit-local-inference. The plugin stays installable as a deprecated shim until v2.0, and existing call sites continue to work — they transparently route to the new implementation when settings are compatible.

This PR also introduces changes to follow the official silero settings, similar to #5788:

  • removed exp filter
  • changed the default min_silence_duration from 0.55s to 0.1s.
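
To make the effect of the new default concrete, here is a minimal sketch of how a silence threshold gates end-of-speech events. It is not Silero's or LiveKit's actual logic: the function name, the 32 ms frame size, and the 0.5 probability threshold are all assumptions chosen for illustration.

```python
import math


def end_of_speech_times(
    probs: list[float],
    min_silence_duration: float,
    frame_s: float = 0.032,   # assumed frame length, illustration only
    threshold: float = 0.5,   # assumed speech-probability threshold
) -> list[float]:
    """Return the times (seconds) at which an end-of-speech event would fire."""
    # Count whole frames instead of summing floats to avoid rounding drift.
    needed = max(1, math.ceil(min_silence_duration / frame_s))
    events: list[float] = []
    speaking = False
    silent_frames = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            speaking = True
            silent_frames = 0
        elif speaking:
            silent_frames += 1
            if silent_frames >= needed:
                events.append(round((i + 1) * frame_s, 3))
                speaking = False
                silent_frames = 0
    return events


# Speech, a short ~0.19 s pause, more speech, then a long silence.
probs = [0.9] * 10 + [0.1] * 6 + [0.9] * 10 + [0.1] * 30
fast = end_of_speech_times(probs, min_silence_duration=0.1)
slow = end_of_speech_times(probs, min_silence_duration=0.55)
```

With these synthetic inputs, the 0.1 s setting ends the utterance inside the short pause and again at the end, while the 0.55 s setting fires only once, after the long silence.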

Code example

Before

from livekit.agents import Agent, AgentSession, JobContext, JobProcess, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero


def prewarm(proc: JobProcess) -> None:
    # Heavy ONNX session construction — must live behind prewarm so each
    # job process doesn't pay the load on every conversation start.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))

After

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli, inference
from livekit.plugins import deepgram, openai


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        vad=inference.VAD(model="silero"),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

No prewarm_fnc, no silero plugin import, no proc.userdata shuttle. Weights are loaded once in the forkserver and inherited by every job process via copy-on-write (COW) on platforms that fork; spawn platforms are unchanged.
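
The fork-time sharing idea can be sketched with the standard library alone. This is an illustration of the general technique, not the code in this PR: `load_weights` and the 6 MB blob are hypothetical stand-ins for the native model load, and the sketch uses the plain `fork` start method, so it only runs on platforms that support fork.

```python
import multiprocessing as mp
import os

_WEIGHTS = None      # populated once in the parent process
_LOAD_COUNT = 0      # how many times the "expensive" load actually ran


def load_weights() -> bytes:
    """Hypothetical stand-in for an expensive model load."""
    global _WEIGHTS, _LOAD_COUNT
    if _WEIGHTS is None:
        _LOAD_COUNT += 1
        _WEIGHTS = bytes(6 * 1024 * 1024)
    return _WEIGHTS


def job(queue) -> None:
    # The child inherits the parent's memory, so this call is a cache hit.
    weights = load_weights()
    queue.put((os.getpid(), len(weights), _LOAD_COUNT))


def run_jobs(n: int = 3) -> list[tuple[int, int, int]]:
    ctx = mp.get_context("fork")
    load_weights()  # load once, before any job is forked
    queue = ctx.Queue()
    procs = [ctx.Process(target=job, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # drain before joining
    for p in procs:
        p.join()
    return results


results = run_jobs(3)
```

Each child reports a load count of 1, meaning the load ran only in the parent. For the forkserver start method, the standard library exposes `multiprocessing.set_forkserver_preload` to import modules in the server process before jobs are forked.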

API change

| Before | After |
| --- | --- |
| `pip install livekit-agents livekit-plugins-silero` | `pip install livekit-agents` (Silero is bundled) |
| `from livekit.plugins.silero import VAD`<br>`vad = VAD.load(min_silence_duration=0.4)` | `from livekit.agents import inference`<br>`vad = inference.VAD(model="silero", min_silence_duration=0.4)` |
| `silero.VAD.load()` did a heavy onnxruntime session construction, so it was expected to live behind a prewarm hook | `inference.VAD(model="silero")` is a cheap wrapper; weights are loaded once at forkserver-preload time and inherited via COW |
| Per-job: ~6 MB Silero ONNX loaded into every job process | Per-fork: weights resident in the forkserver, COW-shared with each job (Linux); spawn platforms unchanged |
| Plugin owned its own `VAD`/`VADStream`/`OnnxModel` (~650 LOC) | Core owns the wrapper; plugin keeps a frozen copy as a deprecation shim |
| `silero.VAD.load(force_cpu=False, sample_rate=16000)` ran onnxruntime; with a custom `onnx_file_path`, it used a user-supplied model | `silero.VAD.load(...)` transparently delegates to `inference.VAD(model="silero", ...)` when settings are compatible; 8 kHz and `onnx_file_path` calls still route to the legacy onnxruntime path |
| `vad: NotGivenOr[vad.VAD] = NOT_GIVEN` in `AgentSession.__init__`; `vad=None` was illegal per type, even though the code accepted it | `vad: NotGivenOr[vad.VAD \| None] = NOT_GIVEN`; `vad=None` is now type-legal as an explicit "no VAD" signal |
| No way to invoke Silero VAD without importing the plugin | `from livekit.agents.inference import VAD` |
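
The `NotGivenOr[... | None]` change relies on a sentinel that is distinct from `None`. A minimal sketch of that pattern follows. The `NotGiven` class, `FakeVAD`, and the `fallback` parameter are local stand-ins written for this example, not the definitions used in livekit-agents.

```python
from typing import Optional, Union


class NotGiven:
    """Sentinel type: 'the caller did not pass this argument'."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


class FakeVAD:
    """Hypothetical placeholder for a VAD instance."""


def resolve_vad(
    vad: Union[FakeVAD, None, NotGiven] = NOT_GIVEN,
    fallback: Optional[FakeVAD] = None,  # hypothetical default source
) -> Optional[FakeVAD]:
    # Omitted argument: defer to whatever default the caller configured.
    if isinstance(vad, NotGiven):
        return fallback
    # Explicit None means "no VAD"; an instance is used as-is.
    return vad


fb, mine = FakeVAD(), FakeVAD()
```

Here `resolve_vad(fallback=fb)` returns the fallback, `resolve_vad(None, fallback=fb)` returns `None`, and `resolve_vad(mine, fallback=fb)` returns the instance that was passed in.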

Behaviour change

| Before | After |
| --- | --- |
| Worker startup: each forked job pays the full Silero ONNX load (~tens of ms) | Forkserver preload runs `livekit.agents.inference._warmup` once; `init_vad()` and `init_eot()` page native weights into the forkserver. Jobs fork with weights already resident. |
| `import livekit.plugins.silero` was silent | Emits a single `DeprecationWarning` pointing to `inference.VAD(model="silero")`; v2.0 removal target |
| `silero.VAD.load(force_cpu=False)` honored the user's GPU request via onnxruntime | When delegating to native, `force_cpu=False` is ignored (native lib is CPU-only); it now emits a WARNING explaining that the kwarg is ignored and pointing at `onnx_file_path` as the legacy escape hatch |
| `AgentSession` with default args: `session.vad is None` | No change on this branch; the default stays `None`. |

Migration

| If you currently use… | Do nothing? | Recommended update |
| --- | --- | --- |
| `silero.VAD.load()` with default settings | Works (delegates); a deprecation warning prints once at plugin import | `from livekit.agents import inference; inference.VAD(model="silero")` |
| `silero.VAD.load(sample_rate=8000)` | Works; still routes to legacy onnxruntime | No native equivalent; stay on the plugin until v2.0, then migrate if 16 kHz is acceptable |
| `silero.VAD.load(onnx_file_path=...)` | Works; still routes to legacy onnxruntime | Stay on the plugin until v2.0; the bundled native model is fixed |
| `silero.VAD.load(force_cpu=False)` | Works, but a warning now fires noting that `force_cpu` is ignored on the delegated native path | Use `inference.VAD(model="silero")` (and accept CPU), or keep the plugin form with `onnx_file_path=...` to stay on the legacy path |
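
The routing rules in the table can be summarised as a small decision function. This is a sketch of the rules as described above, not the shim's real code: `choose_backend` and its defaults are hypothetical, and the `warnings` module stands in for the logger-based WARNING the PR describes.

```python
import warnings
from typing import Optional


def choose_backend(
    sample_rate: int = 16000,
    onnx_file_path: Optional[str] = None,
    force_cpu: bool = True,  # assumed default, for illustration
) -> str:
    """Return 'legacy' or 'native' following the migration table."""
    # 8 kHz audio and user-supplied models have no native equivalent.
    if sample_rate == 8000 or onnx_file_path is not None:
        return "legacy"
    # The native path is CPU-only, so a GPU request cannot be honored.
    if not force_cpu:
        warnings.warn(
            "force_cpu=False is ignored on the native path; "
            "pass onnx_file_path to keep the legacy onnxruntime path",
            stacklevel=2,
        )
    return "native"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    gpu_request = choose_backend(force_cpu=False)
```

In this sketch a GPU request combined with `onnx_file_path` stays on the legacy path and raises no warning, which matches the escape hatch described in the last table row.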

devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread livekit-agents/livekit/agents/worker.py Outdated
*,
model: VADModels = "silero",
min_speech_duration: float = 0.05,
min_silence_duration: float = 0.55,

Member

Should we follow the same defaults as Silero? IMO we should use 0.1 now. It shouldn't have any side effect and it is closer to the truth

Member Author

I'd love to, but I wasn't sure if you are going to merge #5788 first.

Member

just closed mine

longcw (Contributor) commented May 22, 2026

tested locally and it works well. could you add a comparison in the pr description?

devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 new potential issue.

View 13 additional findings in Devin Review.

Comment on lines +221 to +223
from .vad import VAD

vad_instance = VAD(model="silero")

🟡 Auto-created VAD for Speechmatics STT uses drastically lower min_silence_duration default (0.1s vs 0.55s)

In _resolve_vad_for_model, the auto-created VAD for Speechmatics STT models changed from SileroVAD.load() (which had min_silence_duration=0.55) to inference.VAD(model="silero") (which defaults to min_silence_duration=0.1). This 5.5× reduction means END_OF_SPEECH events fire much sooner for Speechmatics STT users who don't provide their own VAD. While the audio EOT detector (_maybe_apply_vad_silence_override in livekit-agents/livekit/agents/voice/audio_recognition.py:669) bumps this to at least 0.25s when the audio turn detector is active, users with text-based turn detectors (e.g., MultilingualModel) will experience the full 0.1s default — significantly different from the previous 0.55s behavior.

Suggested change
- from .vad import VAD
- vad_instance = VAD(model="silero")
+ from .vad import VAD
+ vad_instance = VAD(model="silero", min_silence_duration=0.55)
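
For reference, the floor described in this comment amounts to the following. This is a hedged sketch only: `effective_min_silence` and the constant name are hypothetical, and the 0.25 s value is taken from the comment rather than checked against `audio_recognition.py`.

```python
AUDIO_EOT_MIN_SILENCE = 0.25  # floor quoted in the review comment


def effective_min_silence(configured: float, audio_turn_detector_active: bool) -> float:
    """Silence threshold actually applied, per the behaviour described above."""
    if audio_turn_detector_active:
        # The audio turn detector raises short thresholds to the floor.
        return max(configured, AUDIO_EOT_MIN_SILENCE)
    # Text-based turn detectors see the configured value unchanged.
    return configured
```

Under these assumptions the new 0.1 s default becomes 0.25 s with the audio turn detector and stays at 0.1 s otherwise, while an explicit 0.55 s is left untouched in both cases.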

- Add the compiled silero vad from livekit-local-inference; expose as
  inference.VAD(model="silero").
- Forkserver preload uses livekit.agents.inference._warmup as a side-effect
  module that calls init_vad() and init_eot() in the forkserver process so
  forked jobs inherit weight pages via COW.
- Drop the prewarm vad pattern from examples; inline construction in
  AgentSession is now the recommended form.
- Update tests for the new vad min_silence default (0.25s) and log message.

Squashed from chenghao/feat/inline-silero-vad rebase onto feat/AGT-2520-multimodal-EOU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chenghao-mou force-pushed the chenghao/feat/inline-silero-vad branch from 0526efb to 122cd9c on May 23, 2026 00:40

devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 new potential issue.

View 15 additional findings in Devin Review.

"livekit-local-inference>=0.2.5",
"livekit-protocol>=1.1.9,<2",
"livekit-blingfire~=1.1,<2",
"livekit-local-inference>=0.2.5",

🟡 Duplicate livekit-local-inference dependency entry in pyproject.toml

This PR adds a second "livekit-local-inference>=0.2.5" entry on line 37, while one already exists on line 34. While most Python build tools deduplicate dependencies gracefully, this is clearly unintentional and could cause confusion or subtle issues with tooling that doesn't handle duplicates.

Suggested change
- "livekit-local-inference>=0.2.5",
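
A duplicate like this is easy to catch mechanically. The following is a minimal sketch, assuming the dependency strings have already been read out of pyproject.toml; `duplicate_requirements` is a hypothetical helper, and the name matching is a simplified regex rather than a full requirement parser.

```python
import re
from collections import Counter

# Leading distribution name of a requirement string (simplified).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name: lowercase, runs of -, _ and . become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def duplicate_requirements(requirements: list[str]) -> list[str]:
    """Return the normalized names that appear more than once."""
    counts: Counter[str] = Counter()
    for req in requirements:
        match = _NAME.match(req)
        if match:
            counts[normalize(match.group(1))] += 1
    return sorted(name for name, count in counts.items() if count > 1)


deps = [
    "livekit-local-inference>=0.2.5",
    "livekit-protocol>=1.1.9,<2",
    "livekit-blingfire~=1.1,<2",
    "livekit-local-inference>=0.2.5",
]
dupes = duplicate_requirements(deps)
```

Run against the four lines quoted above, the helper reports `livekit-local-inference` as the only repeated entry.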