feat(vad): bundle optimized silero vad and deprecate the plugin by chenghao-mou · Pull Request #5800 · livekit/agents

chenghao-mou · 2026-05-21T17:57:11Z

Why

Silero VAD is the default endpointing implementation for voice agents, but lived behind a separate livekit-plugins-silero install step. That extra hop made the standard quickstart longer than it needed to be, and the plugin's onnxruntime-based loader paid the full model load cost in every job process (no fork-time sharing).

This PR moves Silero VAD into livekit-agents core, backed by livekit-local-inference. The plugin stays installable as a deprecated shim until v2.0, and existing call sites continue to work — they transparently route to the new implementation when settings are compatible.

This PR also aligns the defaults with the official Silero settings, similar to #5788:

- removed the exp filter
- changed the default `min_silence_duration` from 0.55s to 0.1s.
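To make the 0.55s to 0.1s change concrete, here is a minimal sketch of a silence gate. It assumes Silero's 512-sample windows at 16 kHz (32 ms each) and a gate that fires END_OF_SPEECH once accumulated silence reaches `min_silence_duration`. `SilenceGate` and `windows_until_end_of_speech` are hypothetical names; this is not the livekit implementation, which also applies `min_speech_duration`, activation thresholds, and other settings.

```python
from dataclasses import dataclass

# Assumed framing: Silero scores 512-sample windows at 16 kHz, i.e. 32 ms each.
WINDOW_S = 512 / 16000


@dataclass
class SilenceGate:
    """Hypothetical end-of-speech gate: fires once silence lasts min_silence_duration."""

    min_silence_duration: float
    speaking: bool = False
    silence_s: float = 0.0

    def push(self, is_speech: bool, window_s: float = WINDOW_S) -> bool:
        """Feed one window; return True when END_OF_SPEECH would fire."""
        if is_speech:
            # Any speech window (re)opens the utterance and resets the silence timer.
            self.speaking = True
            self.silence_s = 0.0
            return False
        if not self.speaking:
            return False  # silence before any speech never ends an utterance
        self.silence_s += window_s
        if self.silence_s >= self.min_silence_duration:
            self.speaking = False
            self.silence_s = 0.0
            return True
        return False


def windows_until_end_of_speech(min_silence_duration: float) -> int:
    """Count silent windows needed after one speech window before the gate fires."""
    gate = SilenceGate(min_silence_duration)
    gate.push(True)
    n = 0
    while True:
        n += 1
        if gate.push(False):
            return n
```

Under these assumptions the gate fires after 18 silent windows (576 ms) with the old 0.55s default and after 4 windows (128 ms) with the new 0.1s default.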

Code example

Before

```python
from livekit.agents import Agent, AgentSession, JobContext, JobProcess, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero


def prewarm(proc: JobProcess) -> None:
    # Heavy ONNX session construction — must live behind prewarm so each
    # job process doesn't pay the load on every conversation start.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```

After

```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli, inference
from livekit.plugins import deepgram, openai


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        # Cheap wrapper: the Silero weights were already loaded in the forkserver.
        vad=inference.VAD(model="silero"),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

No `prewarm_fnc`, no Silero plugin import, no `proc.userdata` shuttle. Weights are loaded once in the forkserver and inherited by every forked job process via copy-on-write (COW).

API change

| Before | After |
| --- | --- |
| `pip install livekit-agents livekit-plugins-silero` | `pip install livekit-agents` (Silero is bundled) |
| `from livekit.plugins.silero import VAD` `vad = VAD.load(min_silence_duration=0.4)` | `from livekit.agents import inference` `vad = inference.VAD(model="silero", min_silence_duration=0.4)` |
| `silero.VAD.load()` did a heavy onnxruntime session construction, so it was expected to live behind a `prewarm` hook | `inference.VAD(model="silero")` is a cheap wrapper; weights are loaded once at forkserver-preload time and inherited via COW |
| Per job: the ~6 MB Silero ONNX model is loaded into every job process | Per fork: weights stay resident in the forkserver and are COW-shared with each job (Linux); spawn platforms are unchanged |
| The plugin owned its own `VAD`/`VADStream`/`OnnxModel` (~650 LOC) | Core owns the wrapper; the plugin keeps a frozen copy as a deprecation shim |
| `silero.VAD.load(force_cpu=False, sample_rate=16000)` ran onnxruntime; with a custom `onnx_file_path`, it used a user-supplied model | `silero.VAD.load(...)` transparently delegates to `inference.VAD(model="silero", ...)` when settings are compatible; 8 kHz or a custom `onnx_file_path` still routes to the legacy onnxruntime path |
| `vad: NotGivenOr[vad.VAD] = NOT_GIVEN` in `AgentSession.__init__`: `vad=None` was illegal per the type, even though the code accepted it | `vad: NotGivenOr[vad.VAD \| None] = NOT_GIVEN`: `vad=None` is now type-legal as an explicit "no VAD" signal |
| No way to invoke Silero VAD without importing the plugin | `from livekit.agents.inference import VAD` |
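The `NotGivenOr[vad.VAD | None]` change relies on a sentinel that is distinct from `None`, so "argument omitted" and "explicitly no VAD" can be told apart. The sketch below reimplements a minimal local version for illustration; the names mirror `NOT_GIVEN`/`NotGivenOr` from the table, but `FakeVAD` and `describe_vad_arg` are hypothetical and the real livekit definitions are not reproduced here.

```python
from typing import Optional, TypeVar, Union


class NotGiven:
    """Sentinel type meaning 'the caller did not pass this argument'."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()
T = TypeVar("T")
NotGivenOr = Union[T, NotGiven]


class FakeVAD:
    """Hypothetical stand-in for a vad.VAD instance."""


def describe_vad_arg(vad: NotGivenOr[Optional[FakeVAD]] = NOT_GIVEN) -> str:
    # Check the sentinel first: NOT_GIVEN is falsy, just like None.
    if isinstance(vad, NotGiven):
        return "not given: keep the session default"
    if vad is None:
        return "explicit None: run without VAD"
    return "custom VAD instance"
```

Because `vad` is typed as `NotGivenOr[Optional[...]]`, a type checker accepts all three call forms: `describe_vad_arg()`, `describe_vad_arg(None)`, and `describe_vad_arg(FakeVAD())`.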

Behaviour change

| Before | After |
| --- | --- |
| Worker startup: each forked job pays the full Silero ONNX load (tens of ms) | Forkserver preload runs `livekit.agents.inference._warmup` once → `init_vad()` + `init_eot()` page native weights into the forkserver. Jobs fork with weights already resident. |
| `import livekit.plugins.silero` is silent | Emits a single `DeprecationWarning` pointing to `inference.VAD(model="silero")`; removal is targeted for v2.0 |
| `silero.VAD.load(force_cpu=False)` honored the user's GPU request via onnxruntime | When delegating to native, `force_cpu=False` is ignored (the native library is CPU-only), so a `WARNING` now explains that the kwarg is ignored and points at `onnx_file_path` as the legacy escape hatch |
| `AgentSession` with default args → `session.vad is None` | No change on this branch: the default stays `None` |
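The "single `DeprecationWarning`" behaviour follows from placing the warning at module level: Python caches imported modules in `sys.modules`, so the module body runs only on the first import. This sketch demonstrates that with a throwaway module written to a temp directory; `fake_silero_shim` and `count_import_warnings` are hypothetical names, not the plugin's actual code.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

SHIM_SOURCE = textwrap.dedent(
    """
    import warnings

    warnings.warn(
        "fake_silero_shim is deprecated; use inference.VAD(model='silero')",
        DeprecationWarning,
        stacklevel=2,
    )
    """
)


def count_import_warnings(imports: int = 3) -> int:
    """Import a throwaway shim several times and count the DeprecationWarnings."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_silero_shim.py").write_text(SHIM_SOURCE)
        sys.path.insert(0, tmp)
        sys.modules.pop("fake_silero_shim", None)
        importlib.invalidate_caches()  # make sure the new file is discoverable
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")  # do not let filters hide anything
                for _ in range(imports):
                    # Only the first call executes the module body; later calls
                    # return the cached module from sys.modules.
                    importlib.import_module("fake_silero_shim")
            return sum(issubclass(w.category, DeprecationWarning) for w in caught)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_silero_shim", None)
```

Note that under Python's default filters a `DeprecationWarning` attributed to code outside `__main__` is hidden, which is why the sketch switches to `simplefilter("always")`; running with `python -W default::DeprecationWarning` makes such warnings visible.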

Migration

| If you currently use… | If you do nothing | Recommended update |
| --- | --- | --- |
| `silero.VAD.load()` with default settings | Works (delegates); the deprecation warning prints once at plugin import | `from livekit.agents import inference; inference.VAD(model="silero")` |
| `silero.VAD.load(sample_rate=8000)` | Works; still routes to legacy onnxruntime | No native equivalent; stay on the plugin until v2.0, then migrate if 16 kHz is acceptable |
| `silero.VAD.load(onnx_file_path=...)` | Works; still routes to legacy onnxruntime | Stay on the plugin until v2.0; the bundled native model is fixed |
| `silero.VAD.load(force_cpu=False)` | Works, but a warning now notes that `force_cpu` is ignored on the delegated native path | Use `inference.VAD(model="silero")` (and accept CPU), or keep the plugin form with `onnx_file_path=...` to stay on the legacy path |
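The routing rules in the migration table can be summarized as a small decision function. This is a hypothetical sketch of the shim's logic as described above (legacy onnxruntime for 8 kHz or a custom `onnx_file_path`, native otherwise, with a warning when `force_cpu=False` is dropped); `choose_backend` is not the plugin's real function, and the `force_cpu=True` default is an assumption about the plugin signature.

```python
import logging
from typing import Optional

logger = logging.getLogger("silero_shim_sketch")

# The bundled native model only runs at 16 kHz.
NATIVE_SAMPLE_RATE = 16000


def choose_backend(
    *,
    sample_rate: int = NATIVE_SAMPLE_RATE,
    onnx_file_path: Optional[str] = None,
    force_cpu: bool = True,  # assumed plugin default
) -> str:
    """Return which backend a silero.VAD.load(...) call would route to."""
    if onnx_file_path is not None or sample_rate != NATIVE_SAMPLE_RATE:
        # Incompatible with the native model: keep the legacy onnxruntime path.
        return "legacy-onnxruntime"
    if not force_cpu:
        # The native library is CPU-only, so the GPU request cannot be honored.
        logger.warning(
            "force_cpu=False is ignored on the native Silero path (CPU-only); "
            "pass onnx_file_path=... to keep the legacy onnxruntime path"
        )
    return "native"
```

For example, `choose_backend()` and `choose_backend(force_cpu=False)` both return `"native"` (the latter after logging a warning), while `choose_backend(sample_rate=8000)` returns `"legacy-onnxruntime"`.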

theomonnom · 2026-05-21T21:27:47Z

+        *,
+        model: VADModels = "silero",
+        min_speech_duration: float = 0.05,
+        min_silence_duration: float = 0.55,


Should we follow the same defaults as Silero? IMO we should use 0.1 now. It shouldn't have any side effects, and it is closer to the truth.

chenghao-mou

I'd love to, but I wasn't sure whether you were going to merge #5788 first.

theomonnom

Just closed mine.

longcw · 2026-05-22T01:48:03Z

Tested locally and it works well. Could you add a comparison in the PR description?

devin-ai-integration

Devin Review found 1 new potential issue.

View 13 additional findings in Devin Review.

devin-ai-integration · 2026-05-22T11:12:57Z

+        from .vad import VAD
+
+        vad_instance = VAD(model="silero")


🟡 Auto-created VAD for Speechmatics STT uses drastically lower min_silence_duration default (0.1s vs 0.55s)

In _resolve_vad_for_model, the auto-created VAD for Speechmatics STT models changed from SileroVAD.load() (which had min_silence_duration=0.55) to inference.VAD(model="silero") (which defaults to min_silence_duration=0.1). This 5.5× reduction means END_OF_SPEECH events fire much sooner for Speechmatics STT users who don't provide their own VAD. While the audio EOT detector (_maybe_apply_vad_silence_override in livekit-agents/livekit/agents/voice/audio_recognition.py:669) bumps this to at least 0.25s when the audio turn detector is active, users with text-based turn detectors (e.g., MultilingualModel) will experience the full 0.1s default — significantly different from the previous 0.55s behavior.

Suggested change

```diff
 from .vad import VAD
 
-vad_instance = VAD(model="silero")
+vad_instance = VAD(model="silero", min_silence_duration=0.55)
```
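The review comment's point is that the effective silence window depends on which turn detector is active. A hypothetical simplification of the override it describes (a 0.25s floor only when the audio turn detector is active) looks like this; `effective_min_silence` is an illustrative name, not the code of `_maybe_apply_vad_silence_override`.

```python
# Floor described in the review comment for the audio EOT detector.
AUDIO_EOT_MIN_SILENCE_S = 0.25


def effective_min_silence(configured: float, audio_eot_active: bool) -> float:
    """Silence (seconds) actually required before END_OF_SPEECH fires."""
    if audio_eot_active:
        # The audio turn detector raises short settings up to the floor.
        return max(configured, AUDIO_EOT_MIN_SILENCE_S)
    # Text-based turn detectors see the VAD setting unchanged.
    return configured
```

With the new 0.1s default, a text-based detector therefore sees 0.1s, while the audio detector sees 0.25s; passing `min_silence_duration=0.55` as suggested keeps 0.55s in both cases.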


- Add the compiled silero vad from livekit-local-inference; expose as inference.VAD(model="silero").
- Forkserver preload uses livekit.agents.inference._warmup as a side-effect module that calls init_vad() and init_eot() in the forkserver process so forked jobs inherit weight pages via COW.
- Drop the prewarm vad pattern from examples; inline construction in AgentSession is now the recommended form.
- Update tests for the new vad min_silence default (0.25s) and log message.

Squashed from chenghao/feat/inline-silero-vad rebase onto feat/AGT-2520-multimodal-EOU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 15 additional findings in Devin Review.

devin-ai-integration · 2026-05-23T00:45:43Z

    "livekit-local-inference>=0.2.5",
    "livekit-protocol>=1.1.9,<2",
    "livekit-blingfire~=1.1,<2",
+    "livekit-local-inference>=0.2.5",


🟡 Duplicate livekit-local-inference dependency entry in pyproject.toml

This PR adds a second "livekit-local-inference>=0.2.5" entry on line 37, while one already exists on line 34. While most Python build tools deduplicate dependencies gracefully, this is clearly unintentional and could cause confusion or subtle issues with tooling that doesn't handle duplicates.

Suggested change

"livekit-local-inference>=0.2.5",
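Duplicate entries like this are easy to catch mechanically. The sketch below flags requirement strings whose project names collide after name normalization (lowercase, runs of `-`, `_`, `.` collapsed to `-`). The regex-based name extraction is a simplification that assumes plain `name<specifier>` strings; a full parser would use the `packaging` library. `find_duplicate_requirements` is a hypothetical helper, not part of this PR.

```python
import re
from collections import Counter
from typing import Iterable, List

# Leading project name of a requirement string such as "pkg>=1.0,<2".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical_name(requirement: str) -> str:
    """Extract and normalize the project name from a requirement string."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_duplicate_requirements(deps: Iterable[str]) -> List[str]:
    """Return the sorted canonical names that appear more than once."""
    counts = Counter(canonical_name(d) for d in deps)
    return sorted(name for name, n in counts.items() if n > 1)
```

On Python 3.11+, the list can be read with `tomllib.load(f)["project"]["dependencies"]` from a `pyproject.toml` opened in binary mode and passed straight to `find_duplicate_requirements`.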


This comment was marked as resolved.


theomonnom reviewed May 21, 2026


Comment thread livekit-agents/livekit/agents/worker.py Outdated

theomonnom reviewed May 21, 2026


longcw approved these changes May 22, 2026


chenghao-mou added the needs-documentation label May 22, 2026

devin-ai-integration Bot reviewed May 22, 2026


theomonnom approved these changes May 22, 2026


chenghao-mou force-pushed the chenghao/feat/inline-silero-vad branch from 0526efb to 122cd9c May 23, 2026 00:40

devin-ai-integration Bot reviewed May 23, 2026

chenghao-mou wants to merge 1 commit into feat/AGT-2520-multimodal-EOU from chenghao/feat/inline-silero-vad