A real-time voice assistant that connects Google Gemini's Live API to a LiveKit room — talk to Gemini with ultra-low latency, directly in your browser or any LiveKit-compatible client.
This quick-start project wires together LiveKit Agents and Google Gemini's multimodal live (realtime) model so you can hold a natural voice conversation with a Gemini-powered assistant in just a few lines of Python.
- **Real-time voice conversation**: Sub-second, streaming audio exchange with Gemini using the `gemini-3.1-flash-live-preview` realtime model
- **LiveKit-native transport**: Runs inside a LiveKit room — works with any LiveKit-compatible frontend, mobile app, or the hosted LiveKit Playground
- **Pluggable voice**: Powered by the `Zephyr` voice by default; swap in any Gemini-supported voice in a single line
- **Minimal boilerplate**: The entire agent fits in one `main.py` — easy to extend with tools, guardrails, or additional logic
- **Auto-reconnect & room lifecycle**: The LiveKit Agents framework handles participants joining/leaving, reconnections, and graceful shutdown
- **Python 3.11+**: Core programming language
- **LiveKit Agents** (`livekit-agents[google,images]~=1.4`): Agent framework and WebRTC transport
- **Google Gemini Live API** (`google-genai>=1.16.0`): Multimodal realtime language model
- **LiveKit Cloud / Self-hosted**: Managed media server for the WebRTC room
- **python-dotenv**: Environment variable management
```
User microphone
      │
      ▼
LiveKit Room ──► LiveKit Agent (Python)
      │
      ▼
Google Gemini Realtime API
(gemini-3.1-flash-live-preview)
      │
      ▼
Synthesized audio response
      │
      ▼
LiveKit Room ──► User speaker
```
1. A participant joins a LiveKit room (via the Playground or your own frontend).
2. The LiveKit Agents framework invokes the `entrypoint` function and creates an `AgentSession`.
3. Audio is streamed in real time to Gemini's Live API, which streams synthesized audio back.
4. The agent plays the response into the room, where the participant hears it.
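The streaming loop above can be modeled as two async streams bridged by the agent. The following is a toy stdlib sketch of that shape only — no real audio, networking, or LiveKit/Gemini calls; all names here are illustrative stand-ins:

```python
import asyncio


async def gemini_stub(chunks):
    """Stand-in for the Gemini Live API: turns each inbound audio
    chunk into a 'synthesized' response chunk."""
    async for chunk in chunks:
        yield f"tts({chunk})"


async def mic_stream(queue: asyncio.Queue):
    """Simulate a participant speaking three audio chunks into the room."""
    for chunk in ["hello", "how are", "you"]:
        await queue.put(chunk)
    await queue.put(None)  # end-of-stream marker


async def run_agent():
    inbound: asyncio.Queue = asyncio.Queue()
    playback = []  # stands in for audio played back into the room

    async def queue_iter():
        # Drain the inbound queue until the end-of-stream marker.
        while (chunk := await inbound.get()) is not None:
            yield chunk

    async def speaker():
        # Forward model output back toward the participant.
        async for out in gemini_stub(queue_iter()):
            playback.append(out)

    # Microphone producer and speaker consumer run concurrently.
    await asyncio.gather(mic_stream(inbound), speaker())
    return playback


if __name__ == "__main__":
    print(asyncio.run(run_agent()))
```

In the real agent, the LiveKit Agents framework owns both ends of this bridge; the sketch only illustrates why the exchange feels conversational — audio flows through the model as it arrives rather than after the utterance completes.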
- Python 3.11 or higher
- uv (recommended) or pip
- API keys for:
  - LiveKit Cloud (or a self-hosted LiveKit server) — for `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`
  - Google AI Studio — for `GOOGLE_API_KEY`
Copy the example below into a `.env` file in the project directory:

```
LIVEKIT_URL=wss://<your-livekit-project>.livekit.cloud
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>
GOOGLE_API_KEY=<your-google-ai-studio-api-key>
```

You can also use a `.env.local` file — it takes precedence over `.env`.
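The precedence rule can be illustrated with a small stdlib parser — this is a sketch of the behavior only (the project itself uses python-dotenv; the helper names here are hypothetical):

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def load_env(directory: str) -> dict:
    """Merge .env first, then .env.local, so .env.local wins on conflicts."""
    merged = {}
    for name in (".env", ".env.local"):
        path = Path(directory) / name
        if path.exists():
            merged.update(parse_env(path.read_text()))
    return merged
```

Because `.env.local` is applied last, any key it defines overrides the same key from `.env` — handy for keeping personal credentials out of a shared `.env`.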
1. Clone the repository:

   ```bash
   git clone https://github.com/Arindam200/awesome-llm-apps.git
   cd awesome-llm-apps/voice_agents/livekit_gemini_agents
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate
   ```

3. Install dependencies, using `uv` (recommended):

   ```bash
   uv sync
   ```

   or `pip`:

   ```bash
   pip install -e .
   ```
```bash
python main.py start
```

The agent registers itself with your LiveKit server and waits for participants.
Open the LiveKit Playground and enter your LiveKit URL + credentials to join the room as a participant. Once you're connected, the agent joins automatically and you can start talking.
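If you build your own frontend instead of using the Playground, each participant needs a LiveKit access token: a JWT signed with your API secret. Below is a minimal stdlib sketch of that signing, following LiveKit's published claim layout (`iss`, `sub`, and a `video` grant) — real projects should generate tokens with the official `livekit-api` package rather than hand-rolling JWTs:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def livekit_token(api_key: str, api_secret: str, identity: str,
                  room: str, ttl: int = 3600) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # API key identifies the signer
        "sub": identity,   # participant identity shown in the room
        "nbf": now,
        "exp": now + ttl,
        "video": {"roomJoin": True, "room": room},  # room grant
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"
```

A frontend would request such a token from your backend, then pass it to the LiveKit client SDK when connecting to `LIVEKIT_URL`.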
For local development:

```bash
python main.py dev
```

```
livekit_gemini_agents/
├── main.py           # Agent definition and entry point
├── pyproject.toml    # Project metadata and dependencies
├── .env              # Environment variables (never commit this)
└── README.md         # This file
```
| What to change | Where |
|---|---|
| System prompt / personality | `INSTRUCTIONS` constant in `main.py` |
| Gemini model | `REALTIME_MODEL` constant in `main.py` |
| Voice | `VOICE` constant in `main.py` (e.g., `"Puck"`, `"Charon"`, `"Kore"`) |
| Add tools | Override methods on `VoiceAgent` or pass `tools=` to `AgentSession` |
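As a concrete illustration of the table above, the tweakable constants near the top of `main.py` might look like the following — the values shown here are examples, not necessarily the repository defaults:

```python
# Illustrative configuration constants matching the customization table.
# Values are examples; check main.py for the actual defaults.
INSTRUCTIONS = "You are a concise, friendly voice assistant."
REALTIME_MODEL = "gemini-3.1-flash-live-preview"  # Gemini Live model ID
VOICE = "Zephyr"  # other Gemini voices include "Puck", "Charon", "Kore"
```

Changing the assistant's personality, model, or voice is then a one-line edit with no other code changes.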
Contributions are welcome! Please feel free to submit a Pull Request. See CONTRIBUTING.md for details.
This project is licensed under the MIT License — see the LICENSE file for details.
- LiveKit Agents for the real-time agent framework
- Google Gemini Live API for the multimodal realtime model