Overview
PraisonAI's praisonai realtime command currently launches a Chainlit-based voice UI that uses cl.on_audio_chunk + OpenAI's Realtime API (RealtimeClient) for bidirectional speech-to-speech. To complete the Chainlit → PraisonAIUI migration (MervinPraison/PraisonAI#1443), aiui needs a first-class Realtime voice feature with WebRTC-style bidirectional audio.
Current aiui state
- praisonaiui/features/tts.py: one-way TTS only (OpenAI TTS + browser Web Speech API).
- No microphone input, no realtime audio streaming, no WebRTC.
Requested feature
A new feature analogous to tts.py — protocol-driven, lazy-loaded — that supports:
- Bidirectional audio: mic capture in the browser → streamed to backend → forwarded to OpenAI Realtime API (gpt-4o-realtime-preview) → audio chunks back to browser → playback.
- WebSocket or WebRTC transport (WebRTC preferred; OpenAI now supports direct ephemeral-token WebRTC sessions).
- Protocol: RealtimeProtocol ABC with create_session(), send_audio(), receive_audio(), matching the pattern of TTSProtocol.
- Backends: OpenAIRealtimeManager (primary), with room for Anthropic/ElevenLabs later.
- Message integration: realtime transcripts surface as normal RunEvent.RUN_CONTENT so existing Chat/Agents/Dashboard pages show the transcript in realtime.
- Tool-call support: OpenAI Realtime can call tools; events must bridge to RunEventType.TOOL_CALL_STARTED/COMPLETED.
- Dashboard page: realtime sidebar page with mic button, waveform, transcript panel.
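To make the protocol item concrete, here is a minimal sketch of what the RealtimeProtocol ABC might look like. Only the three method names come from this request; the signatures, the session-id convention, the close_session() helper, and the EchoRealtimeManager stand-in are assumptions for illustration, and nothing here calls OpenAI or mirrors the actual TTSProtocol code.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, Dict


class RealtimeProtocol(ABC):
    """Hypothetical interface: method names from the request, signatures assumed."""

    @abstractmethod
    async def create_session(self, model: str) -> str:
        """Open a realtime session and return its id."""

    @abstractmethod
    async def send_audio(self, session_id: str, chunk: bytes) -> None:
        """Forward one chunk of microphone audio to the backend."""

    @abstractmethod
    def receive_audio(self, session_id: str) -> AsyncIterator[bytes]:
        """Yield audio chunks produced for this session."""


class EchoRealtimeManager(RealtimeProtocol):
    """In-memory stand-in that echoes audio back; makes no network calls."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    async def create_session(self, model: str) -> str:
        session_id = f"{model}-{len(self._queues)}"
        self._queues[session_id] = asyncio.Queue()
        return session_id

    async def send_audio(self, session_id: str, chunk: bytes) -> None:
        await self._queues[session_id].put(chunk)

    async def close_session(self, session_id: str) -> None:
        # Assumed helper: a None sentinel marks the end of the stream.
        await self._queues[session_id].put(None)

    async def receive_audio(self, session_id: str) -> AsyncIterator[bytes]:
        queue = self._queues[session_id]
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            yield chunk


async def demo():
    """Send two chunks through the echo backend and collect what comes back."""
    manager = EchoRealtimeManager()
    session_id = await manager.create_session("gpt-4o-realtime-preview")
    for chunk in (b"\x00\x01", b"\x02\x03"):
        await manager.send_audio(session_id, chunk)
    await manager.close_session(session_id)
    return [chunk async for chunk in manager.receive_audio(session_id)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real OpenAIRealtimeManager would replace the queue with a WebSocket or WebRTC connection, but the calling code on the aiui side could stay the same, which is the point of keeping the feature protocol-driven.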
Why this blocks PraisonAI
Without this, praisonai realtime has no aiui replacement and we cannot drop chainlit from praisonai[realtime] / praisonai[all]. See MervinPraison/PraisonAI#1443 Phase 1.
Acceptance criteria
- aiui.set_realtime(OpenAIRealtimeManager()) wires realtime voice
- @aiui.realtime decorator (or similar) to customise per-session behaviour
- realtime built-in dashboard page renders mic + playback UI
- Integration with BaseProvider so Core SDK agents can be the realtime brain
References
- praisonai/ui/realtime.py (493 LOC)
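As a rough illustration of the first two acceptance criteria, the sketch below shows one way set_realtime() and an @aiui.realtime decorator could fit together. The RealtimeRegistry class, the hook signature (config dict in, config dict out), build_session_config(), and FakeManager are all hypothetical; the local name aiui is a placeholder object, not the real package, and the voice and instructions values are only examples.

```python
from typing import Callable, Dict, List, Optional

# Assumed hook shape: takes a session config dict, returns an updated one.
SessionHook = Callable[[Dict[str, object]], Dict[str, object]]


class RealtimeRegistry:
    """Hypothetical stand-in for the module-level aiui API named in the criteria."""

    def __init__(self) -> None:
        self.manager: Optional[object] = None
        self.session_hooks: List[SessionHook] = []

    def set_realtime(self, manager: object) -> None:
        """Register the backend that will serve realtime sessions."""
        self.manager = manager

    def realtime(self, func: SessionHook) -> SessionHook:
        """Decorator: register a hook that customises each session's config."""
        self.session_hooks.append(func)
        return func

    def build_session_config(self, base: Dict[str, object]) -> Dict[str, object]:
        """Apply every registered hook, in order, to a copy of the base config."""
        config = dict(base)
        for hook in self.session_hooks:
            config = hook(config)
        return config


class FakeManager:
    """Placeholder for OpenAIRealtimeManager; does nothing."""


aiui = RealtimeRegistry()  # placeholder object, not the real praisonaiui package
aiui.set_realtime(FakeManager())


@aiui.realtime
def customise(config: Dict[str, object]) -> Dict[str, object]:
    # Per-session tweak; the keys and values are illustrative only.
    return {**config, "voice": "alloy", "instructions": "Answer briefly."}


config = aiui.build_session_config({"model": "gpt-4o-realtime-preview"})
print(config)
```

Returning the function unchanged from the decorator keeps customise callable on its own, which makes per-session hooks easy to unit-test without starting a session.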