feat(examples/voice_agents): add ejentum_cognitive_harness #5823
ejentum wants to merge 2 commits into
async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        stt=inference.STT(model="assemblyai/universal-streaming:en"),
        llm=inference.LLM(model="openai/gpt-4o-mini"),
        tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
        vad=silero.VAD.load(),
    )

    await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
    await session.generate_reply(
        instructions=(
            "Greet the user briefly and ask what they would like to think through."
        ),
    )


if __name__ == "__main__":
    cli.run_app(AgentServer(entrypoint))
🔴 AgentServer does not accept positional arguments: example crashes at startup
AgentServer(entrypoint) on line 129 passes entrypoint as a positional argument, but AgentServer.__init__ uses a bare * after self (worker.py), making every parameter keyword-only. This raises TypeError: AgentServer.__init__() takes 1 positional argument but 2 were given immediately on startup. Additionally, even if the constructor accepted it, the entrypoint is never registered as an RTC session handler via @server.rtc_session(), so no sessions would ever be dispatched. Every other example in the repository follows the pattern of creating server = AgentServer(), decorating the entrypoint with @server.rtc_session(), and passing server to cli.run_app(). CI type-checking (scripts/check_types.py) only covers the livekit.agents and livekit.plugins.* packages, not examples/, so this is not caught.
-async def entrypoint(ctx: JobContext) -> None:
-    session = AgentSession(
-        stt=inference.STT(model="assemblyai/universal-streaming:en"),
-        llm=inference.LLM(model="openai/gpt-4o-mini"),
-        tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
-        vad=silero.VAD.load(),
-    )
-    await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
-    await session.generate_reply(
-        instructions=(
-            "Greet the user briefly and ask what they would like to think through."
-        ),
-    )
-if __name__ == "__main__":
-    cli.run_app(AgentServer(entrypoint))
+server = AgentServer()
+@server.rtc_session()
+async def entrypoint(ctx: JobContext) -> None:
+    session = AgentSession(
+        stt=inference.STT(model="assemblyai/universal-streaming:en"),
+        llm=inference.LLM(model="openai/gpt-4o-mini"),
+        tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
+        vad=silero.VAD.load(),
+    )
+    await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
+    await session.generate_reply(
+        instructions=(
+            "Greet the user briefly and ask what they would like to think through."
+        ),
+    )
+if __name__ == "__main__":
+    cli.run_app(server)
Summary
Adds a new voice agent example under examples/voice_agents/ejentum_cognitive_harness.py that exposes the Ejentum cognitive harness REST API as a @function_tool the voice agent can call mid-conversation when the user asks something that benefits from structured reasoning (planning a migration, weighing trade-offs, debugging a confusing situation, resisting a leading question).

The agent sees one tool: fetch_cognitive_scaffold(task, mode). It picks the right mode for the user's request (reasoning, code, anti-deception, memory), gets back a structured scaffold, and the LLM threads that scaffold into its spoken response.

File
examples/voice_agents/ejentum_cognitive_harness.py (new, ~115 lines)

Follows the conventions of the existing voice agent examples (annotated_tool_args.py, etc.): single-file Python module, Agent subclass with @function_tool methods, entrypoint(JobContext) that builds an AgentSession and calls start + generate_reply, cli.run_app(AgentServer(entrypoint)) at the bottom. No new top-level dependencies (aiohttp is already commonly available, but happy to swap to httpx or requests if the reviewer prefers).

Why a voice example specifically
The harness's value is highest when the model is about to commit to a response with limited time to think. Voice tightens that constraint further: there's no "think out loud, then revise" pass. A short scaffold fetched between user turn and model reply is exactly the shape that helps in a real-time loop. The example uses the same livekit.inference stack (assemblyai/universal-streaming, openai/gpt-4o-mini, cartesia/sonic-2, silero.VAD) the other examples use, so the integration surface is just the @function_tool.

Affiliation
I maintain the Ejentum harness API. Submitting this as a voice agent example because the function_tool + REST-call pattern is generally useful for any in-loop third-party tool, and Ejentum is a clean worked example because the REST gateway is a single endpoint with a mode arg. The docstring on fetch_cognitive_scaffold is written so the LLM can pick the right mode autonomously. Ejentum has free and paid tiers; the module docstring links to the dashboard for keys, not to a checkout.

Test plan
- examples/voice_agents/*.py entries.
- @function_tool docstring uses the documented Args: parser so the LLM gets typed argument descriptions.
- python ejentum_cognitive_harness.py dev with LiveKit credentials + Ejentum + OpenAI + Cartesia + AssemblyAI keys (cannot run a full voice agent in this environment; happy to follow up if the reviewer spots an API-surface mismatch).
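
For readers who want to see the shape of the single-tool, single-endpoint pattern described in the Summary, here is a minimal sketch of the tool body with the HTTP call injected so it runs without network access or LiveKit. Only the function name, its task and mode arguments, and the four mode names come from the PR description. The URL, the request payload fields, the "scaffold" response key, and the fallback strings are all assumptions made for illustration, and the @function_tool decorator is left out so the block stays self-contained.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Mode names are taken from the PR description; everything else is assumed.
VALID_MODES = ("reasoning", "code", "anti-deception", "memory")

# Hypothetical placeholder URL, not the real Ejentum endpoint.
HARNESS_URL = "https://harness.invalid/scaffold"

PostFn = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def fetch_cognitive_scaffold(task: str, mode: str, post: PostFn) -> str:
    """Return scaffold text for the task, or a short fallback message."""
    if mode not in VALID_MODES:
        return f"Unknown mode '{mode}'. Valid modes: {', '.join(VALID_MODES)}."
    try:
        # Assumed payload shape: one endpoint, mode passed as an argument.
        body = await post(HARNESS_URL, {"task": task, "mode": mode})
    except Exception as exc:
        # Keep the voice loop responsive if the REST call fails.
        return f"Scaffold unavailable ({type(exc).__name__}); answer without it."
    return str(body.get("scaffold", ""))


async def fake_post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Test double standing in for an aiohttp or httpx request.
    return {"scaffold": f"[{payload['mode']}] steps for: {payload['task']}"}


async def failing_post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Test double that simulates a network failure.
    raise ConnectionError("offline")


if __name__ == "__main__":
    print(asyncio.run(fetch_cognitive_scaffold("plan a migration", "reasoning", fake_post)))
```

Returning a plain string on every path, including failures, is a deliberate choice in this sketch: a tool that raises mid-turn would leave the agent with nothing to say, whereas a fallback string lets the LLM continue the reply.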