feat(examples/voice_agents): add ejentum_cognitive_harness #5823

Open
ejentum wants to merge 2 commits into livekit:main from ejentum:examples-voice-ejentum-harness

@ejentum commented May 23, 2026

Summary

Adds a new voice agent example under examples/voice_agents/ejentum_cognitive_harness.py that exposes the Ejentum cognitive harness REST API as a @function_tool the voice agent can call mid-conversation when the user asks something that benefits from structured reasoning (planning a migration, weighing trade-offs, debugging a confusing situation, resisting a leading question).

The agent sees one tool: fetch_cognitive_scaffold(task, mode). It picks the right mode for the user's request (reasoning, code, anti-deception, memory), gets back a structured scaffold, and the LLM threads that scaffold into its spoken response.
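The tool's shape can be sketched roughly as follows, assuming the four modes named above are the complete set. The `@function_tool` decorator and the REST call are left out so the snippet runs without LiveKit installed. `normalize_mode`, the fallback to `reasoning`, and the echoed return value are hypothetical and not taken from the PR.

```python
import asyncio
from typing import Literal, get_args

# The four modes named in the PR description; assumed here to be the full set.
Mode = Literal["reasoning", "code", "anti-deception", "memory"]
VALID_MODES = get_args(Mode)


def normalize_mode(mode: str) -> str:
    """Hypothetical helper: map whatever the LLM passes to a known mode."""
    cleaned = mode.strip().lower()
    return cleaned if cleaned in VALID_MODES else "reasoning"


async def fetch_cognitive_scaffold(task: str, mode: Mode = "reasoning") -> str:
    """Fetch a structured reasoning scaffold for the user's request.

    Args:
        task: One-sentence description of what the user is trying to work out.
        mode: One of "reasoning", "code", "anti-deception", or "memory".
    """
    # The real example would call the REST gateway here; this sketch only
    # echoes the normalized arguments so the control flow is visible.
    return f"[{normalize_mode(mode)}] {task}"


if __name__ == "__main__":
    print(asyncio.run(fetch_cognitive_scaffold("plan a database migration", "code")))
```

Typing `mode` as a `Literal` and describing it under `Args:` gives the LLM a closed set of choices to pick from, while the normalization step guards against a value outside that set.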

File

  • examples/voice_agents/ejentum_cognitive_harness.py (new, ~115 lines)

Follows the conventions of the existing voice agent examples (annotated_tool_args.py, etc.): single-file Python module, Agent subclass with @function_tool methods, entrypoint(JobContext) that builds an AgentSession and calls start + generate_reply, cli.run_app(AgentServer(entrypoint)) at the bottom. No new top-level dependencies (aiohttp is already commonly available, but happy to swap to httpx or requests if the reviewer prefers).

Why a voice example specifically

The harness's value is highest when the model is about to commit to a response with limited time to think. Voice tightens that constraint further: there's no "think out loud, then revise" pass. A short scaffold fetched between user turn and model reply is exactly the shape that helps in a real-time loop. The example uses the same livekit.inference stack (assemblyai/universal-streaming, openai/gpt-4o-mini, cartesia/sonic-2, silero.VAD) the other examples use, so the integration surface is just the @function_tool.

Affiliation

I maintain the Ejentum harness API. I am submitting this as a voice agent example because the function_tool + REST-call pattern is generally useful for any in-loop third-party tool, and Ejentum makes a clean worked example: its REST gateway is a single endpoint with a mode argument. The docstring on fetch_cognitive_scaffold is written so the LLM can pick the right mode autonomously. Ejentum has free and paid tiers; the module docstring links to the dashboard for keys, not to a checkout.

Test plan

  • Mirrors the single-file Python shape of existing examples/voice_agents/*.py entries.
  • @function_tool docstring uses the documented Args: parser so the LLM gets typed argument descriptions.
  • Graceful fallback on missing key or API error (returns empty scaffold + logs warning; voice agent stays alive).
  • Voice scrubbed (no em dashes, no marketing language).
  • Not yet verified: a local python ejentum_cognitive_harness.py dev run with LiveKit credentials plus Ejentum, OpenAI, Cartesia, and AssemblyAI keys (I cannot run a full voice agent in this environment; happy to follow up if the reviewer spots an API-surface mismatch).
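
The graceful-fallback item in the test plan could look something like the sketch below. It is not the code from the PR: the environment variable name `EJENTUM_API_KEY`, the `transport` callable, and the five-second timeout are all assumptions made for illustration, and the HTTP client is injected so the snippet runs with the standard library alone.

```python
import asyncio
import logging
import os
from typing import Awaitable, Callable

logger = logging.getLogger("ejentum-harness-sketch")

# Hypothetical transport signature: (api_key, task, mode) -> scaffold text.
Transport = Callable[[str, str, str], Awaitable[str]]


async def scaffold_or_empty(
    task: str,
    mode: str,
    transport: Transport,
    env_var: str = "EJENTUM_API_KEY",  # assumed variable name
    timeout_s: float = 5.0,  # assumed budget for a real-time voice turn
) -> str:
    """Return the scaffold, or "" if the key is missing or the call fails."""
    api_key = os.environ.get(env_var, "")
    if not api_key:
        logger.warning("%s is not set; returning an empty scaffold", env_var)
        return ""
    try:
        return await asyncio.wait_for(transport(api_key, task, mode), timeout_s)
    except Exception as exc:  # deliberately broad: the voice loop must stay alive
        logger.warning("scaffold request failed: %s", exc)
        return ""
```

Returning an empty string rather than raising means the LLM simply answers without a scaffold when the service is unavailable, which matches the "voice agent stays alive" behavior described above.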

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@devin-ai-integration (bot) left a comment

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Comment on lines +112 to +129
async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        stt=inference.STT(model="assemblyai/universal-streaming:en"),
        llm=inference.LLM(model="openai/gpt-4o-mini"),
        tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
        vad=silero.VAD.load(),
    )

    await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
    await session.generate_reply(
        instructions=(
            "Greet the user briefly and ask what they would like to think through."
        ),
    )


if __name__ == "__main__":
    cli.run_app(AgentServer(entrypoint))

🔴 AgentServer does not accept positional arguments: example crashes at startup

AgentServer(entrypoint) on line 129 passes entrypoint as a positional argument, but AgentServer.__init__ uses a bare * after self (worker.py), making every parameter keyword-only. This raises TypeError: AgentServer.__init__() takes 1 positional argument but 2 were given immediately on startup. Additionally, even if the constructor accepted it, the entrypoint is never registered as an RTC session handler via @server.rtc_session(), so no sessions would ever be dispatched. Every other example in the repository follows the pattern of creating server = AgentServer(), decorating the entrypoint with @server.rtc_session(), and passing server to cli.run_app(). CI type-checking (scripts/check_types.py) only covers livekit.agents and livekit.plugins.* packages, not examples/, so this is not caught.
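
The keyword-only failure and the decorator-based registration described above can be reproduced with a stand-in class. `FakeServer` and its `name` parameter are hypothetical and exist only for illustration; this is not the LiveKit `AgentServer`, and only the bare `*` and the `rtc_session()` decorator pattern mirror what the review describes.

```python
from typing import Awaitable, Callable, List, Optional

Handler = Callable[[object], Awaitable[None]]


class FakeServer:
    """Stand-in for illustration only; not the LiveKit AgentServer."""

    def __init__(self, *, name: Optional[str] = None) -> None:
        # The bare * means no argument may be passed positionally.
        self.name = name
        self.handlers: List[Handler] = []

    def rtc_session(self) -> Callable[[Handler], Handler]:
        """Decorator factory that registers a session handler."""

        def register(func: Handler) -> Handler:
            self.handlers.append(func)
            return func

        return register


async def entrypoint(ctx: object) -> None:
    """Placeholder session handler."""


def positional_call_error() -> str:
    """Return the TypeError message produced by a positional call."""
    try:
        FakeServer(entrypoint)  # type: ignore[misc]
    except TypeError as exc:
        return str(exc)
    return ""


server = FakeServer()
# Equivalent to writing @server.rtc_session() above the function definition.
server.rtc_session()(entrypoint)
```

Calling `positional_call_error()` returns a message of the same "takes 1 positional argument but 2 were given" form quoted in the review, and `server.handlers` contains `entrypoint` only after the explicit registration step.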

Suggested change
- async def entrypoint(ctx: JobContext) -> None:
-     session = AgentSession(
-         stt=inference.STT(model="assemblyai/universal-streaming:en"),
-         llm=inference.LLM(model="openai/gpt-4o-mini"),
-         tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
-         vad=silero.VAD.load(),
-     )
-     await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
-     await session.generate_reply(
-         instructions=(
-             "Greet the user briefly and ask what they would like to think through."
-         ),
-     )
- if __name__ == "__main__":
-     cli.run_app(AgentServer(entrypoint))
+ server = AgentServer()
+ @server.rtc_session()
+ async def entrypoint(ctx: JobContext) -> None:
+     session = AgentSession(
+         stt=inference.STT(model="assemblyai/universal-streaming:en"),
+         llm=inference.LLM(model="openai/gpt-4o-mini"),
+         tts=inference.TTS(model="cartesia/sonic-2:794f9389-aac1-45b6-b726-9d9369183238"),
+         vad=silero.VAD.load(),
+     )
+     await session.start(agent=CognitiveHarnessAgent(), room=ctx.room)
+     await session.generate_reply(
+         instructions=(
+             "Greet the user briefly and ask what they would like to think through."
+         ),
+     )
+ if __name__ == "__main__":
+     cli.run_app(server)