Source: reliability spec cloud/docs/specs/agent-relay-reliability-spec.md §P1 (lifecycle). Filed from spec-triage on PR #861.
Problem
codex workers exhausted their context window and exited mid-task repeatedly. The only signal was a Context 6% left TTY string and later disappearance from agent-relay who. The orchestrator had to infer "agent dying" from scraped TTY and manually take over. Work was nearly lost.
Required
- Emit structured, subscribable lifecycle events:
agent_context_low { pct }, agent_exited { reason }, agent_permanently_dead, agent_idle { since } — observably actionable (the skill docs reference these but they were not actionable in practice).
agent-relay who / who --json should additionally include last-activity, context-budget (if known), and current state (working / idle / blocked-on-send).
Already partially shipped (PR #861, quick win)
who --json now emits real { name, cli, status, pid, uptimeSecs, memoryBytes } (replacing fabricated ONLINE/lastSeen:now). Still missing: last-activity, context-budget, working/idle/blocked state, and the lifecycle events themselves (broker-level).
Acceptance
Killing/exhausting a worker emits agent_exited/agent_context_low the orchestrator receives without scraping logs.
Scope
Broker event emission (Rust) + protocol + SDK subscription surface + CLI. Own effort; depends on broker work.
Source: reliability spec
cloud/docs/specs/agent-relay-reliability-spec.md§P1 (lifecycle). Filed from spec-triage on PR #861.Problem
codex workers exhausted their context window and exited mid-task repeatedly. The only signal was a
Context 6% leftTTY string and later disappearance fromagent-relay who. The orchestrator had to infer "agent dying" from scraped TTY and manually take over. Work was nearly lost.Required
agent_context_low { pct },agent_exited { reason },agent_permanently_dead,agent_idle { since }— observably actionable (the skill docs reference these but they were not actionable in practice).agent-relay who/who --jsonshould additionally include last-activity, context-budget (if known), and current state (working / idle / blocked-on-send).Already partially shipped (PR #861, quick win)
who --jsonnow emits real{ name, cli, status, pid, uptimeSecs, memoryBytes }(replacing fabricatedONLINE/lastSeen:now). Still missing: last-activity, context-budget, working/idle/blocked state, and the lifecycle events themselves (broker-level).Acceptance
Killing/exhausting a worker emits
agent_exited/agent_context_lowthe orchestrator receives without scraping logs.Scope
Broker event emission (Rust) + protocol + SDK subscription surface + CLI. Own effort; depends on broker work.