feat(broker): per-worker harness attribution for relaycast telemetry#1078
feat(broker): per-worker harness attribution for relaycast telemetry#1078willwashburn wants to merge 2 commits into
Conversation
Previously every spawned agent reported the session *orchestrator* harness (the human's claude-code/codex that ran `agent-relay up`), because the broker forwarded a single inherited `AGENT_RELAY_ORCHESTRATOR_HARNESS` to all workers. A broker orchestrating a heterogeneous swarm (a codex worker under a claude-code orchestrator, etc.) collapsed every worker's relaycast telemetry to one value. The broker knows each worker's CLI at spawn time, so attribute each agent's relaycast telemetry to the harness it actually runs: - Expose `telemetry::infer_harness_from_command` (already used for orchestrator process-tree detection) for cli→harness mapping. - wrap path (`spawn_env_vars` via `run_wrap`): pass `infer_harness_from_command(params.cli)` as the agent's harness instead of the broker's orchestrator harness. - pty / app-server paths (`WorkerRegistry::spawn`): stamp `AGENT_RELAY_HARNESS` from `infer_harness_from_command(spec.cli)` into the worker command env, unless harness_config already set it. `AGENT_RELAY_HARNESS` is the highest-priority key the JS SDK reads, so it overrides the inherited orchestrator value; an unrecognized CLI sets nothing and the worker falls back to the inherited orchestrator. The broker's *own* relaycast traffic (session WS, registration) keeps the orchestrator harness — that's the correct scope for broker-level events. Net: `relaycast_server_*` events are attributed to the worker's harness for per-worker traffic, and to the orchestrator for the broker's own traffic. Tests: cli→harness mapping locked for the bare names the broker passes (claude/codex/cursor/gemini + unknown→None); 709 broker lib tests pass; clippy + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Your free trial PR review limit of 300 PRs has been reached. Please upgrade your plan to continue using CodeAnt AI. |
|
Warning Review limit reached
More reviews will be available in 7 minutes and 58 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR updates telemetry harness attribution for spawned workers by establishing a canonical CLI-to-harness mapping function and integrating it into worker and wrap spawn paths, enabling per-agent telemetry instead of session-wide orchestrator-based attribution. ChangesPer-worker telemetry harness attribution
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements per-worker harness attribution by tagging an agent's telemetry with the specific CLI it runs (such as claude, codex, or gemini) rather than the inherited session orchestrator. This is done by exposing and utilizing infer_harness_from_command to set the AGENT_RELAY_HARNESS environment variable. The review feedback correctly identifies a critical gap where relying solely on spec.cli is insufficient because it can be None in several common spawn scenarios (such as PTY harness configurations or headless runtimes), and provides a robust code suggestion to resolve the actual CLI command before inference.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") { | ||
| if let Some(harness) = spec | ||
| .cli | ||
| .as_deref() | ||
| .and_then(crate::telemetry::infer_harness_from_command) | ||
| { | ||
| command.env("AGENT_RELAY_HARNESS", harness); | ||
| } | ||
| } |
There was a problem hiding this comment.
Using spec.cli directly to infer the harness is insufficient because it will be None in several common spawn scenarios:
- When
spec.harness_configisSome(ResolvedHarnessConfig::Pty(config)), the CLI command is specified inconfig.commandrather thanspec.cli. - When
spec.runtimeisAgentRuntime::Headlessandspec.harness_configisNone, the CLI command is determined dynamically from the provider name viaheadless_provider_cli_name.
To ensure robust per-worker harness attribution across all spawn modes, we should resolve the actual CLI command being run in each case before passing it to infer_harness_from_command.
if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
let cli_to_infer = match &spec.harness_config {
Some(ResolvedHarnessConfig::Pty(config)) => Some(config.command.as_str()),
Some(ResolvedHarnessConfig::Headless(_)) => None,
None => match spec.runtime {
AgentRuntime::Pty => spec.cli.as_deref(),
AgentRuntime::Headless => spec.provider.as_deref().map(headless_provider_cli_name),
},
};
if let Some(harness) = cli_to_infer.and_then(crate::telemetry::infer_harness_from_command) {
command.env("AGENT_RELAY_HARNESS", harness);
}
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7917e42f5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /// Map a CLI command (e.g. `claude`, `codex`, `gemini`) to its canonical | ||
| /// harness id. Used both for orchestrator detection (walking the process tree) | ||
| /// and for per-worker attribution (the broker knows the CLI it spawns). | ||
| pub(crate) fn infer_harness_from_command(command: &str) -> Option<&'static str> { |
There was a problem hiding this comment.
Parse CLI strings before inferring worker harness
When the spawn CLI includes inline arguments, which parse_cli_command explicitly supports (for example codex --profile foo or claude --model haiku), this helper now receives the entire command string and treats it as the executable basename. That makes base become codex --profile foo/claude --model haiku, so inference returns None and the child keeps the inherited orchestrator harness, reintroducing the telemetry misattribution this change is meant to fix for those valid spawn inputs. Infer from the parsed executable (or make the helper shlex-parse first) before setting AGENT_RELAY_HARNESS.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/broker/src/worker.rs`:
- Around line 700-713: The harness inference currently only checks spec.cli;
change it to use the resolved runtime harness/command used to launch the worker
(e.g., the ResolvedHarnessConfig/ResolvedHarness instance or the variable
holding resolved harness info) so AGENT_RELAY_HARNESS is set even for
ResolvedHarnessConfig::Pty(config.command) and provider-derived headless CLIs;
specifically, when AGENT_RELAY_HARNESS is not present in harness_env, call
crate::telemetry::infer_harness_from_command on the actual resolved command
(handle the Pty(config.command) branch and any headless/provider CLI path
available in the resolved harness) and then call
command.env("AGENT_RELAY_HARNESS", harness) if it returns Some(harness) instead
of only reading spec.cli.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a14e2584-b468-43db-a2e2-17ca49f088e6
📒 Files selected for processing (4)
crates/broker/src/spawner.rscrates/broker/src/telemetry.rscrates/broker/src/worker.rscrates/broker/src/wrap.rs
| // Per-worker harness attribution: tag this agent's relaycast telemetry | ||
| // with the CLI it runs (codex / claude-code / …) rather than the session | ||
| // orchestrator the broker inherited. The agent's MCP/SDK reads | ||
| // AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit | ||
| // value supplied via harness_config env. | ||
| if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") { | ||
| if let Some(harness) = spec | ||
| .cli | ||
| .as_deref() | ||
| .and_then(crate::telemetry::infer_harness_from_command) | ||
| { | ||
| command.env("AGENT_RELAY_HARNESS", harness); | ||
| } | ||
| } |
There was a problem hiding this comment.
Use the resolved runtime CLI source for harness inference, not only spec.cli.
At Line 706-710, attribution is inferred only from spec.cli, but this function can launch workers from ResolvedHarnessConfig::Pty(config.command) or provider-derived headless CLI paths. In those paths, spec.cli can be unset or not match the actual command, so AGENT_RELAY_HARNESS is silently omitted and attribution falls back incorrectly.
Suggested fix
- if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
- if let Some(harness) = spec
- .cli
- .as_deref()
- .and_then(crate::telemetry::infer_harness_from_command)
- {
- command.env("AGENT_RELAY_HARNESS", harness);
- }
- }
+ if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
+ let cli_for_harness = spec
+ .cli
+ .as_deref()
+ // fallback: infer from normalized runtime/provider fields set during spawn assembly
+ .or(spec.provider.as_deref());
+ if let Some(harness) =
+ cli_for_harness.and_then(crate::telemetry::infer_harness_from_command)
+ {
+ command.env("AGENT_RELAY_HARNESS", harness);
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Per-worker harness attribution: tag this agent's relaycast telemetry | |
| // with the CLI it runs (codex / claude-code / …) rather than the session | |
| // orchestrator the broker inherited. The agent's MCP/SDK reads | |
| // AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit | |
| // value supplied via harness_config env. | |
| if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") { | |
| if let Some(harness) = spec | |
| .cli | |
| .as_deref() | |
| .and_then(crate::telemetry::infer_harness_from_command) | |
| { | |
| command.env("AGENT_RELAY_HARNESS", harness); | |
| } | |
| } | |
| // Per-worker harness attribution: tag this agent's relaycast telemetry | |
| // with the CLI it runs (codex / claude-code / …) rather than the session | |
| // orchestrator the broker inherited. The agent's MCP/SDK reads | |
| // AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit | |
| // value supplied via harness_config env. | |
| if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") { | |
| let cli_for_harness = spec | |
| .cli | |
| .as_deref() | |
| // fallback: infer from normalized runtime/provider fields set during spawn assembly | |
| .or(spec.provider.as_deref()); | |
| if let Some(harness) = | |
| cli_for_harness.and_then(crate::telemetry::infer_harness_from_command) | |
| { | |
| command.env("AGENT_RELAY_HARNESS", harness); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/broker/src/worker.rs` around lines 700 - 713, The harness inference
currently only checks spec.cli; change it to use the resolved runtime
harness/command used to launch the worker (e.g., the
ResolvedHarnessConfig/ResolvedHarness instance or the variable holding resolved
harness info) so AGENT_RELAY_HARNESS is set even for
ResolvedHarnessConfig::Pty(config.command) and provider-derived headless CLIs;
specifically, when AGENT_RELAY_HARNESS is not present in harness_env, call
crate::telemetry::infer_harness_from_command on the actual resolved command
(handle the Pty(config.command) branch and any headless/provider CLI path
available in the resolved harness) and then call
command.env("AGENT_RELAY_HARNESS", harness) if it returns Some(harness) instead
of only reading spec.cli.
#1085) PR4 (core) of the origin_actor rollout (cloud/plans/origin-actor.md). Aligns the broker producer with the engine 3.0.0 contract (harness -> origin_actor). - Bump the `relaycast` crate pin =2.4.0 -> =3.0.0 (the SDK renamed `with_harness` -> `with_origin_actor`; published 3.0.0 to crates.io). - The broker's own relaycast traffic (workspace stream + agent registration) now sends `origin_actor = agent-relay-cli/cli` via the renamed `with_origin_actor`, at all three client sites (WS handshake + both `build_relay_client` helpers). New `telemetry::BROKER_ORIGIN_ACTOR` const. This fixes attribution for the broker's own (@relaycast/sdk-rust) traffic — the majority of `relaycast_server_*` events. Fast-follows (separate PRs): - per-worker spawned-agent path `agent-relay-cli/agent/<harness>` (extends the open #1078) + the relay JS SDK rename so spawned agents send the new header. - model + version in the name segment (`@<version>-<model>`) — sourcing. 709 broker tests pass; clippy + fmt clean; builds against crate 3.0.0 with no other reconciliation needed. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Superseded by #1090, which folds the per-worker attribution into the full origin_actor model — the broker now sets AGENT_RELAY_ORIGIN_ACTOR=agent-relay-cli/agent/ (the full path), consumed by the JS SDK 3.1.1. |
) * feat: emit origin_actor from spawned agents (JS SDK + per-worker) Completes the origin_actor producer rollout for the JS spawned-agent path (the ~23%). Consumes @relaycast/sdk 3.1.1 (relaycast#187), which renamed the `harness` option -> `originActor` / `X-Relaycast-Origin-Actor`. JS: - relaycast-telemetry.ts (cli + sdk): resolve `originActor` from a new AGENT_RELAY_ORIGIN_ACTOR env (the broker sets the per-worker path), falling back to `agent-relay-cli/agent/<orchestrator-harness>` synthesized from the existing harness env. `relaycastTelemetryOptions` now returns `{ originActor }`. - messaging/relaycast.ts: pass `originActor`. - bump @relaycast/sdk ^2.x -> ^3.1.1 (cli + sdk) + lockfile. Broker (supersedes the open #1078): - spawn_env_vars: set AGENT_RELAY_ORIGIN_ACTOR=agent-relay-cli/agent/<harness> (was AGENT_RELAY_HARNESS), with the per-worker harness from infer_harness_from_command(params.cli) (was the orchestrator harness). - worker.rs (pty/app-server): stamp the same path from spec.cli. - infer_harness_from_command -> pub(crate). Result: each spawned JS agent reports `agent-relay-cli/agent/<its-harness>` (codex / claude-code / …), per-agent — closing the 23% the core (broker-own, agent-relay-cli/cli) didn't cover. 713 broker tests pass; clippy + fmt clean; sdk tsc clean; agent-relay (6) + mcp.startup (20) JS tests pass. The cli typecheck vs the real 3.1.1 is left to CI (verified 3.1.1 has originActor + the model spawn field). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * style: auto-format with Prettier --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Problem
relaycast_server_*telemetry attributes every spawned agent to the session orchestrator harness — the human'sclaude-code/codexthat ranagent-relay up. That's becauseAGENT_RELAY_ORCHESTRATOR_HARNESSis set once by the outermost CLI (bootstrap.ts) and inherited by the broker and all workers.So a broker orchestrating a heterogeneous swarm (a
codexworker spawned under aclaude-codeorchestrator, ageminiworker, …) collapses every worker's relaycast telemetry to the single orchestrator value. We want per-worker attribution: each agent's traffic tagged with the harness it actually runs.Approach
The broker knows each worker's CLI at spawn time (
spec.cli/params.cli), so derive the worker's harness there and stampAGENT_RELAY_HARNESS— the highest-priority key the JS SDK reads (AGENT_RELAY_HARNESS→AGENT_RELAY_ORCHESTRATOR_HARNESS→ …), so it overrides the inherited orchestrator value.Workers talk to relaycast via their own
@relaycast/sdk(the MCP server constructsRelayCastdirectly against the gateway), and that SDK resolves harness from env — so setting the worker's env is the correct lever.Covers all three worker entry modes:
wrapspawn_env_vars(viarun_wrap)infer_harness_from_command(params.cli)instead of the broker's orchestrator harnesspty/app-serverWorkerRegistry::spawnAGENT_RELAY_HARNESSfrominfer_harness_from_command(spec.cli)into the worker env (unlessharness_configalready set it)telemetry::infer_harness_from_command(already used for orchestrator process-tree detection) — nowpub(crate)— for cli→harness mapping.Result
codex,claude-code,gemini, …).Tests
claude→claude-code,codex,cursor,gemini, and unknown→None).clippy+cargo fmt --checkclean.Context
Follow-up to #1069 (broker forwards harness) and the closed #1075 (which was redundant —
bootstrap.tsalready propagates the orchestrator harness). This PR is the deliberate move from orchestrator-only to per-worker attribution.🤖 Generated with Claude Code