Skip to content

feat(broker): per-worker harness attribution for relaycast telemetry#1078

Closed
willwashburn wants to merge 2 commits into
mainfrom
fix/per-worker-harness
Closed

feat(broker): per-worker harness attribution for relaycast telemetry#1078
willwashburn wants to merge 2 commits into
mainfrom
fix/per-worker-harness

Conversation

@willwashburn

Copy link
Copy Markdown
Member

Problem

relaycast_server_* telemetry attributes every spawned agent to the session orchestrator harness — the human's claude-code/codex that ran agent-relay up. That's because AGENT_RELAY_ORCHESTRATOR_HARNESS is set once by the outermost CLI (bootstrap.ts) and inherited by the broker and all workers.

So a broker orchestrating a heterogeneous swarm (a codex worker spawned under a claude-code orchestrator, a gemini worker, …) collapses every worker's relaycast telemetry to the single orchestrator value. We want per-worker attribution: each agent's traffic tagged with the harness it actually runs.

Approach

The broker knows each worker's CLI at spawn time (spec.cli / params.cli), so derive the worker's harness there and stamp AGENT_RELAY_HARNESS — the highest-priority key the JS SDK reads (AGENT_RELAY_HARNESSAGENT_RELAY_ORCHESTRATOR_HARNESS → …), so it overrides the inherited orchestrator value.

Workers talk to relaycast via their own @relaycast/sdk (the MCP server constructs RelayCast directly against the gateway), and that SDK resolves harness from env — so setting the worker's env is the correct lever.

Covers all three worker entry modes:

Mode Path Change
wrap spawn_env_vars (via run_wrap) pass infer_harness_from_command(params.cli) instead of the broker's orchestrator harness
pty / app-server WorkerRegistry::spawn stamp AGENT_RELAY_HARNESS from infer_harness_from_command(spec.cli) into the worker env (unless harness_config already set it)
  • Reuses telemetry::infer_harness_from_command (already used for orchestrator process-tree detection) — now pub(crate) — for cli→harness mapping.
  • An unrecognized CLI sets nothing, so the worker gracefully falls back to the inherited orchestrator value rather than a wrong attribution.
  • The broker's own relaycast traffic (session WS, agent registration) keeps the orchestrator harness — the correct scope for broker-level events.

Result

  • Per-worker traffic → the worker's harness (codex, claude-code, gemini, …).
  • Broker's own traffic → the session orchestrator.

Tests

  • cli→harness mapping locked for the bare names the broker passes (claudeclaude-code, codex, cursor, gemini, and unknown→None).
  • 709 broker lib tests pass; clippy + cargo fmt --check clean.

Context

Follow-up to #1069 (broker forwards harness) and the closed #1075 (which was redundant — bootstrap.ts already propagates the orchestrator harness). This PR is the deliberate move from orchestrator-only to per-worker attribution.

🤖 Generated with Claude Code

Previously every spawned agent reported the session *orchestrator* harness
(the human's claude-code/codex that ran `agent-relay up`), because the
broker forwarded a single inherited `AGENT_RELAY_ORCHESTRATOR_HARNESS` to
all workers. A broker orchestrating a heterogeneous swarm (a codex worker
under a claude-code orchestrator, etc.) collapsed every worker's relaycast
telemetry to one value.

The broker knows each worker's CLI at spawn time, so attribute each agent's
relaycast telemetry to the harness it actually runs:

- Expose `telemetry::infer_harness_from_command` (already used for
  orchestrator process-tree detection) for cli→harness mapping.
- wrap path (`spawn_env_vars` via `run_wrap`): pass
  `infer_harness_from_command(params.cli)` as the agent's harness instead of
  the broker's orchestrator harness.
- pty / app-server paths (`WorkerRegistry::spawn`): stamp
  `AGENT_RELAY_HARNESS` from `infer_harness_from_command(spec.cli)` into the
  worker command env, unless harness_config already set it.

`AGENT_RELAY_HARNESS` is the highest-priority key the JS SDK reads, so it
overrides the inherited orchestrator value; an unrecognized CLI sets nothing
and the worker falls back to the inherited orchestrator. The broker's *own*
relaycast traffic (session WS, registration) keeps the orchestrator harness
— that's the correct scope for broker-level events.

Net: `relaycast_server_*` events are attributed to the worker's harness for
per-worker traffic, and to the orchestrator for the broker's own traffic.

Tests: cli→harness mapping locked for the bare names the broker passes
(claude/codex/cursor/gemini + unknown→None); 709 broker lib tests pass;
clippy + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@willwashburn willwashburn requested a review from khaliqgant as a code owner June 10, 2026 11:08
@codeant-ai

codeant-ai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Your free trial PR review limit of 300 PRs has been reached. Please upgrade your plan to continue using CodeAnt AI.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@willwashburn, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 7 minutes and 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: da589a06-0c88-460d-972a-e0e0571bacf7

📥 Commits

Reviewing files that changed from the base of the PR and between 7917e42 and 639424f.

📒 Files selected for processing (2)
  • crates/broker/src/telemetry.rs
  • crates/broker/src/wrap.rs
📝 Walkthrough

Walkthrough

The PR updates telemetry harness attribution for spawned workers by establishing a canonical CLI-to-harness mapping function and integrating it into worker and wrap spawn paths, enabling per-agent telemetry instead of session-wide orchestrator-based attribution.

Changes

Per-worker telemetry harness attribution

Layer / File(s) Summary
Harness inference function contract and tests
crates/broker/src/telemetry.rs
infer_harness_from_command is made pub(crate) and documented as the canonical harness-ID mapping for orchestrator detection and worker attribution. Unit tests verify bare CLI name recognition (claude, codex, cursor, gemini) and None fallback for unrecognized CLIs.
Worker spawn per-agent harness injection
crates/broker/src/worker.rs
Worker registry spawn now conditionally injects AGENT_RELAY_HARNESS into process environment per spawned agent, deriving it from spec.cli via the inference function, while preserving explicitly provided values.
Wrap spawn telemetry harness inference
crates/broker/src/wrap.rs
Spawn broker action handler updates telemetry harness attribution to infer from params.cli via the inference function instead of using generic orchestrator-based fallback.
Spawner harness forwarding documentation
crates/broker/src/spawner.rs
Inline comment clarified to explain that spawned agents' own harness (resolved by JS SDK via env) attributes relaycast telemetry per-harness rather than inheriting session orchestrator.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • AgentWorkforce/relay#1069: Earlier harness-forwarding work establishing the AGENT_RELAY_HARNESS pattern and relaycast client integration that this PR extends with per-worker harness inference.

Suggested reviewers

  • khaliqgant

Poem

🐰 A harness per worker hops into view,
No more orchestrator claims (that weren't true!),
Claude, Codex, Cursor—each gets their own,
Telemetry telemetry, properly known! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding per-worker harness attribution for relaycast telemetry in the broker.
Description check ✅ Passed The description includes Problem, Approach, Result, Tests, and Context sections following the template's intent and providing comprehensive context.
Linked Issues check ✅ Passed The PR implementation addresses issue #1075 by enabling per-worker harness attribution for relaycast telemetry. Changes to worker.rs, wrap.rs, and telemetry.rs fulfill the objective of deriving worker harness from CLI at spawn time.
Out of Scope Changes check ✅ Passed All changes are scoped to per-worker harness attribution: comment updates in spawner.rs, function visibility and tests in telemetry.rs, worker environment setup in worker.rs, and harness inference in wrap.rs. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/per-worker-harness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements per-worker harness attribution by tagging an agent's telemetry with the specific CLI it runs (such as claude, codex, or gemini) rather than the inherited session orchestrator. This is done by exposing and utilizing infer_harness_from_command to set the AGENT_RELAY_HARNESS environment variable. The review feedback correctly identifies a critical gap where relying solely on spec.cli is insufficient because it can be None in several common spawn scenarios (such as PTY harness configurations or headless runtimes), and provides a robust code suggestion to resolve the actual CLI command before inference.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +705 to +713
if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
if let Some(harness) = spec
.cli
.as_deref()
.and_then(crate::telemetry::infer_harness_from_command)
{
command.env("AGENT_RELAY_HARNESS", harness);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using spec.cli directly to infer the harness is insufficient because it will be None in several common spawn scenarios:

  1. When spec.harness_config is Some(ResolvedHarnessConfig::Pty(config)), the CLI command is specified in config.command rather than spec.cli.
  2. When spec.runtime is AgentRuntime::Headless and spec.harness_config is None, the CLI command is determined dynamically from the provider name via headless_provider_cli_name.

To ensure robust per-worker harness attribution across all spawn modes, we should resolve the actual CLI command being run in each case before passing it to infer_harness_from_command.

        if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
            let cli_to_infer = match &spec.harness_config {
                Some(ResolvedHarnessConfig::Pty(config)) => Some(config.command.as_str()),
                Some(ResolvedHarnessConfig::Headless(_)) => None,
                None => match spec.runtime {
                    AgentRuntime::Pty => spec.cli.as_deref(),
                    AgentRuntime::Headless => spec.provider.as_deref().map(headless_provider_cli_name),
                },
            };
            if let Some(harness) = cli_to_infer.and_then(crate::telemetry::infer_harness_from_command) {
                command.env("AGENT_RELAY_HARNESS", harness);
            }
        }

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7917e42f5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/// Map a CLI command (e.g. `claude`, `codex`, `gemini`) to its canonical
/// harness id. Used both for orchestrator detection (walking the process tree)
/// and for per-worker attribution (the broker knows the CLI it spawns).
pub(crate) fn infer_harness_from_command(command: &str) -> Option<&'static str> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse CLI strings before inferring worker harness

When the spawn CLI includes inline arguments, which parse_cli_command explicitly supports (for example codex --profile foo or claude --model haiku), this helper now receives the entire command string and treats it as the executable basename. That makes base become codex --profile foo/claude --model haiku, so inference returns None and the child keeps the inherited orchestrator harness, reintroducing the telemetry misattribution this change is meant to fix for those valid spawn inputs. Infer from the parsed executable (or make the helper shlex-parse first) before setting AGENT_RELAY_HARNESS.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/broker/src/worker.rs`:
- Around line 700-713: The harness inference currently only checks spec.cli;
change it to use the resolved runtime harness/command used to launch the worker
(e.g., the ResolvedHarnessConfig/ResolvedHarness instance or the variable
holding resolved harness info) so AGENT_RELAY_HARNESS is set even for
ResolvedHarnessConfig::Pty(config.command) and provider-derived headless CLIs;
specifically, when AGENT_RELAY_HARNESS is not present in harness_env, call
crate::telemetry::infer_harness_from_command on the actual resolved command
(handle the Pty(config.command) branch and any headless/provider CLI path
available in the resolved harness) and then call
command.env("AGENT_RELAY_HARNESS", harness) if it returns Some(harness) instead
of only reading spec.cli.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a14e2584-b468-43db-a2e2-17ca49f088e6

📥 Commits

Reviewing files that changed from the base of the PR and between 040da37 and 7917e42.

📒 Files selected for processing (4)
  • crates/broker/src/spawner.rs
  • crates/broker/src/telemetry.rs
  • crates/broker/src/worker.rs
  • crates/broker/src/wrap.rs

Comment on lines +700 to +713
// Per-worker harness attribution: tag this agent's relaycast telemetry
// with the CLI it runs (codex / claude-code / …) rather than the session
// orchestrator the broker inherited. The agent's MCP/SDK reads
// AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit
// value supplied via harness_config env.
if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
if let Some(harness) = spec
.cli
.as_deref()
.and_then(crate::telemetry::infer_harness_from_command)
{
command.env("AGENT_RELAY_HARNESS", harness);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the resolved runtime CLI source for harness inference, not only spec.cli.

At Line 706-710, attribution is inferred only from spec.cli, but this function can launch workers from ResolvedHarnessConfig::Pty(config.command) or provider-derived headless CLI paths. In those paths, spec.cli can be unset or not match the actual command, so AGENT_RELAY_HARNESS is silently omitted and attribution falls back incorrectly.

Suggested fix
-        if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
-            if let Some(harness) = spec
-                .cli
-                .as_deref()
-                .and_then(crate::telemetry::infer_harness_from_command)
-            {
-                command.env("AGENT_RELAY_HARNESS", harness);
-            }
-        }
+        if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
+            let cli_for_harness = spec
+                .cli
+                .as_deref()
+                // fallback: infer from normalized runtime/provider fields set during spawn assembly
+                .or(spec.provider.as_deref());
+            if let Some(harness) =
+                cli_for_harness.and_then(crate::telemetry::infer_harness_from_command)
+            {
+                command.env("AGENT_RELAY_HARNESS", harness);
+            }
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Per-worker harness attribution: tag this agent's relaycast telemetry
// with the CLI it runs (codex / claude-code / …) rather than the session
// orchestrator the broker inherited. The agent's MCP/SDK reads
// AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit
// value supplied via harness_config env.
if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
if let Some(harness) = spec
.cli
.as_deref()
.and_then(crate::telemetry::infer_harness_from_command)
{
command.env("AGENT_RELAY_HARNESS", harness);
}
}
// Per-worker harness attribution: tag this agent's relaycast telemetry
// with the CLI it runs (codex / claude-code / …) rather than the session
// orchestrator the broker inherited. The agent's MCP/SDK reads
// AGENT_RELAY_HARNESS (highest-priority key). Don't override an explicit
// value supplied via harness_config env.
if !harness_env.iter().any(|(k, _)| k == "AGENT_RELAY_HARNESS") {
let cli_for_harness = spec
.cli
.as_deref()
// fallback: infer from normalized runtime/provider fields set during spawn assembly
.or(spec.provider.as_deref());
if let Some(harness) =
cli_for_harness.and_then(crate::telemetry::infer_harness_from_command)
{
command.env("AGENT_RELAY_HARNESS", harness);
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/broker/src/worker.rs` around lines 700 - 713, The harness inference
currently only checks spec.cli; change it to use the resolved runtime
harness/command used to launch the worker (e.g., the
ResolvedHarnessConfig/ResolvedHarness instance or the variable holding resolved
harness info) so AGENT_RELAY_HARNESS is set even for
ResolvedHarnessConfig::Pty(config.command) and provider-derived headless CLIs;
specifically, when AGENT_RELAY_HARNESS is not present in harness_env, call
crate::telemetry::infer_harness_from_command on the actual resolved command
(handle the Pty(config.command) branch and any headless/provider CLI path
available in the resolved harness) and then call
command.env("AGENT_RELAY_HARNESS", harness) if it returns Some(harness) instead
of only reading spec.cli.

willwashburn added a commit that referenced this pull request Jun 10, 2026
#1085)

PR4 (core) of the origin_actor rollout (cloud/plans/origin-actor.md). Aligns
the broker producer with the engine 3.0.0 contract (harness -> origin_actor).

- Bump the `relaycast` crate pin =2.4.0 -> =3.0.0 (the SDK renamed
  `with_harness` -> `with_origin_actor`; published 3.0.0 to crates.io).
- The broker's own relaycast traffic (workspace stream + agent registration)
  now sends `origin_actor = agent-relay-cli/cli` via the renamed
  `with_origin_actor`, at all three client sites (WS handshake + both
  `build_relay_client` helpers). New `telemetry::BROKER_ORIGIN_ACTOR` const.

This fixes attribution for the broker's own (@relaycast/sdk-rust) traffic —
the majority of `relaycast_server_*` events.

Fast-follows (separate PRs):
- per-worker spawned-agent path `agent-relay-cli/agent/<harness>` (extends the
  open #1078) + the relay JS SDK rename so spawned agents send the new header.
- model + version in the name segment (`@<version>-<model>`) — sourcing.

709 broker tests pass; clippy + fmt clean; builds against crate 3.0.0 with no
other reconciliation needed.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@willwashburn

Copy link
Copy Markdown
Member Author

Superseded by #1090, which folds the per-worker attribution into the full origin_actor model — the broker now sets AGENT_RELAY_ORIGIN_ACTOR=agent-relay-cli/agent/ (the full path), consumed by the JS SDK 3.1.1.

willwashburn added a commit that referenced this pull request Jun 11, 2026
)

* feat: emit origin_actor from spawned agents (JS SDK + per-worker)

Completes the origin_actor producer rollout for the JS spawned-agent path
(the ~23%). Consumes @relaycast/sdk 3.1.1 (relaycast#187), which renamed the
`harness` option -> `originActor` / `X-Relaycast-Origin-Actor`.

JS:
- relaycast-telemetry.ts (cli + sdk): resolve `originActor` from a new
  AGENT_RELAY_ORIGIN_ACTOR env (the broker sets the per-worker path), falling
  back to `agent-relay-cli/agent/<orchestrator-harness>` synthesized from the
  existing harness env. `relaycastTelemetryOptions` now returns `{ originActor }`.
- messaging/relaycast.ts: pass `originActor`.
- bump @relaycast/sdk ^2.x -> ^3.1.1 (cli + sdk) + lockfile.

Broker (supersedes the open #1078):
- spawn_env_vars: set AGENT_RELAY_ORIGIN_ACTOR=agent-relay-cli/agent/<harness>
  (was AGENT_RELAY_HARNESS), with the per-worker harness from
  infer_harness_from_command(params.cli) (was the orchestrator harness).
- worker.rs (pty/app-server): stamp the same path from spec.cli.
- infer_harness_from_command -> pub(crate).

Result: each spawned JS agent reports `agent-relay-cli/agent/<its-harness>`
(codex / claude-code / …), per-agent — closing the 23% the core (broker-own,
agent-relay-cli/cli) didn't cover.

713 broker tests pass; clippy + fmt clean; sdk tsc clean; agent-relay (6) +
mcp.startup (20) JS tests pass. The cli typecheck vs the real 3.1.1 is left to
CI (verified 3.1.1 has originActor + the model spawn field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style: auto-format with Prettier

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant