feat(spawn): add task-exit mode by khaliqgant · Pull Request #1129 · AgentWorkforce/relay

khaliqgant · 2026-06-14T12:13:35Z

Summary

Add optional task-exit spawn mode to broker HTTP spawn handling and harness-driver spawn inputs.
Add agent-relay local agent spawn/new --spawn-mode task-exit and --exit-after-task flags.
Extend MCP add_agent with task-exit options by appending the post-task /exit instruction.
Document the user-facing change in CHANGELOG.md.

Verification

cargo test -p agent-relay-broker exit_after_task_instruction_appends_clean_exit_contract
cargo test -p agent-relay-broker spawn_route_forwards_extended_fields
npx vitest run packages/harness-driver/src/agent-result.test.ts packages/cli/src/cli/commands/local-agent.test.ts packages/cli/src/cli/agent-relay-mcp.startup.test.ts
npm run build:harness-driver
npm --prefix packages/cli run build

No merge performed.

coderabbitai · 2026-06-14T12:13:43Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds an optional task-exit / single-shot spawn mode across the full stack: a SpawnMode type and new fields are added to harness-driver input interfaces, the broker HTTP handler parses and validates spawn_mode, a new apply_exit_after_task_instruction helper appends a /exit directive to the injected task, the harness-driver client and MCP add_agent tool forward the fields, CLI spawn/new subcommands expose matching flags, and integration test scenarios verify task completion and agent exit behavior.

Changes

Task-exit mode for agent-relay spawn APIs

Layer / File(s)	Summary
SpawnMode type and spawn input contracts `packages/harness-driver/src/types.ts`, `packages/cli/src/cli/lib/client-factory.ts`, `packages/harness-driver/src/lifecycle-hooks.ts`	Introduces exported `SpawnMode` union type and adds optional `spawnMode`/`exitAfterTask` fields to `SpawnPtyInput`, `SpawnHeadlessInput`, `SpawnCliInput`, `ClientSpawnOptions`, and `SpawnPatch`.
Broker listen API: spawn_mode parsing and exit_after_task forwarding `crates/broker/src/listen_api.rs`	Adds `exit_after_task: bool` to `ListenApiRequest::Spawn`, parses and validates `spawn_mode`/`spawnMode` from HTTP body (returning 400 for unknown values), ORs the derived flag with the explicit field, forwards it to the broker, and updates spawn route tests.
Broker runtime: apply_exit_after_task_instruction helper and task routing `crates/broker/src/runtime/session.rs`, `crates/broker/src/runtime/api.rs`, `crates/broker/src/runtime/tests.rs`, `crates/broker/src/runtime/fleet.rs`	Adds `EXIT_AFTER_TASK_INSTRUCTION` constant and `apply_exit_after_task_instruction` function; routes `effective_task` through that helper instead of `normalize_initial_task` when `exit_after_task` is set; adds unit test; fleet.rs includes explicit `exit_after_task: false` in spawn requests.
Harness-driver client serialization `packages/harness-driver/src/client.ts`, `packages/harness-driver/src/agent-result.test.ts`	`buildSpawnPtyBody` and `buildSpawnCliBody` conditionally include `spawnMode`/`exitAfterTask` in `/api/spawn` request bodies; `applySpawnPatch` merges those fields from lifecycle hooks; adds serialization test.
CLI commands and MCP add_agent surface `packages/cli/src/cli/agent-relay-mcp.ts`, `packages/cli/src/cli/commands/local-agent.ts`, `packages/cli/src/cli/agent-relay-mcp.startup.test.ts`, `packages/cli/src/cli/commands/local-agent.test.ts`	Adds post-task `/exit` instruction helper to MCP `add_agent` tool schema and wraps spawn task payload when `task_exit`/`single-shot` mode is selected; adds `--spawn-mode` and `--exit-after-task` flags with validation to `local agent spawn` and `new` subcommands; adds tests for both surfaces.
Integration test scenarios and eval infrastructure `tests/integration/broker/evals/scenarios/s08-task-exit.ts`, `tests/integration/broker/evals/runner.ts`, `tests/integration/broker/evals/scenarios/index.ts`, `tests/integration/broker/evals/types.ts`, `tests/integration/broker/utils/broker-harness.ts`, `tests/integration/broker/evals/run-task-exit-evals.sh`	Adds scenario s08-task-exit that spawns agent with exit instruction, waits for relay_inbound and agent_released events, validates exit confirmation; extends runner to support `--group=task-exit` and `TASK_EXIT_EVAL_SCENARIOS`; updates `ScenarioResult` type with exit metadata; adds broker-harness `exitAfterTask` option; includes Bash eval script with parallel batch execution across Claude/Codex/OpenCode models.
Trajectory records and CHANGELOG `.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/`, `.agentworkforce/trajectories/completed/2026-06/traj_3gefkmrnuaui/`, `CHANGELOG.md`	New trajectory completion summary and JSON status/retrospective update for the security track review run; adds reviewed PR trajectory summary and metadata; CHANGELOG entry for the new task-exit mode.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as local-agent CLI / MCP add_agent
    participant HarnessDriver as HarnessDriverClient
    participant BrokerHTTP as listen_api_spawn (HTTP)
    participant BrokerRuntime as BrokerRuntime::handle_api_request
    participant Session as apply_exit_after_task_instruction

    CLI->>HarnessDriver: spawnPty({ spawnMode: 'task_exit', exitAfterTask: true, task })
    HarnessDriver->>BrokerHTTP: POST /api/spawn { spawnMode, exitAfterTask, task }
    BrokerHTTP->>BrokerHTTP: parse spawn_mode → derive exit_after_task bool (OR with explicit field)
    BrokerHTTP-->>CLI: 400 BAD_REQUEST (if unsupported spawnMode)
    BrokerHTTP->>BrokerRuntime: ListenApiRequest::Spawn { exit_after_task: true, task }
    BrokerRuntime->>Session: apply_exit_after_task_instruction(task)
    Session-->>BrokerRuntime: task + EXIT_AFTER_TASK_INSTRUCTION (/exit)
    BrokerRuntime->>BrokerRuntime: spawn agent session with effective_task

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Optional spawn mode: run the injected task, then exit #1116: Implements the feature of adding optional spawn modes (task-exit, task_exit, single_shot) that allow spawned CLI agents to cleanly self-terminate after completing their injected task through appended /exit instructions.

Poem

🐇 Hop, hop — the agent is done!
It types /exit and off it runs,
No lingering sessions, no idle wait,
Task-exit mode seals the gate.
The rabbit cheers: clean and bright —
One injected task, then goodnight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.11% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title clearly and concisely summarizes the main change: adding a task-exit mode to the spawn functionality.
Description check	✅ Passed	The description covers the key aspects of the change with a clear summary, verification steps, and build commands, though it omits the manual testing checkbox from the template.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ar-257-factory-single-shot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces an optional task-exit mode for spawned CLI agents, allowing them to cleanly self-terminate with /exit after completing their injected task. This is implemented across the Rust broker, CLI commands, MCP server, and harness driver. Feedback on the changes suggests improving the robustness of the TypeScript helper withExitAfterTaskInstruction to handle empty or whitespace-only tasks without prepending unnecessary newlines, aligning its behavior with the Rust implementation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T12:15:21Z

+function withExitAfterTaskInstruction(task: string): string {
+  return `${task}\n\n${EXIT_AFTER_TASK_INSTRUCTION}`;
+}


If the task parameter is empty or contains only whitespace, the current implementation will prepend unnecessary newlines (\n\n) before the instruction. To match the robustness of the Rust implementation (apply_exit_after_task_instruction), we should trim the task and only append the newlines if the task is non-empty.

Suggested change

function withExitAfterTaskInstruction(task: string): string {

return `${task}\n\n${EXIT_AFTER_TASK_INSTRUCTION}`;

}

function withExitAfterTaskInstruction(task: string): string {

const trimmed = task.trim();

return trimmed ? `${trimmed}\n\n${EXIT_AFTER_TASK_INSTRUCTION}` : EXIT_AFTER_TASK_INSTRUCTION;

}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 027ea9e0cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-14T12:18:47Z

+  spawnMode?: SpawnMode;
+  exitAfterTask?: boolean;


Permit task-exit fields in driver action schema

These new public spawn fields are not mirrored in the registered agent.create action schema: packages/harness-driver/src/actions.ts still sets additionalProperties: false and only lists the older spawn properties. In environments that create agents through registerDriverActions instead of calling HarnessDriverClient.spawnPty directly, inputs such as spawnMode: 'task_exit' or exitAfterTask: true are rejected as additional properties before BrokerDriver.spawn can forward them, so the new task-exit mode is unavailable on that spawn API.

Useful? React with 👍 / 👎.

agent-relay-code · 2026-06-14T12:21:39Z

No blocking findings in the current checkout. I did not apply any edits.

Addressed comments

No bot or reviewer comments were available in .workforce/; only pr.diff, changed-files.txt, and context.json were present.

Verification

npx prettier --check ...changed files... passed.
npm --prefix packages/cli run lint passed with existing warnings only.
Focused tests passed: 44 passed.
npm run typecheck passed.
npm run build passed; Rust was skipped by the repo script because Cargo is not installed.
npm run test passed: 72 passed | 1 skipped, 887 passed | 5 skipped.

Could not run Rust CI locally because cargo is not installed in this environment. I also could not verify GitHub check/mergeable state from the provided artifacts, so I am not marking this READY.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

CHANGELOG.md (1)

12-12: ⚡ Quick win

Rewrite this bullet to be impact-first and less implementation-specific.

This reads a bit implementation-heavy for [Unreleased]; keep it as a short user-visible outcome.
Suggested wording
-- `agent-relay` spawn APIs add an optional task-exit mode so spawned CLI agents run the injected task and then cleanly self-terminate with `/exit`.
+- `agent-relay` spawn APIs now support an optional task-exit mode so spawned CLI agents can automatically exit after task completion.
As per coding guidelines, “Changelog entries should be concise and impact-first, with one short bullet per user-visible change… Omit … implementation details.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` at line 12, Rewrite the changelog entry on line 12 to focus on
the user-visible impact rather than implementation details. Remove technical
jargon like "spawn APIs," "task-exit mode," and "/exit" command, and instead
express the benefit in simple terms: that spawned CLI agents can now cleanly
complete their task and exit automatically. Keep the entry concise and
outcome-focused, omitting internal mechanics and API specifics.
Source: Coding guidelines

packages/cli/src/cli/commands/local-agent.test.ts (1)

70-98: ⚡ Quick win

Add parity tests for the new command and invalid --spawn-mode paths.

Line 70 adds good coverage for spawn, but the same lifecycle plumbing/validation was also added to local agent new. Please add tests for: (1) forwarding on new, and (2) rejection of unsupported spawn-mode values, to prevent drift between the two command paths.

Proposed test additions

+  it('new forwards task-exit lifecycle options', async () => {
+    const { program, client, attach } = harness();
+    await program.parseAsync(
+      [
+        'local',
+        'agent',
+        'new',
+        'codex',
+        '--name',
+        'WorkerB',
+        '--task',
+        'Ship it',
+        '--spawn-mode',
+        'task-exit',
+        '--exit-after-task',
+      ],
+      { from: 'user' }
+    );
+
+    expect(client.spawnPty).toHaveBeenCalledWith(
+      expect.objectContaining({
+        name: 'WorkerB',
+        cli: 'codex',
+        task: 'Ship it',
+        spawnMode: 'task_exit',
+        exitAfterTask: true,
+      })
+    );
+    expect(attach).toHaveBeenCalled();
+  });
+
+  it('spawn rejects unknown spawn mode', async () => {
+    const { program, client, error, exit } = harness();
+    await program.parseAsync(
+      ['local', 'agent', 'spawn', 'codex', '--spawn-mode', 'bogus'],
+      { from: 'user' }
+    );
+    expect(client.spawnPty).not.toHaveBeenCalled();
+    expect(error).toHaveBeenCalledWith(expect.stringContaining('Unknown spawn mode'));
+    expect(exit).toHaveBeenCalledWith(1);
+  });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/cli/commands/local-agent.test.ts` around lines 70 - 98, The
test at line 70-98 provides good coverage for the spawn command's task-exit
lifecycle options forwarding, but similar tests are missing for parity. Add
tests for the local agent new command that verify it forwards the same task-exit
lifecycle options (name, cli, task, spawnMode as task_exit, and exitAfterTask)
to client.spawnPty, similar to the spawn test structure. Additionally, add tests
that verify both the spawn and new command paths properly reject invalid
spawn-mode values that are not supported, ensuring validation consistency
between the two command implementations.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:
- Line 41: The token mcp**agent-relay**_ in the summary line has malformed
emphasis markup with mismatched delimiters (starting with ** but ending with _),
which violates markdown linting rule MD037. Fix this by ensuring the emphasis
markup uses consistent delimiters throughout—either replace it with
mcp_agent-relay_ (using underscores on both sides) or mcp**agent-relay** (using
asterisks on both sides), or remove the emphasis entirely depending on the
intended formatting.

---

Nitpick comments:
In `@CHANGELOG.md`:
- Line 12: Rewrite the changelog entry on line 12 to focus on the user-visible
impact rather than implementation details. Remove technical jargon like "spawn
APIs," "task-exit mode," and "/exit" command, and instead express the benefit in
simple terms: that spawned CLI agents can now cleanly complete their task and
exit automatically. Keep the entry concise and outcome-focused, omitting
internal mechanics and API specifics.

In `@packages/cli/src/cli/commands/local-agent.test.ts`:
- Around line 70-98: The test at line 70-98 provides good coverage for the spawn
command's task-exit lifecycle options forwarding, but similar tests are missing
for parity. Add tests for the local agent new command that verify it forwards
the same task-exit lifecycle options (name, cli, task, spawnMode as task_exit,
and exitAfterTask) to client.spawnPty, similar to the spawn test structure.
Additionally, add tests that verify both the spawn and new command paths
properly reject invalid spawn-mode values that are not supported, ensuring
validation consistency between the two command implementations.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2fa9f7ad-ea65-40a9-a3d1-d65f0ffc7149

📥 Commits

Reviewing files that changed from the base of the PR and between 87c05f1 and 6241d92.

📒 Files selected for processing (16)

.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md
.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/trajectory.json
CHANGELOG.md
crates/broker/src/listen_api.rs
crates/broker/src/runtime/api.rs
crates/broker/src/runtime/session.rs
crates/broker/src/runtime/tests.rs
packages/cli/src/cli/agent-relay-mcp.startup.test.ts
packages/cli/src/cli/agent-relay-mcp.ts
packages/cli/src/cli/commands/local-agent.test.ts
packages/cli/src/cli/commands/local-agent.ts
packages/cli/src/cli/lib/client-factory.ts
packages/harness-driver/src/agent-result.test.ts
packages/harness-driver/src/client.ts
packages/harness-driver/src/lifecycle-hooks.ts
packages/harness-driver/src/types.ts

agent-relay-code · 2026-06-14T12:36:10Z

Findings

packages/harness-driver/src/actions.ts:41: agent.create still uses additionalProperties: false but its schema does not include spawnMode or exitAfterTask. The PR adds those public spawn fields to driver types/client serialization, but callers using registerDriverActions will have task-exit inputs rejected before driver.spawn. This needs a semantic API/schema patch and coverage, so I left it as a review finding instead of editing.

I applied one mechanical fix:

.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:39-41: replaced malformed Markdown tool-token emphasis with inline code formatting.

Addressed comments

gemini-code-assist[bot]: raised whitespace-only task handling in withExitAfterTaskInstruction at packages/cli/src/cli/agent-relay-mcp.ts:48. Not changed because trimming task input changes runtime behavior and needs author judgment.
gemini-code-assist[bot]: summary review repeated the same whitespace-only task concern. Not changed for the same reason.
chatgpt-codex-connector[bot]: raised missing spawnMode / exitAfterTask in agent.create schema. Validated at packages/harness-driver/src/actions.ts:41; left as the blocking finding above because it is semantic API behavior.
CodeRabbit: raised malformed Markdown emphasis in the trajectory summary. Fixed in .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:39.
CodeRabbit: initial “review in progress” comment only. No action needed.
agent-relay-code[bot]: prior status comment said no review comments were available from .workforce/. Stale relative to current connector state; current review threads were fetched and accounted for.

Verification

npm install restored missing declared dependencies.
npm run typecheck passed.
npm run build passed; Rust broker build was skipped locally because Cargo is not installed.
npm run test passed: 72 files passed, 1 skipped; 887 tests passed, 5 skipped.
npm run lint passed with existing warnings only.
npx prettier --check ...changed files... passed.
GitHub connector reports PR open, mergeable, and CodeRabbit status success.

Not printing READY because there is still a validated semantic finding for the author to address.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds eval coverage for the task_exit spawn mode introduced in #1129: - s08-task-exit.ts: directly spawns a worker with EXIT_AFTER_TASK_INSTRUCTION and checks for a clean agent_exited (code 0) event — no lead required - ScenarioResult gains exitConfirmed and exitCode optional fields - Runner gains task-exit group (--group=task-exit) and imports TASK_EXIT_EVAL_SCENARIOS - run-task-exit-evals.sh drives the full matrix: claude (haiku/sonnet/opus), codex (gpt-4.5/gpt-4.1/o3/o4-mini), and opencode in batches of 3 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/broker/evals/run-task-exit-evals.sh`:
- Around line 66-71: The flush_batch() function at lines 66-71 suppresses all
exit codes from child processes using the || true pattern, causing the script to
ignore eval failures and report success even when runs fail. Remove the || true
suppression from the wait command and instead capture exit codes or exit the
script immediately when a child process fails, ensuring that any failed eval run
causes the overall script to fail. Apply this same fix to the other occurrences
of the flush_batch() function or equivalent wait patterns at lines 84-88 and
156-159.

In `@tests/integration/broker/evals/runner.ts`:
- Around line 142-153: The task-exit group selection routes evals through
mergeRepeats which currently omits the exitConfirmed and exitCode fields from
merged results, causing repeated task-exit evals to lose these newly added
signal fields. Preserve the exitConfirmed and exitCode fields when merging
repeated results in the mergeRepeats function or related merge logic that
processes scenarios for the task-exit group condition, ensuring these fields are
retained in the final merged output.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fc81eb45-537f-41dc-a8f5-684b35485887

📥 Commits

Reviewing files that changed from the base of the PR and between 491a0b9 and c5e17ef.

📒 Files selected for processing (10)

CHANGELOG.md
crates/broker/src/listen_api.rs
crates/broker/src/runtime/api.rs
packages/cli/src/cli/agent-relay-mcp.startup.test.ts
packages/cli/src/cli/agent-relay-mcp.ts
tests/integration/broker/evals/run-task-exit-evals.sh
tests/integration/broker/evals/runner.ts
tests/integration/broker/evals/scenarios/index.ts
tests/integration/broker/evals/scenarios/s08-task-exit.ts
tests/integration/broker/evals/types.ts

✅ Files skipped from review due to trivial changes (1)

tests/integration/broker/evals/scenarios/s08-task-exit.ts

🚧 Files skipped from review as they are similar to previous changes (3)

packages/cli/src/cli/agent-relay-mcp.startup.test.ts
crates/broker/src/runtime/api.rs
crates/broker/src/listen_api.rs

coderabbitai · 2026-06-16T08:41:24Z

+flush_batch() {
+  for pid in "${PIDS[@]:-}"; do
+    wait "$pid" 2>/dev/null || true
+  done
+  PIDS=()
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow failed eval exits in batch waits.

flush_batch() ignores child failures (wait ... || true), so the script can report success even when one or more eval runs fail.

Suggested fix

PIDS=() +FAILED=0 flush_batch() { for pid in "${PIDS[@]:-}"; do - wait "$pid" 2>/dev/null || true + if ! wait "$pid"; then + FAILED=1 + fi done PIDS=() } @@ - PIDS+=($!) + PIDS+=("$!") @@ echo "" echo "═══ All logs in $LOG_DIR ════════════════════════════════════════════════" echo " Reports written to evals-reports/" +exit "$FAILED"

Also applies to: 84-88, 156-159

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/broker/evals/run-task-exit-evals.sh` around lines 66 - 71, The flush_batch() function at lines 66-71 suppresses all exit codes from child processes using the || true pattern, causing the script to ignore eval failures and report success even when runs fail. Remove the || true suppression from the wait command and instead capture exit codes or exit the script immediately when a child process fails, ensuring that any failed eval run causes the overall script to fail. Apply this same fix to the other occurrences of the flush_batch() function or equivalent wait patterns at lines 84-88 and 156-159.

coderabbitai · 2026-06-16T08:41:24Z

+            : flags.group === 'task-exit'
+              ? TASK_EXIT_EVAL_SCENARIOS
+              : flags.group === 'all'
+                ? ALL_SCENARIOS
+                : SCENARIOS;
  if (
    flags.group === 'lifecycle' ||
    flags.group === 'phrasing' ||
    flags.group === 'auto-routing' ||
-    flags.group === 'lead-delegation'
+    flags.group === 'lead-delegation' ||
+    flags.group === 'task-exit'
  )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve task-exit exit signals in repeated runs.

Selecting task-exit here routes runs through mergeRepeats; in Line 159 onward, merged results omit exitConfirmed and exitCode, so repeated task-exit evals lose the newly added signal fields.

Suggested fix

function mergeRepeats(results: ScenarioResult[]): ScenarioResult { const n = results.length; const passes = results.filter((r) => r.pass).length; const adherence = results.filter((r) => r.protocolAdherence !== null); const first = results[0]; + const hasExitFields = results.some((r) => r.exitConfirmed !== undefined || r.exitCode !== undefined); return { id: first.id, title: first.title, pass: passes / n >= 0.5, @@ releaseCount: results.reduce((s, r) => s + (r.releaseCount ?? 0), 0), onboarding: first.onboarding, notes: `${passes}/${n} runs passed` + (first.notes ? ` · ${first.notes}` : ''), + ...(hasExitFields + ? { + exitConfirmed: results.every((r) => r.exitConfirmed === true), + exitCode: results.every((r) => r.exitCode === 0) ? 0 : null, + } + : {}), }; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/broker/evals/runner.ts` around lines 142 - 153, The task-exit group selection routes evals through mergeRepeats which currently omits the exitConfirmed and exitCode fields from merged results, causing repeated task-exit evals to lose these newly added signal fields. Preserve the exitConfirmed and exitCode fields when merging repeated results in the mergeRepeats function or related merge logic that processes scenarios for the task-exit group condition, ensuring these fields are retained in the final merged output.

…output The original approach relied on the model outputting `/exit` as bare text, which the broker PTY watches for but Claude models never emit reliably. The correct task-exit pattern is for the worker to call mcp__agent-relay__remove_agent with its own name, which triggers the broker to kill the PTY via workers.release(). Changes: - s08-task-exit: spawn worker with explicit name in task, wait for agent_released event, score pass = message sent + self-released - broker-harness: thread exitAfterTask through spawnAgent options - fleet.rs: add missing exit_after_task: false to ListenApiRequest::Spawn Verified results (1 run each): claude:haiku PASS claude:sonnet PASS claude:opus PASS codex PASS opencode FAIL (sent=0, no self-release) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/broker/evals/scenarios/s08-task-exit.ts`:
- Around line 39-53: The scenario is not properly validating the task-exit
runtime contract. In the spawnAgent call at lines 39-53, add exitAfterTask: true
to the options object to enable task-exit mode. Then in the event waiters at
lines 47-60, change from waiting for agent_released to waiting for agent_exited
and validate the exit code to confirm actual process termination semantics
rather than just self-release. Apply the same corrections at the sibling
locations at lines 57-63 and 85-86 in the same file to ensure all phases
properly assert agent process exit rather than just agent release.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 30086427-431d-46a4-aec3-8db3d6120e97

📥 Commits

Reviewing files that changed from the base of the PR and between c5e17ef and 0f20c5a.

📒 Files selected for processing (3)

crates/broker/src/runtime/fleet.rs
tests/integration/broker/evals/scenarios/s08-task-exit.ts
tests/integration/broker/utils/broker-harness.ts

coderabbitai · 2026-06-16T09:13:25Z

+    await harness.spawnAgent(worker, cli, ['general'], { task, model });
+    await sleep(STARTUP_MS);
+    harness.clearEvents();
+
+    // Phase 1: wait for the worker to send its task result.
+    await harness.waitForEvent('relay_inbound', phaseMs).promise.catch(() => {});
+
+    // Phase 2: wait for the worker to self-release via remove_agent(own name).
+    const releaseWaiter = harness.waitForEvent(
+      'agent_released',
+      phaseMs,
+      (e) => (e as Extract<BrokerEvent, { kind: 'agent_released' }>).name === worker
+    );
+    await releaseWaiter.promise.catch(() => {});
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scenario is validating self-release, not the task-exit runtime contract.

Line [39] does not set exitAfterTask, so regressions in the new spawn-mode plumbing can slip through. Lines [47]-[60] and Line [85] treat agent_released as exit confirmation, which does not assert actual agent process exit semantics (agent_exited + exit code).

Suggested adjustment

- await harness.spawnAgent(worker, cli, ['general'], { task, model }); + await harness.spawnAgent(worker, cli, ['general'], { task, model, exitAfterTask: true }); - const releaseWaiter = harness.waitForEvent( - 'agent_released', - phaseMs, - (e) => (e as Extract<BrokerEvent, { kind: 'agent_released' }>).name === worker - ); - await releaseWaiter.promise.catch(() => {}); + await harness + .waitForEvent( + 'agent_exited', + phaseMs, + (e) => (e as Extract<BrokerEvent, { kind: 'agent_exited' }>).name === worker + ) + .promise + .catch(() => {}); - const released = events.some( - (e): e is Extract<BrokerEvent, { kind: 'agent_released' }> => - e.kind === 'agent_released' && e.name === worker - ); - const pass = released && base.sent > 0; + const exited = events.some( + (e): e is Extract<BrokerEvent, { kind: 'agent_exited' }> => + e.kind === 'agent_exited' && e.name === worker + ); + const pass = exited && base.sent > 0; - exitConfirmed: released, + exitConfirmed: exited, + // also populate exitCode from the matched agent_exited event

Also applies to: 57-63, 85-86

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/broker/evals/scenarios/s08-task-exit.ts` around lines 39 - 53, The scenario is not properly validating the task-exit runtime contract. In the spawnAgent call at lines 39-53, add exitAfterTask: true to the options object to enable task-exit mode. Then in the event waiters at lines 47-60, change from waiting for agent_released to waiting for agent_exited and validate the exit code to confirm actual process termination semantics rather than just self-release. Apply the same corrections at the sibling locations at lines 57-63 and 85-86 in the same file to ensure all phases properly assert agent process exit rather than just agent release.

…opencode /exit output was evaluated for opencode but found unreliable: the model sends the task message but does not produce a bare /exit line the PTY watcher can detect. remove_agent(self) works across all harnesses. Also fixes opencode sent=0 regression: use relay message trigger after startup (matching all other scenario patterns) so the action fires after MCP is up. Verified results: claude:haiku PASS opencode PASS (all previous Claude/codex results unchanged) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-shot

…merge The fleet SDK commit (491d9a0) reorganized agent-relay-mcp.ts and inadvertently removed the agent_relay_tool_call telemetry tracking that was added in 1cb41cf. The fleet commit also changed the invoke_action test expectation to NOT fire telemetry (actions API has its own path), so the blanket enableInboxPiggyback wrapper approach no longer fits. Restores tracking by adding inline trackAgentRelayToolCall calls directly to the add_agent and remove_agent handlers, threaded via a new telemetryTransport param on registerAgentRelayTools. invoke_action via the actions API continues to skip agent_relay_tool_call as the fleet commit intended. Also runs npm install to link @agent-relay/fleet into node_modules, fixing the fleet.test.ts module-not-found failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The generic AGENT_RELAY_TOOL_CALL_METADATA table, readInvokedActionName, agentRelayActionNameCategory, and agentRelayToolCallMetadata were added when restoring telemetry via the enableInboxPiggyback blanket wrapper, but the final approach uses inline calls in add_agent/remove_agent instead. Remove the unused code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

My earlier fix only re-instrumented add_agent/remove_agent inline, which satisfied the failing test but silently dropped agent_relay_tool_call telemetry for the ~28 other owned tools that PR #1041 deliberately instrumented (post_message, send_dm, channels, reactions, inbox, etc.). Restore the original wrapper-based tracking in enableInboxPiggyback so every statically-registered owned tool emits telemetry again, and add a skip for the action-routed path (invoke_action + the dynamic per-action tools in actionToolNames) to honor the fleet SDK commit's intent that registered actions do not route through per-tool telemetry. Reverts the inline handler boilerplate and the registerAgentRelayTools param. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…metry entries Fresh-eyes review follow-ups: - gitignore /opencode.json: the opencode harness writes it to the repo root during eval runs with a live rk_live_ workspace key + at_live_ agent token; it was untracked but not ignored, one `git add .` from a secret leak. - Drop 3 stale AGENT_RELAY_TOOL_CALL_METADATA entries (get_message, get_thread, list_channel_members) — no such tools are registered, so the entries were dead lookups carried over from PR #1041. - Fix run-task-exit-evals.sh header: the s08 scenario self-releases via remove_agent(self), not `/exit` output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The `spawn` tool delegates to the actions surface (actions.invoke('spawn')) exactly like `invoke_action` and the dynamic per-action tools, which are deliberately excluded from agent_relay_tool_call telemetry. Tracking `spawn` but not `invoke_action({name:'spawn'})` counted the same underlying action differently depending on which MCP surface the caller used. Introduce ACTION_ROUTED_TOOL_NAMES ({invoke_action, spawn}) and skip per-tool telemetry for those names alongside the actionToolNames check. Adds a test asserting spawn still runs but emits no telemetry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(spawn): add task-exit mode

027ea9e

khaliqgant requested a review from willwashburn as a code owner June 14, 2026 12:13

gemini-code-assist Bot reviewed Jun 14, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 14, 2026

View reviewed changes

chore: apply pr-reviewer fixes for #1129

9081857

style: auto-format with Prettier

6241d92

coderabbitai Bot reviewed Jun 14, 2026

View reviewed changes

Comment thread .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md Outdated

chore: apply pr-reviewer fixes for #1129

61aa902

style: auto-format with Prettier

491a0b9

khaliqgant added the needs-evals label Jun 15, 2026

Proactive Runtime Bot and others added 3 commits June 16, 2026 10:23

chore: merge origin/main into ar-257-factory-single-shot

8fd0da3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: auto-format with Prettier

c5e17ef

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Proactive Runtime Bot and others added 7 commits June 16, 2026 11:26

Merge remote-tracking branch 'origin/main' into ar-257-factory-single…

0690196

…-shot

khaliqgant merged commit edd6430 into main Jun 16, 2026
48 checks passed

khaliqgant deleted the ar-257-factory-single-shot branch June 16, 2026 10:38

khaliqgant mentioned this pull request Jun 16, 2026

Interactive agent as a critical-path step owner never completes (missed 'owner completion decision') AgentWorkforce/relayflows#11

Open

miyaontherelay mentioned this pull request Jun 22, 2026

fix(factory): propagate relay workspace to agents AgentWorkforce/factory#33

Merged

Conversation

khaliqgant commented Jun 14, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

agent-relay-code Bot commented Jun 14, 2026

Addressed comments

Verification

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

agent-relay-code Bot commented Jun 14, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

khaliqgant commented Jun 14, 2026 •

edited by cubic-dev-ai Bot

Loading

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading