Skip to content

feat(spawn): add task-exit mode#1129

Merged
khaliqgant merged 16 commits into
mainfrom
ar-257-factory-single-shot
Jun 16, 2026
Merged

feat(spawn): add task-exit mode#1129
khaliqgant merged 16 commits into
mainfrom
ar-257-factory-single-shot

Conversation

@khaliqgant

@khaliqgant khaliqgant commented Jun 14, 2026

Copy link
Copy Markdown
Member

Summary

  • Add optional task-exit spawn mode to broker HTTP spawn handling and harness-driver spawn inputs.
  • Add agent-relay local agent spawn/new --spawn-mode task-exit and --exit-after-task flags.
  • Extend MCP add_agent with task-exit options by appending the post-task /exit instruction.
  • Document the user-facing change in CHANGELOG.md.

Verification

  • cargo test -p agent-relay-broker exit_after_task_instruction_appends_clean_exit_contract
  • cargo test -p agent-relay-broker spawn_route_forwards_extended_fields
  • npx vitest run packages/harness-driver/src/agent-result.test.ts packages/cli/src/cli/commands/local-agent.test.ts packages/cli/src/cli/agent-relay-mcp.startup.test.ts
  • npm run build:harness-driver
  • npm --prefix packages/cli run build

No merge performed.

Review in cubic

@khaliqgant khaliqgant requested a review from willwashburn as a code owner June 14, 2026 12:13
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an optional task-exit / single-shot spawn mode across the full stack: a SpawnMode type and new fields are added to harness-driver input interfaces, the broker HTTP handler parses and validates spawn_mode, a new apply_exit_after_task_instruction helper appends a /exit directive to the injected task, the harness-driver client and MCP add_agent tool forward the fields, CLI spawn/new subcommands expose matching flags, and integration test scenarios verify task completion and agent exit behavior.

Changes

Task-exit mode for agent-relay spawn APIs

Layer / File(s) Summary
SpawnMode type and spawn input contracts
packages/harness-driver/src/types.ts, packages/cli/src/cli/lib/client-factory.ts, packages/harness-driver/src/lifecycle-hooks.ts
Introduces exported SpawnMode union type and adds optional spawnMode/exitAfterTask fields to SpawnPtyInput, SpawnHeadlessInput, SpawnCliInput, ClientSpawnOptions, and SpawnPatch.
Broker listen API: spawn_mode parsing and exit_after_task forwarding
crates/broker/src/listen_api.rs
Adds exit_after_task: bool to ListenApiRequest::Spawn, parses and validates spawn_mode/spawnMode from HTTP body (returning 400 for unknown values), ORs the derived flag with the explicit field, forwards it to the broker, and updates spawn route tests.
Broker runtime: apply_exit_after_task_instruction helper and task routing
crates/broker/src/runtime/session.rs, crates/broker/src/runtime/api.rs, crates/broker/src/runtime/tests.rs, crates/broker/src/runtime/fleet.rs
Adds EXIT_AFTER_TASK_INSTRUCTION constant and apply_exit_after_task_instruction function; routes effective_task through that helper instead of normalize_initial_task when exit_after_task is set; adds unit test; fleet.rs includes explicit exit_after_task: false in spawn requests.
Harness-driver client serialization
packages/harness-driver/src/client.ts, packages/harness-driver/src/agent-result.test.ts
buildSpawnPtyBody and buildSpawnCliBody conditionally include spawnMode/exitAfterTask in /api/spawn request bodies; applySpawnPatch merges those fields from lifecycle hooks; adds serialization test.
CLI commands and MCP add_agent surface
packages/cli/src/cli/agent-relay-mcp.ts, packages/cli/src/cli/commands/local-agent.ts, packages/cli/src/cli/agent-relay-mcp.startup.test.ts, packages/cli/src/cli/commands/local-agent.test.ts
Adds post-task /exit instruction helper to MCP add_agent tool schema and wraps spawn task payload when task_exit/single-shot mode is selected; adds --spawn-mode and --exit-after-task flags with validation to local agent spawn and new subcommands; adds tests for both surfaces.
Integration test scenarios and eval infrastructure
tests/integration/broker/evals/scenarios/s08-task-exit.ts, tests/integration/broker/evals/runner.ts, tests/integration/broker/evals/scenarios/index.ts, tests/integration/broker/evals/types.ts, tests/integration/broker/utils/broker-harness.ts, tests/integration/broker/evals/run-task-exit-evals.sh
Adds scenario s08-task-exit that spawns agent with exit instruction, waits for relay_inbound and agent_released events, validates exit confirmation; extends runner to support --group=task-exit and TASK_EXIT_EVAL_SCENARIOS; updates ScenarioResult type with exit metadata; adds broker-harness exitAfterTask option; includes Bash eval script with parallel batch execution across Claude/Codex/OpenCode models.
Trajectory records and CHANGELOG
.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/*, .agentworkforce/trajectories/completed/2026-06/traj_3gefkmrnuaui/*, CHANGELOG.md
New trajectory completion summary and JSON status/retrospective update for the security track review run; adds reviewed PR trajectory summary and metadata; CHANGELOG entry for the new task-exit mode.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as local-agent CLI / MCP add_agent
    participant HarnessDriver as HarnessDriverClient
    participant BrokerHTTP as listen_api_spawn (HTTP)
    participant BrokerRuntime as BrokerRuntime::handle_api_request
    participant Session as apply_exit_after_task_instruction

    CLI->>HarnessDriver: spawnPty({ spawnMode: 'task_exit', exitAfterTask: true, task })
    HarnessDriver->>BrokerHTTP: POST /api/spawn { spawnMode, exitAfterTask, task }
    BrokerHTTP->>BrokerHTTP: parse spawn_mode → derive exit_after_task bool (OR with explicit field)
    BrokerHTTP-->>CLI: 400 BAD_REQUEST (if unsupported spawnMode)
    BrokerHTTP->>BrokerRuntime: ListenApiRequest::Spawn { exit_after_task: true, task }
    BrokerRuntime->>Session: apply_exit_after_task_instruction(task)
    Session-->>BrokerRuntime: task + EXIT_AFTER_TASK_INSTRUCTION (/exit)
    BrokerRuntime->>BrokerRuntime: spawn agent session with effective_task
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐇 Hop, hop — the agent is done!
It types /exit and off it runs,
No lingering sessions, no idle wait,
Task-exit mode seals the gate.
The rabbit cheers: clean and bright —
One injected task, then goodnight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and concisely summarizes the main change: adding a task-exit mode to the spawn functionality.
Description check ✅ Passed The description covers the key aspects of the change with a clear summary, verification steps, and build commands, though it omits the manual testing checkbox from the template.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ar-257-factory-single-shot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an optional task-exit mode for spawned CLI agents, allowing them to cleanly self-terminate with /exit after completing their injected task. This is implemented across the Rust broker, CLI commands, MCP server, and harness driver. Feedback on the changes suggests improving the robustness of the TypeScript helper withExitAfterTaskInstruction to handle empty or whitespace-only tasks without prepending unnecessary newlines, aligning its behavior with the Rust implementation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +48 to +50
function withExitAfterTaskInstruction(task: string): string {
return `${task}\n\n${EXIT_AFTER_TASK_INSTRUCTION}`;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the task parameter is empty or contains only whitespace, the current implementation will prepend unnecessary newlines (\n\n) before the instruction. To match the robustness of the Rust implementation (apply_exit_after_task_instruction), we should trim the task and only append the newlines if the task is non-empty.

Suggested change
function withExitAfterTaskInstruction(task: string): string {
return `${task}\n\n${EXIT_AFTER_TASK_INSTRUCTION}`;
}
function withExitAfterTaskInstruction(task: string): string {
const trimmed = task.trim();
return trimmed ? `${trimmed}\n\n${EXIT_AFTER_TASK_INSTRUCTION}` : EXIT_AFTER_TASK_INSTRUCTION;
}

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 027ea9e0cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +41 to +42
spawnMode?: SpawnMode;
exitAfterTask?: boolean;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Permit task-exit fields in driver action schema

These new public spawn fields are not mirrored in the registered agent.create action schema: packages/harness-driver/src/actions.ts still sets additionalProperties: false and only lists the older spawn properties. In environments that create agents through registerDriverActions instead of calling HarnessDriverClient.spawnPty directly, inputs such as spawnMode: 'task_exit' or exitAfterTask: true are rejected as additional properties before BrokerDriver.spawn can forward them, so the new task-exit mode is unavailable on that spawn API.

Useful? React with 👍 / 👎.

@agent-relay-code

Copy link
Copy Markdown
Contributor

No blocking findings in the current checkout. I did not apply any edits.

Addressed comments

  • No bot or reviewer comments were available in .workforce/; only pr.diff, changed-files.txt, and context.json were present.

Verification

  • npx prettier --check ...changed files... passed.
  • npm --prefix packages/cli run lint passed with existing warnings only.
  • Focused tests passed: 44 passed.
  • npm run typecheck passed.
  • npm run build passed; Rust was skipped by the repo script because Cargo is not installed.
  • npm run test passed: 72 passed | 1 skipped, 887 passed | 5 skipped.

Could not run Rust CI locally because cargo is not installed in this environment. I also could not verify GitHub check/mergeable state from the provided artifacts, so I am not marking this READY.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
CHANGELOG.md (1)

12-12: ⚡ Quick win

Rewrite this bullet to be impact-first and less implementation-specific.

This reads a bit implementation-heavy for [Unreleased]; keep it as a short user-visible outcome.

Suggested wording
-- `agent-relay` spawn APIs add an optional task-exit mode so spawned CLI agents run the injected task and then cleanly self-terminate with `/exit`.
+- `agent-relay` spawn APIs now support an optional task-exit mode so spawned CLI agents can automatically exit after task completion.

As per coding guidelines, “Changelog entries should be concise and impact-first, with one short bullet per user-visible change… Omit … implementation details.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` at line 12, Rewrite the changelog entry on line 12 to focus on
the user-visible impact rather than implementation details. Remove technical
jargon like "spawn APIs," "task-exit mode," and "/exit" command, and instead
express the benefit in simple terms: that spawned CLI agents can now cleanly
complete their task and exit automatically. Keep the entry concise and
outcome-focused, omitting internal mechanics and API specifics.

Source: Coding guidelines

packages/cli/src/cli/commands/local-agent.test.ts (1)

70-98: ⚡ Quick win

Add parity tests for the new command and invalid --spawn-mode paths.

Line 70 adds good coverage for spawn, but the same lifecycle plumbing/validation was also added to local agent new. Please add tests for: (1) forwarding on new, and (2) rejection of unsupported spawn-mode values, to prevent drift between the two command paths.

Proposed test additions
+  it('new forwards task-exit lifecycle options', async () => {
+    const { program, client, attach } = harness();
+    await program.parseAsync(
+      [
+        'local',
+        'agent',
+        'new',
+        'codex',
+        '--name',
+        'WorkerB',
+        '--task',
+        'Ship it',
+        '--spawn-mode',
+        'task-exit',
+        '--exit-after-task',
+      ],
+      { from: 'user' }
+    );
+
+    expect(client.spawnPty).toHaveBeenCalledWith(
+      expect.objectContaining({
+        name: 'WorkerB',
+        cli: 'codex',
+        task: 'Ship it',
+        spawnMode: 'task_exit',
+        exitAfterTask: true,
+      })
+    );
+    expect(attach).toHaveBeenCalled();
+  });
+
+  it('spawn rejects unknown spawn mode', async () => {
+    const { program, client, error, exit } = harness();
+    await program.parseAsync(
+      ['local', 'agent', 'spawn', 'codex', '--spawn-mode', 'bogus'],
+      { from: 'user' }
+    );
+    expect(client.spawnPty).not.toHaveBeenCalled();
+    expect(error).toHaveBeenCalledWith(expect.stringContaining('Unknown spawn mode'));
+    expect(exit).toHaveBeenCalledWith(1);
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/cli/commands/local-agent.test.ts` around lines 70 - 98, The
test at line 70-98 provides good coverage for the spawn command's task-exit
lifecycle options forwarding, but similar tests are missing for parity. Add
tests for the local agent new command that verify it forwards the same task-exit
lifecycle options (name, cli, task, spawnMode as task_exit, and exitAfterTask)
to client.spawnPty, similar to the spawn test structure. Additionally, add tests
that verify both the spawn and new command paths properly reject invalid
spawn-mode values that are not supported, ensuring validation consistency
between the two command implementations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:
- Line 41: The token mcp**agent-relay**_ in the summary line has malformed
emphasis markup with mismatched delimiters (starting with ** but ending with _),
which violates markdown linting rule MD037. Fix this by ensuring the emphasis
markup uses consistent delimiters throughout—either replace it with
mcp_agent-relay_ (using underscores on both sides) or mcp**agent-relay** (using
asterisks on both sides), or remove the emphasis entirely depending on the
intended formatting.

---

Nitpick comments:
In `@CHANGELOG.md`:
- Line 12: Rewrite the changelog entry on line 12 to focus on the user-visible
impact rather than implementation details. Remove technical jargon like "spawn
APIs," "task-exit mode," and "/exit" command, and instead express the benefit in
simple terms: that spawned CLI agents can now cleanly complete their task and
exit automatically. Keep the entry concise and outcome-focused, omitting
internal mechanics and API specifics.

In `@packages/cli/src/cli/commands/local-agent.test.ts`:
- Around line 70-98: The test at line 70-98 provides good coverage for the spawn
command's task-exit lifecycle options forwarding, but similar tests are missing
for parity. Add tests for the local agent new command that verify it forwards
the same task-exit lifecycle options (name, cli, task, spawnMode as task_exit,
and exitAfterTask) to client.spawnPty, similar to the spawn test structure.
Additionally, add tests that verify both the spawn and new command paths
properly reject invalid spawn-mode values that are not supported, ensuring
validation consistency between the two command implementations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2fa9f7ad-ea65-40a9-a3d1-d65f0ffc7149

📥 Commits

Reviewing files that changed from the base of the PR and between 87c05f1 and 6241d92.

📒 Files selected for processing (16)
  • .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md
  • .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/trajectory.json
  • CHANGELOG.md
  • crates/broker/src/listen_api.rs
  • crates/broker/src/runtime/api.rs
  • crates/broker/src/runtime/session.rs
  • crates/broker/src/runtime/tests.rs
  • packages/cli/src/cli/agent-relay-mcp.startup.test.ts
  • packages/cli/src/cli/agent-relay-mcp.ts
  • packages/cli/src/cli/commands/local-agent.test.ts
  • packages/cli/src/cli/commands/local-agent.ts
  • packages/cli/src/cli/lib/client-factory.ts
  • packages/harness-driver/src/agent-result.test.ts
  • packages/harness-driver/src/client.ts
  • packages/harness-driver/src/lifecycle-hooks.ts
  • packages/harness-driver/src/types.ts

Comment thread .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md Outdated
@agent-relay-code

Copy link
Copy Markdown
Contributor

Findings

  • packages/harness-driver/src/actions.ts:41: agent.create still uses additionalProperties: false but its schema does not include spawnMode or exitAfterTask. The PR adds those public spawn fields to driver types/client serialization, but callers using registerDriverActions will have task-exit inputs rejected before driver.spawn. This needs a semantic API/schema patch and coverage, so I left it as a review finding instead of editing.

I applied one mechanical fix:

  • .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:39-41: replaced malformed Markdown tool-token emphasis with inline code formatting.

Addressed comments

  • gemini-code-assist[bot]: raised whitespace-only task handling in withExitAfterTaskInstruction at packages/cli/src/cli/agent-relay-mcp.ts:48. Not changed because trimming task input changes runtime behavior and needs author judgment.
  • gemini-code-assist[bot]: summary review repeated the same whitespace-only task concern. Not changed for the same reason.
  • chatgpt-codex-connector[bot]: raised missing spawnMode / exitAfterTask in agent.create schema. Validated at packages/harness-driver/src/actions.ts:41; left as the blocking finding above because it is semantic API behavior.
  • CodeRabbit: raised malformed Markdown emphasis in the trajectory summary. Fixed in .agentworkforce/trajectories/completed/2026-06/traj_aq4edi70h18a/summary.md:39.
  • CodeRabbit: initial “review in progress” comment only. No action needed.
  • agent-relay-code[bot]: prior status comment said no review comments were available from .workforce/. Stale relative to current connector state; current review threads were fetched and accounted for.

Verification

  • npm install restored missing declared dependencies.
  • npm run typecheck passed.
  • npm run build passed; Rust broker build was skipped locally because Cargo is not installed.
  • npm run test passed: 72 files passed, 1 skipped; 887 tests passed, 5 skipped.
  • npm run lint passed with existing warnings only.
  • npx prettier --check ...changed files... passed.
  • GitHub connector reports PR open, mergeable, and CodeRabbit status success.

Not printing READY because there is still a validated semantic finding for the author to address.

Proactive Runtime Bot and others added 3 commits June 16, 2026 10:23
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds eval coverage for the task_exit spawn mode introduced in #1129:
- s08-task-exit.ts: directly spawns a worker with EXIT_AFTER_TASK_INSTRUCTION
  and checks for a clean agent_exited (code 0) event — no lead required
- ScenarioResult gains exitConfirmed and exitCode optional fields
- Runner gains task-exit group (--group=task-exit) and imports TASK_EXIT_EVAL_SCENARIOS
- run-task-exit-evals.sh drives the full matrix: claude (haiku/sonnet/opus),
  codex (gpt-4.5/gpt-4.1/o3/o4-mini), and opencode in batches of 3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/broker/evals/run-task-exit-evals.sh`:
- Around line 66-71: The flush_batch() function at lines 66-71 suppresses all
exit codes from child processes using the || true pattern, causing the script to
ignore eval failures and report success even when runs fail. Remove the || true
suppression from the wait command and instead capture exit codes or exit the
script immediately when a child process fails, ensuring that any failed eval run
causes the overall script to fail. Apply this same fix to the other occurrences
of the flush_batch() function or equivalent wait patterns at lines 84-88 and
156-159.

In `@tests/integration/broker/evals/runner.ts`:
- Around line 142-153: The task-exit group selection routes evals through
mergeRepeats which currently omits the exitConfirmed and exitCode fields from
merged results, causing repeated task-exit evals to lose these newly added
signal fields. Preserve the exitConfirmed and exitCode fields when merging
repeated results in the mergeRepeats function or related merge logic that
processes scenarios for the task-exit group condition, ensuring these fields are
retained in the final merged output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fc81eb45-537f-41dc-a8f5-684b35485887

📥 Commits

Reviewing files that changed from the base of the PR and between 491a0b9 and c5e17ef.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • crates/broker/src/listen_api.rs
  • crates/broker/src/runtime/api.rs
  • packages/cli/src/cli/agent-relay-mcp.startup.test.ts
  • packages/cli/src/cli/agent-relay-mcp.ts
  • tests/integration/broker/evals/run-task-exit-evals.sh
  • tests/integration/broker/evals/runner.ts
  • tests/integration/broker/evals/scenarios/index.ts
  • tests/integration/broker/evals/scenarios/s08-task-exit.ts
  • tests/integration/broker/evals/types.ts
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/broker/evals/scenarios/s08-task-exit.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/cli/src/cli/agent-relay-mcp.startup.test.ts
  • crates/broker/src/runtime/api.rs
  • crates/broker/src/listen_api.rs

Comment on lines +66 to +71
flush_batch() {
for pid in "${PIDS[@]:-}"; do
wait "$pid" 2>/dev/null || true
done
PIDS=()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow failed eval exits in batch waits.

flush_batch() ignores child failures (wait ... || true), so the script can report success even when one or more eval runs fail.

Suggested fix
 PIDS=()
+FAILED=0
 
 flush_batch() {
   for pid in "${PIDS[@]:-}"; do
-    wait "$pid" 2>/dev/null || true
+    if ! wait "$pid"; then
+      FAILED=1
+    fi
   done
   PIDS=()
 }
@@
-  PIDS+=($!)
+  PIDS+=("$!")
@@
 echo ""
 echo "═══ All logs in $LOG_DIR ════════════════════════════════════════════════"
 echo "    Reports written to evals-reports/"
+exit "$FAILED"

Also applies to: 84-88, 156-159

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/broker/evals/run-task-exit-evals.sh` around lines 66 - 71,
The flush_batch() function at lines 66-71 suppresses all exit codes from child
processes using the || true pattern, causing the script to ignore eval failures
and report success even when runs fail. Remove the || true suppression from the
wait command and instead capture exit codes or exit the script immediately when
a child process fails, ensuring that any failed eval run causes the overall
script to fail. Apply this same fix to the other occurrences of the
flush_batch() function or equivalent wait patterns at lines 84-88 and 156-159.

Comment on lines +142 to 153
: flags.group === 'task-exit'
? TASK_EXIT_EVAL_SCENARIOS
: flags.group === 'all'
? ALL_SCENARIOS
: SCENARIOS;
if (
flags.group === 'lifecycle' ||
flags.group === 'phrasing' ||
flags.group === 'auto-routing' ||
flags.group === 'lead-delegation'
flags.group === 'lead-delegation' ||
flags.group === 'task-exit'
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve task-exit exit signals in repeated runs.

Selecting task-exit here routes runs through mergeRepeats; in Line 159 onward, merged results omit exitConfirmed and exitCode, so repeated task-exit evals lose the newly added signal fields.

Suggested fix
 function mergeRepeats(results: ScenarioResult[]): ScenarioResult {
   const n = results.length;
   const passes = results.filter((r) => r.pass).length;
   const adherence = results.filter((r) => r.protocolAdherence !== null);
   const first = results[0];
+  const hasExitFields = results.some((r) => r.exitConfirmed !== undefined || r.exitCode !== undefined);
   return {
     id: first.id,
     title: first.title,
     pass: passes / n >= 0.5,
@@
     releaseCount: results.reduce((s, r) => s + (r.releaseCount ?? 0), 0),
     onboarding: first.onboarding,
     notes: `${passes}/${n} runs passed` + (first.notes ? ` · ${first.notes}` : ''),
+    ...(hasExitFields
+      ? {
+          exitConfirmed: results.every((r) => r.exitConfirmed === true),
+          exitCode: results.every((r) => r.exitCode === 0) ? 0 : null,
+        }
+      : {}),
   };
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/broker/evals/runner.ts` around lines 142 - 153, The
task-exit group selection routes evals through mergeRepeats which currently
omits the exitConfirmed and exitCode fields from merged results, causing
repeated task-exit evals to lose these newly added signal fields. Preserve the
exitConfirmed and exitCode fields when merging repeated results in the
mergeRepeats function or related merge logic that processes scenarios for the
task-exit group condition, ensuring these fields are retained in the final
merged output.

…output

The original approach relied on the model outputting `/exit` as bare text,
which the broker PTY watches for but Claude models never emit reliably.
The correct task-exit pattern is for the worker to call
mcp__agent-relay__remove_agent with its own name, which triggers the broker
to kill the PTY via workers.release().

Changes:
- s08-task-exit: spawn worker with explicit name in task, wait for
  agent_released event, score pass = message sent + self-released
- broker-harness: thread exitAfterTask through spawnAgent options
- fleet.rs: add missing exit_after_task: false to ListenApiRequest::Spawn

Verified results (1 run each):
  claude:haiku   PASS  claude:sonnet  PASS  claude:opus  PASS
  codex          PASS  opencode       FAIL (sent=0, no self-release)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/broker/evals/scenarios/s08-task-exit.ts`:
- Around line 39-53: The scenario is not properly validating the task-exit
runtime contract. In the spawnAgent call at lines 39-53, add exitAfterTask: true
to the options object to enable task-exit mode. Then in the event waiters at
lines 47-60, change from waiting for agent_released to waiting for agent_exited
and validate the exit code to confirm actual process termination semantics
rather than just self-release. Apply the same corrections at the sibling
locations at lines 57-63 and 85-86 in the same file to ensure all phases
properly assert agent process exit rather than just agent release.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 30086427-431d-46a4-aec3-8db3d6120e97

📥 Commits

Reviewing files that changed from the base of the PR and between c5e17ef and 0f20c5a.

📒 Files selected for processing (3)
  • crates/broker/src/runtime/fleet.rs
  • tests/integration/broker/evals/scenarios/s08-task-exit.ts
  • tests/integration/broker/utils/broker-harness.ts

Comment on lines +39 to +53
await harness.spawnAgent(worker, cli, ['general'], { task, model });
await sleep(STARTUP_MS);
harness.clearEvents();

// Phase 1: wait for the worker to send its task result.
await harness.waitForEvent('relay_inbound', phaseMs).promise.catch(() => {});

// Phase 2: wait for the worker to self-release via remove_agent(own name).
const releaseWaiter = harness.waitForEvent(
'agent_released',
phaseMs,
(e) => (e as Extract<BrokerEvent, { kind: 'agent_released' }>).name === worker
);
await releaseWaiter.promise.catch(() => {});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scenario is validating self-release, not the task-exit runtime contract.

Line [39] does not set exitAfterTask, so regressions in the new spawn-mode plumbing can slip through. Lines [47]-[60] and Line [85] treat agent_released as exit confirmation, which does not assert actual agent process exit semantics (agent_exited + exit code).

Suggested adjustment
-    await harness.spawnAgent(worker, cli, ['general'], { task, model });
+    await harness.spawnAgent(worker, cli, ['general'], { task, model, exitAfterTask: true });

-    const releaseWaiter = harness.waitForEvent(
-      'agent_released',
-      phaseMs,
-      (e) => (e as Extract<BrokerEvent, { kind: 'agent_released' }>).name === worker
-    );
-    await releaseWaiter.promise.catch(() => {});
+    await harness
+      .waitForEvent(
+        'agent_exited',
+        phaseMs,
+        (e) => (e as Extract<BrokerEvent, { kind: 'agent_exited' }>).name === worker
+      )
+      .promise
+      .catch(() => {});

-    const released = events.some(
-      (e): e is Extract<BrokerEvent, { kind: 'agent_released' }> =>
-        e.kind === 'agent_released' && e.name === worker
-    );
-    const pass = released && base.sent > 0;
+    const exited = events.some(
+      (e): e is Extract<BrokerEvent, { kind: 'agent_exited' }> =>
+        e.kind === 'agent_exited' && e.name === worker
+    );
+    const pass = exited && base.sent > 0;

-      exitConfirmed: released,
+      exitConfirmed: exited,
+      // also populate exitCode from the matched agent_exited event

Also applies to: 57-63, 85-86

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/broker/evals/scenarios/s08-task-exit.ts` around lines 39 -
53, The scenario is not properly validating the task-exit runtime contract. In
the spawnAgent call at lines 39-53, add exitAfterTask: true to the options
object to enable task-exit mode. Then in the event waiters at lines 47-60,
change from waiting for agent_released to waiting for agent_exited and validate
the exit code to confirm actual process termination semantics rather than just
self-release. Apply the same corrections at the sibling locations at lines 57-63
and 85-86 in the same file to ensure all phases properly assert agent process
exit rather than just agent release.

Proactive Runtime Bot and others added 7 commits June 16, 2026 11:26
…opencode

/exit output was evaluated for opencode but found unreliable: the model sends
the task message but does not produce a bare /exit line the PTY watcher can
detect. remove_agent(self) works across all harnesses.

Also fixes opencode sent=0 regression: use relay message trigger after startup
(matching all other scenario patterns) so the action fires after MCP is up.

Verified results:
  claude:haiku PASS  opencode PASS  (all previous Claude/codex results unchanged)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…merge

The fleet SDK commit (491d9a0) reorganized agent-relay-mcp.ts and
inadvertently removed the agent_relay_tool_call telemetry tracking that
was added in 1cb41cf. The fleet commit also changed the invoke_action
test expectation to NOT fire telemetry (actions API has its own path),
so the blanket enableInboxPiggyback wrapper approach no longer fits.

Restores tracking by adding inline trackAgentRelayToolCall calls directly
to the add_agent and remove_agent handlers, threaded via a new
telemetryTransport param on registerAgentRelayTools. invoke_action via
the actions API continues to skip agent_relay_tool_call as the fleet
commit intended.

Also runs npm install to link @agent-relay/fleet into node_modules,
fixing the fleet.test.ts module-not-found failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The generic AGENT_RELAY_TOOL_CALL_METADATA table, readInvokedActionName,
agentRelayActionNameCategory, and agentRelayToolCallMetadata were added
when restoring telemetry via the enableInboxPiggyback blanket wrapper,
but the final approach uses inline calls in add_agent/remove_agent
instead. Remove the unused code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
My earlier fix only re-instrumented add_agent/remove_agent inline, which
satisfied the failing test but silently dropped agent_relay_tool_call
telemetry for the ~28 other owned tools that PR #1041 deliberately
instrumented (post_message, send_dm, channels, reactions, inbox, etc.).

Restore the original wrapper-based tracking in enableInboxPiggyback so
every statically-registered owned tool emits telemetry again, and add a
skip for the action-routed path (invoke_action + the dynamic per-action
tools in actionToolNames) to honor the fleet SDK commit's intent that
registered actions do not route through per-tool telemetry. Reverts the
inline handler boilerplate and the registerAgentRelayTools param.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…metry entries

Fresh-eyes review follow-ups:
- gitignore /opencode.json: the opencode harness writes it to the repo
  root during eval runs with a live rk_live_ workspace key + at_live_
  agent token; it was untracked but not ignored, one `git add .` from a
  secret leak.
- Drop 3 stale AGENT_RELAY_TOOL_CALL_METADATA entries (get_message,
  get_thread, list_channel_members) — no such tools are registered, so
  the entries were dead lookups carried over from PR #1041.
- Fix run-task-exit-evals.sh header: the s08 scenario self-releases via
  remove_agent(self), not `/exit` output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `spawn` tool delegates to the actions surface (actions.invoke('spawn'))
exactly like `invoke_action` and the dynamic per-action tools, which are
deliberately excluded from agent_relay_tool_call telemetry. Tracking `spawn`
but not `invoke_action({name:'spawn'})` counted the same underlying action
differently depending on which MCP surface the caller used.

Introduce ACTION_ROUTED_TOOL_NAMES ({invoke_action, spawn}) and skip per-tool
telemetry for those names alongside the actionToolNames check. Adds a test
asserting spawn still runs but emits no telemetry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit edd6430 into main Jun 16, 2026
48 checks passed
@khaliqgant khaliqgant deleted the ar-257-factory-single-shot branch June 16, 2026 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant