Skip to content

test(broker): add infrastructure-failure delivery coverage#1100

Merged
willwashburn merged 3 commits into
mainfrom
test/broker-infra-failures
Jun 12, 2026
Merged

test(broker): add infrastructure-failure delivery coverage#1100
willwashburn merged 3 commits into
mainfrom
test/broker-infra-failures

Conversation

@willwashburn

Copy link
Copy Markdown
Member

What

Adds integration coverage for delivery behavior when the infrastructure fails (broker crash, shutdown, queue overflow, unverifiable injection), exercising the durable-delivery work from #1073. Test-only — no broker changes.

Two commits:

  1. test(broker): repoint integration suite at split SDK packages — prerequisite. The whole broker suite has not compiled since the SDK narrowing (HarnessDriverClient/BrokerEvent/ListAgent/SendMessageInput/HarnessDriverProtocolError moved to @agent-relay/harness-driver, RelayCast to @relaycast/sdk). Imports are repointed and the harness's unused AgentRelay facade is dropped (no test used harness.relay; the SDK's AgentRelay is now the workspace client, not a broker facade).
  2. test(broker): cover infrastructure-failure delivery scenarios — new tests/integration/broker/infra-failures.test.ts.

Scenarios shipped

# Scenario What it asserts
1 Crash recovery (--persist + SIGKILL + restart, same state dir) In-flight deliveries to a no-echo sink survive in pending-<broker>.json; an already-acked delivery is not persisted (dedup); the restarted broker reloads the entries and retries them (observed as message_delivery_failed "recipient gone" for the exact persisted event_ids, since the sink died with the broker); no event on the restarted broker references the acked delivery
2 Graceful shutdown persistence Clean shutdown with undelivered deliveries keeps the pending file with exactly those entries (regression test for the old clear-before-persist bug)
3 Queue overflow observability Bursting 260 messages into a manual_flush worker emits one delivery_dropped per eviction with count: 1 and a pending queue full (max 256) reason naming the evicted sender; the evicted messages are the 4 oldest; the surviving queue is exactly the cap with the right membership; assertNoDroppedDeliveries trips on this scenario
4 Unverified delivery visibility A PTY recipient that swallows output (stty -echo + stdin sink) yields delivery_verified with verification: "timeout_fallback" and an "echo not detected" reason — never an echo-verified success — while the delivery is still acked exactly once

Skipped (explicit skip stub in the file): WS-gap behavior. The suite provisions a real hosted Relaycast workspace (ensureApiKeyRelayCast.createWorkspace) and the broker dials it directly; there is no fake/local endpoint to sever and restore deterministically. Worth adding once the suite grows a local Relaycast stub.

New tests use event-driven waits (waitForEvent with event_id predicates) and bounded polling, no fixed sleeps. They need only sh/cat, so they run in the default suite (no RELAY_INTEGRATION_REAL_CLI gating).

Test results

New file (node --test dist/infra-failures.test.js): 4 pass / 1 skip, three consecutive standalone runs plus once inside the full suite — no flakes observed. (~21s per run.)

Full suite (node --test --test-concurrency=1 dist/*.test.js): 31 pass / 24 fail / 59 skip. All 24 failures are pre-existing drift that accumulated while the suite was uncompilable — none are caused by this PR, and all reproduce in isolation:

  • continuity (5) — the broker no longer writes a continuity JSON file on release; only agent-initiated KIND: continuity blocks are saved. continueFrom context injection is broken end-to-end. Possible real regression (see below).
  • lockfile (13) — tests expect broker-<name>.pid files; the broker now records its PID in connection.json only. Tests are stale. (The lockfile test file also hangs the runner after failing because spawned brokers are never reaped — needed a manual kill during the suite run.)
  • cli-commands (2)swarm --dry-run and workflows list moved out with the Relayflows split; tests are stale.
  • messaging (1)/api/send returns {success, event_id} but HarnessDriverClient.sendMessage (and the test) expect a targets array. SDK/broker response contract mismatch.
  • lifecycle (1) — releasing a nonexistent agent now returns idempotent success instead of agent_not_found.
  • pty-exit (1)agent_exited is not observed within 30s of a short-lived PTY child exiting.
  • channel-management (1)worker_ready not observed within 10s for a cat agent.

Possible real bugs surfaced (not fixed here, per scope)

  1. Release-time continuity files are gone: ListenApiRequest::Release never writes the continuity record the continueFrom spawn path reads, so cross-session continuity only works if the agent itself emitted a KIND: continuity block. If release-time continuity is still intended behavior, this is a broker regression; if not, the continuity tests need rewriting.
  2. sendMessage response contract: @agent-relay/harness-driver types promise targets: string[] from /api/send; the broker never returns it.

Notes

  • No CHANGELOG.md entry: the root changelog is the user-facing release narrative and does not carry test-only changes.
  • No trail trajectory recorded: .agentworkforce/trajectories/active/ already contains an active trajectory from fix(broker): make delivery handling durable and observable #1073, and trail start refuses to stack a second one; I didn't want to complete/abandon someone else's record.

🤖 Generated with Claude Code

willwashburn and others added 2 commits June 11, 2026 13:55
The broker integration suite imported HarnessDriverClient, BrokerEvent,
ListAgent, SendMessageInput, and protocol error types from
@agent-relay/sdk, which no longer exports them after the SDK narrowing;
they live in @agent-relay/harness-driver (and RelayCast in
@relaycast/sdk). The suite has not compiled since. Update the imports
and drop the harness's unused AgentRelay facade (no test used
harness.relay, and the SDK's AgentRelay is now the workspace client,
not a broker facade).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add integration coverage for delivery durability and observability when
the infrastructure fails, exercising the durable-delivery broker work:

- crash recovery: in-flight deliveries survive a broker SIGKILL in the
  persisted pending snapshot, reload on restart with the same state
  dir, and get retried; already-acked deliveries are not redelivered
- graceful shutdown: undelivered pending deliveries survive a clean
  shutdown (regression test for the old clear-before-persist behavior)
- queue overflow: bursting past MAX_PENDING_PER_WORKER emits one
  delivery_dropped event per eviction, drops the oldest messages, and
  trips assertNoDroppedDeliveries
- unverified delivery: a recipient that swallows echo output yields
  delivery_verified with verification "timeout_fallback" instead of a
  verified success

Upstream Relaycast connection-gap behavior is left as an explicit skip:
the suite provisions a real hosted workspace, so there is no local
endpoint to sever deterministically.

Tests use a no-echo PTY sink fixture (stty -echo + stdin sink) and
event-driven waits/polling instead of fixed sleeps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@willwashburn willwashburn requested a review from khaliqgant as a code owner June 11, 2026 18:01
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dddb361a-59dc-45a2-816c-2197ea649d9d

📥 Commits

Reviewing files that changed from the base of the PR and between 7d6815e and fbbde44.

📒 Files selected for processing (2)
  • .agentworkforce/trajectories/active/traj_b1jrutolckfb/trajectory.json
  • tests/integration/broker/infra-failures.test.ts
✅ Files skipped from review due to trivial changes (1)
  • .agentworkforce/trajectories/active/traj_b1jrutolckfb/trajectory.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/broker/infra-failures.test.ts

📝 Walkthrough

Walkthrough

This PR updates many integration tests to import harness-driver types, simplifies BrokerHarness to use only the low-level HarnessDriverClient (removing the AgentRelay facade), and adds a new infra-failures integration test suite covering crash recovery, graceful shutdown persistence, queue overflow eviction, and unverified delivery reporting.

Changes

Test Harness Migration and Infrastructure Tests

Layer / File(s) Summary
Import refactoring across test files
tests/integration/broker/agent-spawns-agent.test.ts, tests/integration/broker/channel-management.test.ts, tests/integration/broker/cli-spawn.test.ts, tests/integration/broker/edge-cases.test.ts, tests/integration/broker/events.test.ts, tests/integration/broker/functionality.test.ts, tests/integration/broker/lifecycle.test.ts, tests/integration/broker/mcp-hints.test.ts, tests/integration/broker/mcp-injection.test.ts, tests/integration/broker/observability.test.ts, tests/integration/broker/pty-exit.test.ts, tests/integration/broker/stress.test.ts, tests/integration/broker/tic-tac-toe.test.ts, tests/integration/broker/utils/assert-helpers.ts
BrokerEvent, HarnessDriverProtocolError, HarnessDriverClient, and related test types now import from @agent-relay/harness-driver instead of @agent-relay/sdk; RelayCast import adjusted where applicable.
Broker harness to low-level client only
tests/integration/broker/utils/broker-harness.ts
BrokerHarness no longer creates/exposes the high-level AgentRelay facade; it initializes and shuts down only the low-level HarnessDriverClient, updates docs/comments, and moves RelayCast to @relaycast/sdk.
Infrastructure failure integration tests
tests/integration/broker/infra-failures.test.ts
New test suite with helpers and four active scenarios: SIGKILL crash recovery verifying persisted pending deliveries reload, graceful-shutdown persistence for undelivered messages, per-worker pending queue overflow observability and eviction checks, and unverified delivery reporting (timeout_fallback vs echo-verified). Includes prerequisite skipping, state-dir helpers, PTY non-echoing sink, and polling utilities.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/relay#1073: Related integration test import changes aligning BrokerEvent types with harness-driver protocol updates.
  • AgentWorkforce/relay#1062: Related change switching BrokerEvent imports to harness-driver, matching additions to event variants in harness-driver.

Suggested reviewers

  • khaliqgant

Poem

I’m a rabbit in the harness, nibbling through types,
I hop past facades, simplify the plights.
When brokers tumble or queues overflow,
I cheer the tests that help systems grow. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding integration test coverage for broker infrastructure-failure delivery scenarios.
Description check ✅ Passed The description is comprehensive and well-structured with clear sections (Summary, Test Plan checklist, Scenarios shipped, Test results, and Possible real bugs). It provides detailed context about both commits, the scenarios being tested, and test results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/broker-infra-failures

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed due to a network error.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the integration tests by migrating imports from @agent-relay/sdk to @agent-relay/harness-driver and @relaycast/sdk, and removing the high-level AgentRelay facade from the BrokerHarness to rely solely on the low-level HarnessDriverClient. Additionally, it introduces a new test suite infra-failures.test.ts to cover broker infrastructure failures such as crash recovery, graceful shutdown persistence, queue overflow, and unverified delivery timeouts. Feedback on the new test suite suggests wrapping the file-reading helper in a try-catch block to prevent flakiness during polling, and defensively creating the state directory to avoid potential persistence failures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +69 to +72
function readPendingFile(filePath: string): PersistedPendingDelivery[] | null {
if (!fs.existsSync(filePath)) return null;
return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as PersistedPendingDelivery[];
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Reading and parsing the pending deliveries file during polling can be flaky. Since the broker writes to this file concurrently/periodically, fs.readFileSync or JSON.parse can throw an error (e.g., SyntaxError due to a partially written file or ENOENT/locking issues). Wrapping this in a try-catch block and returning null on failure allows the polling mechanism (pollUntil) to safely retry instead of crashing the test.

function readPendingFile(filePath: string): PersistedPendingDelivery[] | null {
  if (!fs.existsSync(filePath)) return null;
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as PersistedPendingDelivery[];
  } catch {
    return null;
  }
}

Comment on lines +112 to +116
function makeTempDirs(prefix: string): { cwd: string; stateDir: string } {
const cwd = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
const stateDir = path.join(cwd, 'state');
return { cwd, stateDir };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Ensure that the stateDir directory is explicitly created. If the broker expects the directory to exist and does not create it automatically, file persistence operations might fail with a directory not found error. Creating it defensively here prevents potential failures.

Suggested change
function makeTempDirs(prefix: string): { cwd: string; stateDir: string } {
const cwd = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
const stateDir = path.join(cwd, 'state');
return { cwd, stateDir };
}
function makeTempDirs(prefix: string): { cwd: string; stateDir: string } {
const cwd = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
const stateDir = path.join(cwd, 'state');
fs.mkdirSync(stateDir, { recursive: true });
return { cwd, stateDir };
}

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d6815ecfb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Clean shutdown while both deliveries are still awaiting verification.
// Regression test for the old behavior where shutdown cleared the
// pending map before persisting, losing undelivered messages.
await harness.stop();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop racing the verification timeout before shutdown

When CI takes more than the broker's 5s echo-verification window between the first delivery_injected event and this shutdown, the sink delivery is acked via timeout_fallback and removed from the pending map before stop() persists it. In that case the assertion below intermittently sees only one or zero entries instead of both eventIds, so this new durability test can fail due to scheduling/load rather than a broker regression.

Useful? React with 👍 / 👎.

@willwashburn willwashburn merged commit fe2f9f6 into main Jun 12, 2026
2 checks passed
@willwashburn willwashburn deleted the test/broker-infra-failures branch June 12, 2026 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant