test(broker): add infrastructure-failure delivery coverage #1100
The broker integration suite imported HarnessDriverClient, BrokerEvent, ListAgent, SendMessageInput, and protocol error types from @agent-relay/sdk, which no longer exports them after the SDK narrowing; they live in @agent-relay/harness-driver (and RelayCast in @relaycast/sdk). The suite has not compiled since. Update the imports and drop the harness's unused AgentRelay facade (no test used harness.relay, and the SDK's AgentRelay is now the workspace client, not a broker facade). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add integration coverage for delivery durability and observability when the infrastructure fails, exercising the durable-delivery broker work:

- crash recovery: in-flight deliveries survive a broker SIGKILL in the persisted pending snapshot, reload on restart with the same state dir, and get retried; already-acked deliveries are not redelivered
- graceful shutdown: undelivered pending deliveries survive a clean shutdown (regression test for the old clear-before-persist behavior)
- queue overflow: bursting past MAX_PENDING_PER_WORKER emits one delivery_dropped event per eviction, drops the oldest messages, and trips assertNoDroppedDeliveries
- unverified delivery: a recipient that swallows echo output yields delivery_verified with verification "timeout_fallback" instead of a verified success

Upstream Relaycast connection-gap behavior is left as an explicit skip: the suite provisions a real hosted workspace, so there is no local endpoint to sever deterministically.

Tests use a no-echo PTY sink fixture (stty -echo + stdin sink) and event-driven waits/polling instead of fixed sleeps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Walkthrough
This PR updates many integration tests to import harness-driver types, simplifies BrokerHarness to use only the low-level HarnessDriverClient (removing the AgentRelay facade), and adds a new infra-failures integration test suite covering crash recovery, graceful shutdown persistence, queue overflow eviction, and unverified delivery reporting.
Code Review
This pull request refactors the integration tests by migrating imports from @agent-relay/sdk to @agent-relay/harness-driver and @relaycast/sdk, and removing the high-level AgentRelay facade from the BrokerHarness to rely solely on the low-level HarnessDriverClient. Additionally, it introduces a new test suite infra-failures.test.ts to cover broker infrastructure failures such as crash recovery, graceful shutdown persistence, queue overflow, and unverified delivery timeouts. Feedback on the new test suite suggests wrapping the file-reading helper in a try-catch block to prevent flakiness during polling, and defensively creating the state directory to avoid potential persistence failures.
function readPendingFile(filePath: string): PersistedPendingDelivery[] | null {
  if (!fs.existsSync(filePath)) return null;
  return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as PersistedPendingDelivery[];
}
Reading and parsing the pending deliveries file during polling can be flaky. Since the broker writes to this file concurrently/periodically, fs.readFileSync or JSON.parse can throw an error (e.g., SyntaxError due to a partially written file or ENOENT/locking issues). Wrapping this in a try-catch block and returning null on failure allows the polling mechanism (pollUntil) to safely retry instead of crashing the test.
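The retry argument above can be sketched as follows. This is a minimal illustration under stated assumptions, not the suite's code: `pollUntil` here is a hypothetical stand-in (the real helper's signature is not shown in this PR), and `readJsonOrNull` is an illustrative generic version of the suggested reader.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Returns the parsed array, or null when the file is missing or not yet valid JSON.
function readJsonOrNull<T>(filePath: string): T[] | null {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as T[];
  } catch {
    return null; // ENOENT or a partially written file: let the caller retry
  }
}

// Hypothetical bounded poller: retries `probe` until it yields a non-null
// value or the deadline passes.
async function pollUntil<T>(probe: () => T | null, timeoutMs: number, intervalMs = 25): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) throw new Error(`pollUntil timed out after ${timeoutMs}ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo: a writer leaves a truncated file first, then completes it shortly after.
async function demo(): Promise<string[]> {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'poll-demo-'));
  const file = path.join(dir, 'pending.json');
  fs.writeFileSync(file, '[{"event_id": "a"'); // truncated JSON, parse would throw
  setTimeout(() => fs.writeFileSync(file, '[{"event_id":"a"}]'), 60);
  const entries = await pollUntil(() => readJsonOrNull<{ event_id: string }>(file), 2000);
  fs.rmSync(dir, { recursive: true, force: true });
  return entries.map((e) => e.event_id);
}
```

With a reader that throws instead of returning null, the first probe in `demo` would reject the whole poll on the truncated file rather than retrying.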
function readPendingFile(filePath: string): PersistedPendingDelivery[] | null {
  if (!fs.existsSync(filePath)) return null;
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as PersistedPendingDelivery[];
  } catch {
    return null;
  }
}

function makeTempDirs(prefix: string): { cwd: string; stateDir: string } {
  const cwd = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  const stateDir = path.join(cwd, 'state');
  return { cwd, stateDir };
}
Ensure that the stateDir directory is explicitly created. If the broker expects the directory to exist and does not create it automatically, file persistence operations might fail with a directory not found error. Creating it defensively here prevents potential failures.
function makeTempDirs(prefix: string): { cwd: string; stateDir: string } {
  const cwd = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  const stateDir = path.join(cwd, 'state');
  fs.mkdirSync(stateDir, { recursive: true });
  return { cwd, stateDir };
}
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d6815ecfb
// Clean shutdown while both deliveries are still awaiting verification.
// Regression test for the old behavior where shutdown cleared the
// pending map before persisting, losing undelivered messages.
await harness.stop();
Stop racing the verification timeout before shutdown
When CI takes more than the broker's 5s echo-verification window between the first delivery_injected event and this shutdown, the sink delivery is acked via timeout_fallback and removed from the pending map before stop() persists it. In that case the assertion below intermittently sees only one or zero entries instead of both eventIds, so this new durability test can fail due to scheduling/load rather than a broker regression.
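One possible way to make this race visible, sketched under assumptions rather than taken from the PR: record when the first `delivery_injected` event was seen and when `stop()` was issued, and only run the "both entries persisted" assertion when the gap is safely inside the verification window. The 5s window comes from the comment above; `classifyShutdownRace`, the safety margin, and the outcome names are hypothetical.

```typescript
// Assumed from the review comment: the broker's echo-verification window is 5s.
const VERIFICATION_WINDOW_MS = 5_000;

type Outcome = 'assert-both-persisted' | 'inconclusive';

// Decides whether the durability assertion is meaningful for this run.
// If too much time passed before stop(), a delivery may already have been
// acked via timeout_fallback and removed from the pending map, so a missing
// entry would say nothing about persistence.
function classifyShutdownRace(
  firstInjectedAtMs: number,
  stopIssuedAtMs: number,
  windowMs: number = VERIFICATION_WINDOW_MS,
  safetyMarginMs: number = 500, // hypothetical margin for event-delivery latency
): Outcome {
  const elapsedMs = stopIssuedAtMs - firstInjectedAtMs;
  return elapsedMs < windowMs - safetyMarginMs ? 'assert-both-persisted' : 'inconclusive';
}
```

A test could capture both timestamps with `performance.now()` and retry or skip on `'inconclusive'`. That trades a flaky failure for an explicit non-result; if the broker offers a way to lengthen the window, that would remove the race at its source.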
What
Adds integration coverage for delivery behavior when the infrastructure fails (broker crash, shutdown, queue overflow, unverifiable injection), exercising the durable-delivery work from #1073. Test-only — no broker changes.
Two commits:
1. `test(broker): repoint integration suite at split SDK packages`: prerequisite. The whole broker suite has not compiled since the SDK narrowing (`HarnessDriverClient`/`BrokerEvent`/`ListAgent`/`SendMessageInput`/`HarnessDriverProtocolError` moved to `@agent-relay/harness-driver`, `RelayCast` to `@relaycast/sdk`). Imports are repointed and the harness's unused `AgentRelay` facade is dropped (no test used `harness.relay`; the SDK's `AgentRelay` is now the workspace client, not a broker facade).
2. `test(broker): cover infrastructure-failure delivery scenarios`: new `tests/integration/broker/infra-failures.test.ts`.

Scenarios shipped
- […] (`--persist` + SIGKILL + restart, same state dir): […] `pending-<broker>.json`; an already-acked delivery is not persisted (dedup); the restarted broker reloads the entries and retries them (observed as `message_delivery_failed` "recipient gone" for the exact persisted `event_id`s, since the sink died with the broker); no event on the restarted broker references the acked delivery
- […] `manual_flush` worker emits one `delivery_dropped` per eviction with `count: 1` and a `pending queue full (max 256)` reason naming the evicted sender; the evicted messages are the 4 oldest; the surviving queue is exactly the cap with the right membership; `assertNoDroppedDeliveries` trips on this scenario
- […] `stty -echo` + stdin sink) yields `delivery_verified` with `verification: "timeout_fallback"` and an "echo not detected" reason, never an echo-verified success, while the delivery is still acked exactly once

Skipped (explicit `skip` stub in the file): WS-gap behavior. The suite provisions a real hosted Relaycast workspace (`ensureApiKey` → `RelayCast.createWorkspace`) and the broker dials it directly; there is no fake/local endpoint to sever and restore deterministically. Worth adding once the suite grows a local Relaycast stub.

New tests use event-driven waits (`waitForEvent` with `event_id` predicates) and bounded polling, no fixed sleeps. They need only `sh`/`cat`, so they run in the default suite (no `RELAY_INTEGRATION_REAL_CLI` gating).

Test results
New file (`node --test dist/infra-failures.test.js`): 4 pass / 1 skip, three consecutive standalone runs plus once inside the full suite; no flakes observed. (~21s per run.)

Full suite (`node --test --test-concurrency=1 dist/*.test.js`): 31 pass / 24 fail / 59 skip. All 24 failures are pre-existing drift that accumulated while the suite was uncompilable; none are caused by this PR, and all reproduce in isolation:

- […] `KIND: continuity` blocks are saved. `continueFrom` context injection is broken end-to-end. Possible real regression (see below).
- […] `broker-<name>.pid` files; the broker now records its PID in `connection.json` only. Tests are stale. (The lockfile test file also hangs the runner after failing because spawned brokers are never reaped; it needed a manual kill during the suite run.)
- […] `swarm --dry-run` and `workflows list` moved out with the Relayflows split; tests are stale.
- […] `/api/send` returns `{success, event_id}` but `HarnessDriverClient.sendMessage` (and the test) expect a `targets` array. SDK/broker response contract mismatch.
- […] `agent_not_found`.
- […] `agent_exited` is not observed within 30s of a short-lived PTY child exiting.
- […] `worker_ready` not observed within 10s for a `cat` agent.

Possible real bugs surfaced (not fixed here, per scope)
- `ListenApiRequest::Release` never writes the continuity record the `continueFrom` spawn path reads, so cross-session continuity only works if the agent itself emitted a `KIND: continuity` block. If release-time continuity is still intended behavior, this is a broker regression; if not, the continuity tests need rewriting.
- `sendMessage` response contract: `@agent-relay/harness-driver` types promise `targets: string[]` from `/api/send`; the broker never returns it.

Notes
- […] `CHANGELOG.md` entry: the root changelog is the user-facing release narrative and does not carry test-only changes.
- […] `.agentworkforce/trajectories/active/` already contains an active trajectory from fix(broker): make delivery handling durable and observable #1073, and `trail start` refuses to stack a second one; I didn't want to complete/abandon someone else's record.

🤖 Generated with Claude Code
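The queue-overflow scenario described in this PR can be modeled in a few lines. This is a toy sketch of drop-oldest eviction, not the broker's implementation: the cap of 256 is taken from the `pending queue full (max 256)` reason string, while `PendingQueue`, the field names, and the text of the reason after that prefix are hypothetical.

```typescript
// Cap taken from the "max 256" reason string quoted in the description.
const MAX_PENDING_PER_WORKER = 256;

interface Pending { eventId: string; from: string }
interface DroppedEvent { kind: 'delivery_dropped'; count: number; reason: string; eventId: string }

// Toy model: a bounded FIFO that evicts the oldest entry when full and
// records one delivery_dropped event per eviction.
class PendingQueue {
  private readonly items: Pending[] = [];
  readonly dropped: DroppedEvent[] = [];

  push(item: Pending): void {
    if (this.items.length >= MAX_PENDING_PER_WORKER) {
      const evicted = this.items.shift()!; // oldest entry goes first
      this.dropped.push({
        kind: 'delivery_dropped',
        count: 1,
        // Hypothetical wording after the prefix; the PR only says the reason names the sender.
        reason: `pending queue full (max ${MAX_PENDING_PER_WORKER}), evicted message from ${evicted.from}`,
        eventId: evicted.eventId,
      });
    }
    this.items.push(item);
  }

  snapshot(): readonly Pending[] {
    return [...this.items];
  }
}

// Push n messages with sequential ids so eviction order is easy to check.
function burst(n: number): PendingQueue {
  const queue = new PendingQueue();
  for (let i = 0; i < n; i++) queue.push({ eventId: `evt-${i}`, from: 'sender' });
  return queue;
}
```

Bursting 260 messages through this model evicts `evt-0` through `evt-3` and leaves exactly 256 entries, which matches the shape of the assertions listed for the overflow test (one event per eviction, oldest dropped, survivors at the cap).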