feat(engine): bounded-durable mailbox, location routing, §5 cleanup (fleet Phase 2) #193
willwashburn
left a comment
Verified npm run build and npm run test pass after installing workspace deps.
- major `packages/engine/src/routes/deliveryRouting.ts:46`: routing for `via_node` deliveries is based on the `location_type`/`location_node_id` snapshot stored on the delivery row, not the recipient's current location. If an agent re-registers on a different transport before fanout runs, the message still goes to the stale node/socket and can be lost. Re-resolve the current agent location at send time, or migrate queued deliveries when location changes.
- major `packages/engine/src/adapters/node/realtime.ts:311`: `attachNodeSocket()` adds sockets to a `Set` and never replaces an existing control connection. A reconnect can therefore leave two live node sockets for the same node, duplicating `deliver`/`action.invoke` traffic and any acks. Enforce a single active node socket by closing the prior one before attaching the new connection.
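The single-socket rule from the second finding can be sketched as a map keyed by node id. This is a minimal illustration, not the adapter's code: `ControlSocket`, `NodeSocketRegistry`, and the close code `4000` are hypothetical names and values chosen for the example.

```typescript
// Hypothetical minimal socket shape; the real adapter's socket type is not shown in this thread.
interface ControlSocket {
  send(frame: string): void;
  close(code?: number, reason?: string): void;
}

class NodeSocketRegistry {
  // One slot per node id, instead of a Set that can accumulate stale sockets.
  private readonly current = new Map<string, ControlSocket>();

  attach(nodeId: string, socket: ControlSocket): void {
    const previous = this.current.get(nodeId);
    this.current.set(nodeId, socket);
    if (previous && previous !== socket) {
      // 4000 is an arbitrary application close code picked for this sketch.
      previous.close(4000, 'superseded');
    }
  }

  detach(nodeId: string, socket: ControlSocket): void {
    // Only clear the slot if the closing socket is still the current one, so a
    // late close event from a superseded socket cannot evict its replacement.
    if (this.current.get(nodeId) === socket) {
      this.current.delete(nodeId);
    }
  }

  send(nodeId: string, frame: string): boolean {
    const socket = this.current.get(nodeId);
    if (!socket) return false;
    socket.send(frame);
    return true;
  }
}
```

With one slot per node, a reconnect replaces the old connection, so a `deliver` or `action.invoke` frame is written to exactly one socket.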
Addressed the fanout race in `deliveryRouting.ts`: delivery fanout now re-resolves the recipient's live location from the `agents` table before push, and the mailbox suite now has a regression test that flips the recipient from `via_node` to `self_connected` before dispatch to prove the delivery still lands. I left the node socket supersede and close change to Fleet1 as requested and will rebase on that tip when it lands.
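A minimal sketch of send-time re-resolution, assuming a lookup function stands in for the `agents` table read. `AgentLocation`, `QueuedDelivery`, and `resolveRoute` are hypothetical names, and falling back to the stored snapshot when the lookup returns nothing is an assumption of this example rather than confirmed engine behavior.

```typescript
type AgentLocation =
  | { type: 'self_connected' }
  | { type: 'via_node'; nodeId: string };

interface QueuedDelivery {
  agentId: string;
  // Location captured when the delivery row was written; may be stale by fanout time.
  snapshot: AgentLocation;
}

// `lookup` stands in for a read of the agent's current location.
function resolveRoute(
  delivery: QueuedDelivery,
  lookup: (agentId: string) => AgentLocation | undefined,
): AgentLocation {
  // Prefer the live location; use the snapshot only if the agent is unknown (assumption).
  return lookup(delivery.agentId) ?? delivery.snapshot;
}

// Usage: the agent moved from a node to a direct connection after the row was queued.
const live = new Map<string, AgentLocation>([['agent-1', { type: 'self_connected' }]]);
const route = resolveRoute(
  { agentId: 'agent-1', snapshot: { type: 'via_node', nodeId: 'node-a' } },
  (id) => live.get(id),
);
```

Here `route.type` is `'self_connected'`, so the message follows the agent rather than the stale node.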
willwashburn
left a comment
Round 2 review against feat/fleet-nodes-engine. Verdict: NO-GO. GitHub would not allow this authenticated reviewer to request changes on its own pull request, so I am posting the blocking review as a comment.
- [major] TTL expiry can overwrite a concurrent ack and emit a false `delivery_failed`. `expireDueDeliveries` selects active rows at packages/engine/src/engine/delivery.ts:398-411, then updates the captured ids at packages/engine/src/engine/delivery.ts:416-425 without re-checking that status is still `queued`/`delivered`. If a node `delivery.ack` lands between those awaits, `ackRows` can set the row to `acked`, but the expiry update will still change it to `dead_lettered` and notify the sender. This violates queued -> delivered -> acked terminal success and the round-2 concurrent ack/TTL edge. Make the expiry update status-guarded and build notices only from rows actually transitioned, ideally in one transaction/returning update.
- [major] Reconnect redelivery does not preserve the live deliver payload and reports the recipient as the message author. Live fanout sends `payload: { type, data }` from `routeDeliveryOutcomes` at packages/engine/src/routes/deliveryRouting.ts:76-85. Reconnect replay rebuilds a different `payload` in `deliverPendingToNode` at packages/engine/src/engine/delivery.ts:462-485 and sends it at packages/engine/src/engine/delivery.ts:523-531. That reconstructed payload also sets `message.agent_id` from `row.delivery.agentId` at packages/engine/src/engine/delivery.ts:477-480, which is the target agent, not the sender. After broker death, an unacked message with the same `msg_id`/`seq` can therefore be reinjected with a different shape and wrong author id. Persist or reconstruct one canonical deliver payload for both live and replay paths, using the original sender id/name, and add a reconnect test that asserts payload equality, not only `msg_id` and `seq`.
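The status-guarded expiry requested in the first finding can be illustrated with an in-memory stand-in for the deliveries table. This is a sketch under assumptions: `Row` and `expireDue` are hypothetical, and a real implementation would put the guard in the `UPDATE ... WHERE status IN (...)` clause and read back the affected rows.

```typescript
type Status = 'queued' | 'delivered' | 'acked' | 'failed' | 'dead_lettered';

interface Row {
  id: string;
  status: Status;
  expiresAt: number;
}

// `candidateIds` are the ids captured by an earlier select; rows may have changed since.
function expireDue(rows: Map<string, Row>, candidateIds: string[], now: number): Row[] {
  const transitioned: Row[] = [];
  for (const id of candidateIds) {
    const row = rows.get(id);
    if (!row) continue;
    // Guard: re-check the status at update time, so a concurrent ack wins.
    const stillActive = row.status === 'queued' || row.status === 'delivered';
    if (stillActive && row.expiresAt <= now) {
      row.status = 'dead_lettered';
      transitioned.push(row);
    }
  }
  // Sender notices should be built only from the rows returned here.
  return transitioned;
}
```

Building notices from the returned rows, rather than from the captured ids, is what prevents a false `delivery_failed` for a row that was acked in between.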
Tests run:
- `npm ci`
- `npm run build`
- `npm run typecheck --workspace=@relaycast/engine`
- `npx vitest run src/__tests__/conformance/delivery.test.ts src/__tests__/conformance/node.test.ts src/__tests__/atomicity.test.ts` from `packages/engine`
- `npm run test --workspace=@relaycast/types`
- `npm run test --workspace=@relaycast/sdk`
- `npm run test --workspace=@relaycast/engine`
- `npm run test`
|
Round 2 review items addressed in ea89632:
Fleet1-owned single-socket-per-node work remains untouched here as requested. |
willwashburn
left a comment
Round 3 review: NO-GO. GitHub rejected --request-changes for this account with: Review Can not request changes on your own pull request, so this is posted as a review comment with the same blocking findings.
- [blocker] PR 193 is not rebased on the PR 192 head fix and reintroduces multi-socket node delivery. `git merge-base --is-ancestor origin/feat/fleet-nodes-engine HEAD` is false for `ea89632` vs `8949f842`, and the current code stores `nodeSockets` as a `Set` at `packages/engine/src/adapters/node/realtime.ts:66`, then sends every control frame to every socket at `packages/engine/src/adapters/node/realtime.ts:244`, while `attachNodeSocket` only adds the new socket at `packages/engine/src/adapters/node/realtime.ts:311`. A reconnect with an old socket still present can duplicate `deliver` and `action.invoke` frames, which is exactly the single-socket class of bug round 3 asked us to verify. Rebase or merge `feat/fleet-nodes-engine` tip, keep one current socket per node, close the superseded socket, and keep the reconnect/no-duplicate test on this stack.
- [major] Broker-death replay does not preserve the live delivery wire frame for non-channel messages. Live DM, group DM, and thread deliveries call `routeDeliveryOutcomes` with `dm.received`, `group_dm.received`, and `thread.reply` at `packages/engine/src/routes/dm.ts:131`, `packages/engine/src/routes/groupDm.ts:185`, and `packages/engine/src/routes/thread.ts:126`, but `deliverPendingToNode` always rebuilds the pending row as `message.created` at `packages/engine/src/engine/delivery.ts:562`. After reconnect, the same delivery can change `payload.type` and data shape, breaking consumers and fixture parity. Persist the original delivery event type/payload or reconstruct through the same builders used by the live fanout path, and add reconnect redelivery tests for DM, group DM, and thread reply deliveries.
- [major] A future cumulative ack can permanently suppress later replay. `ackRows` advances `agents.deliveryAckSeq` directly to any `up_to_seq` at `packages/engine/src/engine/delivery.ts:313`, while new delivery seqs are assigned from only `MAX(deliveries.seq)+1` at `packages/engine/src/engine/deliveryWrites.ts:66`, and replay filters rows with `deliveries.seq > agents.deliveryAckSeq` at `packages/engine/src/engine/delivery.ts:550`. I reproduced this by sending `delivery.ack {up_to_seq:100}` before the first message: the live frame used `seq:1`, then after node reconnect/inventory sync `replayDeliveries` was `0`. Do not advance the ack cursor past rows that actually exist and were acked, or make seq allocation account for the ack cursor; add a regression for future/stale acks before first delivery.
- [major] The new TTL-vs-ack race coverage is flaky under fresh execution. A focused run of `npx vitest run src/__tests__/conformance/delivery.test.ts src/__tests__/conformance/node.test.ts src/__tests__/atomicity.test.ts` failed at `packages/engine/src/__tests__/conformance/delivery.test.ts:706` because `expireDueDeliveries` returned one notice in `does not dead-letter an acked delivery when ack and TTL expiry race`. A later forced full `npm run test -- --force` passed, which points to nondeterministic scheduling in the test or implementation. Make this deterministic: either enforce the intended ack-wins ordering in storage, or adjust the test to assert exactly one terminal winner and no duplicate sender fanout.
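One of the two remedies named in the cumulative-ack finding is to clamp the cursor to rows that actually exist. The sketch below shows that option only; `nextAckCursor` is a hypothetical helper, and treating every existing seq at or below `up_to_seq` as ackable is an assumption of this example.

```typescript
// Returns the new ack cursor for an incoming `delivery.ack { up_to_seq }`.
// `existingSeqs` are the seqs of delivery rows that exist for this agent.
function nextAckCursor(current: number, upToSeq: number, existingSeqs: number[]): number {
  let highest = current; // never move the cursor backwards on a stale ack
  for (const seq of existingSeqs) {
    if (seq <= upToSeq && seq > highest) {
      highest = seq;
    }
  }
  return highest;
}

// Reproduction from the review: ack up_to_seq 100 arrives before any delivery exists.
const cursorBeforeFirstMessage = nextAckCursor(0, 100, []);
```

With the clamp, `cursorBeforeFirstMessage` stays at 0, so a later row with `seq: 1` still satisfies the `seq > cursor` replay filter.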
Verification run from /tmp/review-Rev2x3:
- `npm ci`
- `npm run build --workspace=@relaycast/types` and `npm run build --workspace=@relaycast/a2a` to create workspace `dist/` outputs for fresh typecheck
- `npm run typecheck --workspace=@relaycast/engine` passed
- focused engine vitest command above failed once on the TTL race
- `npm run test --workspace=@relaycast/types` passed
- `npm run test --workspace=@relaycast/sdk` passed
- `npm run build` passed
- `npm run test -- --force` passed with 18/18 tasks forced, 0 cached
ea89632 to c39fc79
willwashburn
left a comment
Verdict: NO-GO. GitHub would not allow this account to request changes on its own PR, but the finding below should block merge until fixed.
- [major] openapi.yaml:3815 Duplicate path keys make the OpenAPI document invalid. This PR adds detailed `/nodes`, `/nodes/{name}`, `/triggers`, and `/triggers/{id}` specs at openapi.yaml:2848, but leaves the old stub definitions starting at openapi.yaml:3815. YAML maps cannot contain duplicate keys; running `node -e "const fs=require('fs'); const YAML=require('yaml'); YAML.parse(fs.readFileSync('openapi.yaml','utf8'));"` fails with `Map keys must be unique at line 3815`. Generators/parsers cannot reliably consume the API spec, and less-strict parsers will shadow the detailed node/trigger request schemas/security with the later generic stubs. Remove the old stub path blocks or merge their content into the new detailed definitions so each path appears exactly once.
- [nit] memory/workspace/.relay/state.json:1 Generated relay sync state changed only timestamps/counters. Drop this file from the PR or ignore generated state so the fleet mailbox change does not carry local runtime metadata.
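As a rough illustration of the duplicate-key check, the sketch below scans a YAML document line by line for repeated path keys under `paths:`. It is not a YAML parser and is not what the review ran: `duplicatePathKeys` is a hypothetical helper that assumes block-style YAML, unquoted path keys, and a two-space indent directly under `paths:`.

```typescript
// Returns path keys that appear more than once under the top-level `paths:` map.
function duplicatePathKeys(yamlText: string): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  let inPaths = false;
  for (const line of yamlText.split('\n')) {
    if (line.startsWith('#')) continue; // skip top-level comments
    if (/^\S/.test(line)) {
      // Any other top-level key starts or ends the `paths:` section.
      inPaths = line.trimEnd() === 'paths:';
      continue;
    }
    if (!inPaths) continue;
    // Exactly two spaces of indent, then a key that starts with "/".
    const match = /^ {2}(\/[^\s:]*):\s*$/.exec(line);
    if (!match) continue;
    const key = match[1];
    if (seen.has(key)) duplicates.push(key);
    else seen.add(key);
  }
  return duplicates;
}

const sample = [
  'openapi: 3.0.3',
  'paths:',
  '  /nodes:',
  '    get: {}',
  '  /triggers:',
  '    get: {}',
  '  /nodes:',
  '    post: {}',
  'components:',
  '  /nodes:',
  '    note: not a path',
].join('\n');
const repeated = duplicatePathKeys(sample);
```

For the sample above, `repeated` contains only `/nodes`; the `/nodes` key under `components:` is ignored because it sits outside `paths:`.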
Verification run in /tmp/review-Rev2x4:
- `git merge-base --is-ancestor 8949f842 HEAD` passed at c39fc79.
- `npm install` completed; initial root `npm run build` / `npm run test` passed via Turbo cache.
- Direct runs passed: `npm run build --workspace=@relaycast/engine`, `npm run typecheck --workspace=@relaycast/engine`, `npx vitest run src/__tests__/conformance/delivery.test.ts src/__tests__/conformance/node.test.ts src/__tests__/atomicity.test.ts` from packages/engine, and `npm run test --workspace=@relaycast/types`.
- Round-4 target checks: replay/redelivery now preserves original payload event types via the shared builder and equality regressions; TTL expiry update is status-guarded; fanout re-resolves current recipient location before routing.
c39fc79 to ef058d0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef058d0af2
const onlineAgentIds = await deps.presence.getOnline(workspaceId);
if (onlineAgentIds.length > 0) {
  fanoutTasks.push(deps.realtime.deliverToAgents({ workspaceId, agentIds: onlineAgentIds, event: payload }));
Don't broadcast action results to every online agent
When an action completes with private output or error data, this presence fanout sends the transformed action.completed/action.failed payload to every online agent in the workspace. The existing route only targeted the invoking caller_id plus the workspace stream, so unrelated connected agents now receive another agent's action result; remove this presence-wide delivery or gate it behind the workspace-stream authorization path.
const updateTriggerBodySchema = triggerBodySchema.partial();

// POST /v1/triggers
triggerRoutes.post('/triggers', requireAuth, rateLimit, async (c) => {
Restrict trigger creation to trusted principals
Because this endpoint accepts any agent token, any agent can create workspace-wide triggers; those triggers later invoke actions using the future message author's caller_id/caller_name, so an agent can set up a trigger for an action only available to another agent and have it run when that agent posts matching text. Require a workspace key for trigger management or store and enforce the trigger creator's permissions when firing.
export interface NodeRosterEntry {
  id: string;
  name: string;
  capabilities: string[];
Model node capabilities as objects in the SDK
The engine stores and returns node capabilities as FleetCapability objects after enrollment/registration, but the SDK declares them as string[]. SDK consumers will compile code such as capabilities.includes('echo') that silently fails against the actual { name: 'echo' } objects; either expose the object shape here or map the response to strings before returning it.
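A minimal sketch of the object-shaped alternative, assuming only the `name` field that the comment confirms. Any further fields on `FleetCapability` are unknown here, and `hasCapability` and `capabilityNames` are hypothetical helpers.

```typescript
// Only `name` is confirmed by the review comment; other fields are not shown in this thread.
interface FleetCapability {
  name: string;
}

interface NodeRosterEntry {
  id: string;
  name: string;
  capabilities: FleetCapability[];
}

// Object-aware replacement for `capabilities.includes('echo')`.
function hasCapability(node: NodeRosterEntry, capability: string): boolean {
  return node.capabilities.some((c) => c.name === capability);
}

// The other option from the comment: map the response to plain strings before returning it.
function capabilityNames(node: NodeRosterEntry): string[] {
  return node.capabilities.map((c) => c.name);
}

const exampleNode: NodeRosterEntry = {
  id: 'node_1',
  name: 'builder',
  capabilities: [{ name: 'echo' }, { name: 'spawn' }],
};
```

With the array typed as objects, `exampleNode.capabilities.includes('echo')` no longer compiles, which surfaces the mismatch at build time instead of failing silently at runtime.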
489a0db to 6470e3f
#192 (merged) owns 0017_spawn_reservation_and_retry_state and 0018; the Phase 2 mailbox migration was authored as 0017 on an older base, colliding on the 0017 prefix once #192/#193 landed in main. Renumber to 0019 (after 0018) so the D1 migration sequence is unique and ordered. Pure file rename — no code references the filename, and the migration has not been applied to any environment yet (engine unpublished), so there is no D1 re-apply risk.
…ivery guarantees (Phase 6) (#194) * feat(engine): per-workspace fleet rollout flag + migration single-delivery guarantees (Phase 6) Gate the entire fleet node control surface behind a per-workspace `fleet_nodes_enabled` flag (default OFF), so fleet can ship dark and roll out workspace-by-workspace. Legacy per-agent WS delivery is unaffected either way. The flag is checked once at each genuine boundary (no scattered checks): - node control WS (`/v1/node/ws`) rejects with `fleet_nodes_disabled` (404) - node roster routes (`/v1/nodes*`) return a flat 404 via `requireFleetNodes` - declarative trigger evaluation is skipped at the message hook - spawn placement + node-handler dispatch refuse in `invokeAction` (agent-handler actions stay available) Flag source mirrors the workspace-stream pattern: a KV override with a short in-memory cache, defaulting to `EngineConfig.fleetNodesEnabled`. GET/PUT `/v1/workspace/fleet-nodes` toggles the per-workspace override. Tests: - flag OFF -> every node surface inert (roster, spawn, WS gate, triggers) - per-workspace override flips the surface on/off; WS gate follows the flag - migration single-delivery: a legacy self-connected agent is never also delivered via a node when the flag flips mid-stream (exclusive location; a node's `agent.register` for it is rejected `agent_location_conflict`) The conformance harness defaults the flag ON, so existing node integration tests pass unchanged. Full engine suite green (108 tests). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(engine): accept node token via Authorization: Bearer + gate node WS upgrade behind fleet flag (Phase 6) Cross-repo compat fix surfaced by the Phase 6 two-node E2E: a real relay broker could never bring a node online against a self-hosted engine. Root cause: the node-control read-side is the Node HTTP-server `upgrade` handler in `entrypoints/node.ts` (the Hono `/v1/node/ws` route only answers the 426 — Node owns the 101). 
That handler read the token ONLY from the `?token=` query param, but the relay Rust broker's node_control client sends it as `Authorization: Bearer <nt_live_…>`. It also had NO fleet-flag gate for `/v1/node/ws` (only the rk_live workspace-stream path was gated), so the Phase 6 rollout flag did not actually cover the node control surface on the self-host adapter. Fix, both in the upgrade handler: - read the node token from `?token=` query OR `Authorization: Bearer` header (query stays for SDK/Pear; header unblocks the shipped broker — no Rust release needed) - gate the `/v1/node/ws` upgrade behind `isFleetNodesEnabled` (404 when off), mirroring the existing stream gate Also mirrored the dual-transport read in the Hono `/v1/node/ws` route for any adapter that routes upgrades through it. Accepted-stack PRs involved: engine read-side #192, broker send-side #1107. The hosted (Cloudflare DO) equivalent is handled in PR 5. Test: `nodeUpgradeAuth.test.ts` boots the real Node server and asserts a WS client authenticates via BOTH the Bearer header and the query param, that the upgrade is rejected while the workspace flag is off (404), and that a missing/malformed token is rejected (401). Full engine suite green (111). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(engine): reply to agent.register with a broker-shaped `reply` frame (Phase 6 token authority) Third cross-repo compat fix surfaced by the Phase 6 E2E (spawn scenarios): spawn never completes end-to-end against a real broker. The relay broker's node_control client awaits a `reply` frame keyed by the request id — it matches `pending_agent_registrations` by `reply.id` and parses `data` as `{agent_id, token, name}` with `deny_unknown_fields`. The engine instead answered `agent.register` with a bare `{type:'agent.registered', ...}` carrying the full object (incl. invocation_id/session_ref), which the broker never matches → `register_fleet_agent_token` hangs to its 30s timeout → the spawn action fails. 
This blocked every spawn-dependent path (placement completion, mailbox delivery to via-node agents, resume). Reply in the shape the shipped broker consumes; the broker already holds the invocation_id/session_ref it sent, so only the minted identity is echoed. Same root pattern as the node-token transport mismatch (#192 read-side ↔ #1107 broker send-side); no Rust release needed. Updated the one conformance helper that asserted the old frame. Engine suite green (111). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(engine): self-host serve env for fleet flag default + mailbox TTL/depth-cap The `relaycast-engine` serve bin gains optional env tuning so operators (and the Phase 6 fleet E2E) can configure the bounded mailbox and the fleet rollout default without code changes: - RELAYCAST_FLEET_NODES_ENABLED=1 → EngineConfig.fleetNodesEnabled - RELAYCAST_MAILBOX_TTL_MS / RELAYCAST_MAILBOX_DEPTH_CAP → mailbox tuning Unset env leaves the existing defaults untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore(engine): renumber fleet mailbox migration 0017→0019 to deduplicate #192 (merged) owns 0017_spawn_reservation_and_retry_state and 0018; the Phase 2 mailbox migration was authored as 0017 on an older base, colliding on the 0017 prefix once #192/#193 landed in main. Renumber to 0019 (after 0018) so the D1 migration sequence is unique and ordered. Pure file rename — no code references the filename, and the migration has not been applied to any environment yet (engine unpublished), so there is no D1 re-apply risk. * docs(changelog): record fleet node/mailbox changes + breaking DeliveryStatus remap Changelogs here are hand-curated (no CI generation), and the fleet stack (#191-#194) was missing from them. 
Add the user-facing entries: - @relaycast/types: new CHANGELOG; document the breaking DeliveryStatus enum remap (accepted/deferred removed, acked/dead_lettered added, delivered re-meaning) with old->new mapping + flag-independent migration note, the new Delivery location/lifecycle fields, and the fleet-wire protocol module. - @relaycast/sdk-typescript: node roster API (nodes.list/get, triggers.list), capability objects, handler/dispatch node fields, JsonValue export, and the breaking action-output widening + delivery status value change. These confirm the next @relaycast/types + sdk-typescript publish is a MAJOR. * chore: apply pr-reviewer fixes for #194 * fix(engine): mailbox cumulative-ack + depth-cap correctness (Codex review) Address P2 findings from Codex review of the fleet mailbox delivery path: 1. ackDelivery (single per-delivery REST ack) advanced the cumulative cursor to the row's own seq, so acking seq 2 while seq 1 is queued moved the cursor past seq 1; deliverPendingToNode (seq > delivery_ack_seq) then skipped it forever on node replay. Make the cursor advance opt-in (ackRows advanceCursorTo?) — single acks no longer advance it; the row's acked status already excludes it from replay. The node delivery.ack {up_to_seq} path still advances cumulatively. Regression test. 2. Migration 0019 seeded delivery_ack_seq = MAX(acked seq), skipping an older still-queued row below a newer acked one. Seed from the contiguous acked prefix (lowest active seq - 1; max seq when nothing is active). 3. Node-replay event classification checked dmType before threadId, so a thread reply inside a DM/group DM would replay as dm.received instead of thread.reply (the live routes/thread.ts routing). Check threadId first to mirror live. 4. Mailbox depth-cap count included expired-but-unswept rows, so an idle recipient kept rejecting new sends as depth_cap after TTL instead of dead-lettering. Exclude expired rows from the count (matches the replay query). Regression test. 
Also classify the operator-only /v1/workspace/fleet-nodes flag route as non-SDK in sdk-openapi-sync (pre-existing #194 gap that turbo test caching had masked). * docs(openapi): require enabled|mode and document 400 on PUT /workspace/fleet-nodes The PUT handler rejects a payload lacking both `enabled` (boolean) and `mode: inherit` with a 400 invalid_request, but the schema marked both optional and documented only a 200. Add anyOf[required: enabled | required: mode] to reflect the runtime constraint, and document the 400 (ErrorResponse). --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: agent-relay-code[bot] <agent-relay-code[bot]@users.noreply.github.com>
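The cursor seeding rule from item 2 of the Codex fix above (contiguous acked prefix: lowest active seq minus one, or the max seq when nothing is active) can be sketched as follows. `seedAckCursor` is a hypothetical helper; treating `queued` and `delivered` as the active statuses, and returning 0 for an agent with no rows, are assumptions of this example.

```typescript
type DeliveryStatus = 'queued' | 'delivered' | 'acked' | 'failed' | 'dead_lettered';

interface SeqRow {
  seq: number;
  status: DeliveryStatus;
}

// Seeds delivery_ack_seq for one agent from its existing delivery rows.
function seedAckCursor(rows: SeqRow[]): number {
  let lowestActive = Number.POSITIVE_INFINITY;
  let maxSeq = 0;
  for (const row of rows) {
    if (row.seq > maxSeq) maxSeq = row.seq;
    const active = row.status === 'queued' || row.status === 'delivered';
    if (active && row.seq < lowestActive) lowestActive = row.seq;
  }
  // Nothing active: everything is settled, so the cursor can sit at the max seq.
  // Otherwise stop just below the oldest row that still needs replay.
  return lowestActive === Number.POSITIVE_INFINITY ? maxSeq : lowestActive - 1;
}

// An older queued row sits below a newer acked one.
const seeded = seedAckCursor([
  { seq: 1, status: 'acked' },
  { seq: 2, status: 'queued' },
  { seq: 3, status: 'acked' },
]);
```

Here `seeded` is 1, so the queued row at seq 2 still passes a `seq > delivery_ack_seq` replay filter; seeding from the max acked seq would have given 3 and skipped it.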
Rebased feature/engine-retention onto main, which now has the fleet stack (#191-#194). Three adaptations: 1. Renumber migration 0016_workspace_retention -> 0020_workspace_retention; #192 took 0016 (fleet_nodes) and #193 took 0019 (fleet_mailbox). 2. Delivery status model: #193 reworked the enum, so SETTLED_DELIVERY_STATUSES is now ['acked','failed','dead_lettered'] (was ['delivered','failed']). 'delivered' is now IN-FLIGHT (sent, awaiting cumulative ack), so retention must never prune it; 'acked' is terminal success. Updated tests to the new status names. 3. insertDelivery test helper assigns a distinct seq per agent — the mailbox migration added UNIQUE(workspace_id, agent_id, seq), so same-agent rows can no longer share the default seq 0. Note: turbo build/tsc is currently red on main itself (engine.ts:212 uses originInfo.origin_surface, which #188 removed from the telemetry contract) — a pre-existing #188/#192 collision unrelated to this PR. Engine vitest is green (132/132).
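The retention rule in adaptation 2 can be sketched as a predicate over the reworked status enum. `SETTLED_DELIVERY_STATUSES` and its values come from the note above; `isPrunable`, its parameters, and the use of a settle timestamp are hypothetical details of this example.

```typescript
type DeliveryStatus = 'queued' | 'delivered' | 'acked' | 'failed' | 'dead_lettered';

// Values as listed in the rebase note; 'delivered' is deliberately absent.
const SETTLED_DELIVERY_STATUSES: ReadonlySet<DeliveryStatus> = new Set<DeliveryStatus>([
  'acked',
  'failed',
  'dead_lettered',
]);

// A row may be pruned only once it is settled AND older than the retention TTL.
function isPrunable(status: DeliveryStatus, settledAtMs: number, nowMs: number, ttlMs: number): boolean {
  return SETTLED_DELIVERY_STATUSES.has(status) && nowMs - settledAtMs >= ttlMs;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const ninetyDays = 90 * DAY_MS;
const now = 200 * DAY_MS;

// In-flight row awaiting a cumulative ack: never prunable, however old it is.
const inFlight = isPrunable('delivered', 0, now, ninetyDays);
// Acked 100 days ago: prunable under a 90-day TTL.
const oldAck = isPrunable('acked', 100 * DAY_MS, now, ninetyDays);
```

`inFlight` is `false` and `oldAck` is `true`, which matches the requirement that retention never removes a delivery that may still need replay.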
…llow-ups (#189) * feat(engine): retention pruning with per-workspace TTLs and outbox follow-ups Add pruneExpired: bounded-batch deletion of expired messages (leaf-first across thread parents), settled deliveries, message logs, and orphaned read receipts, with per-workspace TTLs in a new nullable workspaces.retention column. Message retention is opt-in; settled deliveries and message logs default to 90 days as operational logs. Runs on the Node adapter's outbox cleanup cadence and is exported for queue-backed scheduled handlers. cleanupOldEvents now settles exhausted pending_events rows as failed so they become prunable instead of lingering unclaimable, and sendWebhookEvent skips the outbox insert and queue send entirely (with a per-request memoized existence probe) for workspaces with no active event subscription. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * chore(engine): adapt retention to merged fleet engine (rebase onto main) Rebased feature/engine-retention onto main, which now has the fleet stack (#191-#194). Three adaptations: 1. Renumber migration 0016_workspace_retention -> 0020_workspace_retention; #192 took 0016 (fleet_nodes) and #193 took 0019 (fleet_mailbox). 2. Delivery status model: #193 reworked the enum, so SETTLED_DELIVERY_STATUSES is now ['acked','failed','dead_lettered'] (was ['delivered','failed']). 'delivered' is now IN-FLIGHT (sent, awaiting cumulative ack), so retention must never prune it; 'acked' is terminal success. Updated tests to the new status names. 3. insertDelivery test helper assigns a distinct seq per agent — the mailbox migration added UNIQUE(workspace_id, agent_id, seq), so same-agent rows can no longer share the default seq 0. Note: turbo build/tsc is currently red on main itself (engine.ts:212 uses originInfo.origin_surface, which #188 removed from the telemetry contract) — a pre-existing #188/#192 collision unrelated to this PR. Engine vitest is green (132/132). 
--------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Rebased #190 onto main (resolving the #188 origin-contract changes: origin_surface is gone; only origin_actor + origin_client/origin_version remain — confirmed no origin_surface references survive). Correctness fixes layered on top of #190's parity additions: - DeliveryStatus: updated the stale Literal["accepted","delivered", "deferred","failed"] to the canonical #193 enum Literal["queued","delivered","acked","failed","dead_lettered"] (packages/types/src/delivery.ts). "delivered" now means in-flight awaiting ack; "acked" is terminal success; accepted/deferred removed. - Delivery model: aligned with the canonical DeliverySchema by adding the missing fields seq, location_type, location_node_id, expires_at, delivered_at, acked_at, dead_lettered_at to match the TS SDK surface. - channels.set_topic: corrected the route from PATCH /v1/channels/{name} to PATCH /v1/channels/{name}/topic to match the TS setTopic() and the dedicated openapi endpoint (it was colliding with channels.update). - channels.invite: corrected the request body field from {"agent": ...} to {"agent_name": ...} to match InviteRequestSchema / the TS SDK wire shape (Python sends keys verbatim with no camel->snake conversion). - Updated test_channels_set_topic to assert the corrected /topic route. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
) * Add Python SDK parity endpoints * Fix Python SDK parity: delivery status enum, channel topic/invite paths Rebased #190 onto main (resolving the #188 origin-contract changes: origin_surface is gone; only origin_actor + origin_client/origin_version remain — confirmed no origin_surface references survive). Correctness fixes layered on top of #190's parity additions: - DeliveryStatus: updated the stale Literal["accepted","delivered", "deferred","failed"] to the canonical #193 enum Literal["queued","delivered","acked","failed","dead_lettered"] (packages/types/src/delivery.ts). "delivered" now means in-flight awaiting ack; "acked" is terminal success; accepted/deferred removed. - Delivery model: aligned with the canonical DeliverySchema by adding the missing fields seq, location_type, location_node_id, expires_at, delivered_at, acked_at, dead_lettered_at to match the TS SDK surface. - channels.set_topic: corrected the route from PATCH /v1/channels/{name} to PATCH /v1/channels/{name}/topic to match the TS setTopic() and the dedicated openapi endpoint (it was colliding with channels.update). - channels.invite: corrected the request body field from {"agent": ...} to {"agent_name": ...} to match InviteRequestSchema / the TS SDK wire shape (Python sends keys verbatim with no camel->snake conversion). - Updated test_channels_set_topic to assert the corrected /topic route. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(sdk-swift): bring Swift SDK to 100% parity with TypeScript SDK Add the relay-level surfaces that were missing from the Swift SDK: - nodes namespace: list (GET /v1/nodes, capability/name filters), get (GET /v1/nodes/{name}) with NodeRosterEntry + NodeCapability models - triggers namespace: create/list/get/update/delete full lifecycle (POST/GET/PATCH/DELETE /v1/triggers[/{id}]) with Trigger, CreateTriggerRequest, UpdateTriggerRequest models - activity feed: activity(limit) -> GET /v1/activity - workspace-level DM queries: allDMConversations (GET /v1/dm/conversations/all) and dmMessages (GET /v1/dm/conversations/{id}/messages) Fix the stale DeliveryStatus enum to the current statuses (queued|delivered|acked|failed|dead_lettered), replacing the old accepted/deferred values. All routes verified present in openapi.yaml. Adds tests for nodes, triggers, workspace DM queries, activity, and the delivery-status enum. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(sdk-rust): full Rust↔TypeScript SDK parity Bring the Rust SDK to 100% feature parity with the TypeScript reference SDK. Every new route is documented in openapi.yaml. 
New RelayCast surfaces: - Workspace bootstrap: lookup_workspace (GET /v1/workspaces/by-name/{name}) - A2A: register_a2a, list_a2a_agents, remove_a2a_agent, get_a2a_agent_card - Routing: route, route_feedback, get_routing_config, update_routing_config - Directory: search_directory, publish_to_directory, list_directory, get_directory_agent, update_directory_agent, delete_directory_agent, list_directory_ratings, rate_directory_agent - Skills: import_skills, search_skills - Fleet nodes: list_nodes, get_node - Triggers: create_trigger, list_triggers, get_trigger, update_trigger, delete_trigger - Certification: certify, get_certification, certification_badge_url, monitor_certification - Console: console_messages, console_stats (ConsoleOverview), console_agents, console_costs New AgentClient surfaces: - channels mute_channel / unmute_channel - invite_to_channel fixed to send documented `agent_name` body Models: added serde structs for A2A cards/records, directory agents/skills/ ratings, routing config/weights, skill search results, node roster with capability objects, triggers, certification runs, and console stats — all snake_case to match the wire contract. DeliveryStatus enum updated to the canonical lifecycle (queued|delivered|acked|failed|dead_lettered); tests updated to match. Adds parity tests for every new surface. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: apply pr-reviewer fixes for #190 --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: agent-relay-code[bot] <agent-relay-code[bot]@users.noreply.github.com>
Summary
Implements Fleet Delivery Phase 2 for AgentWorkforce/relay#1056 on top of `feat/fleet-nodes-engine` / relaycast#192.
- `queued -> delivered -> acked`, per-agent monotonic `seq`, cumulative node `delivery.ack`, TTL dead-lettering, and depth-cap reject-new sender feedback.
- `via_node` agents receive `deliver {agent,msg_id,seq,mode,payload}` over the node control connection.
- `/v1/inbox` does not resurface acked messages, and preserves unacked deliveries for reconnect/broker-death redelivery.
Stack / Merge Order
This PR is stacked on relaycast#192 (`feat/fleet-nodes-engine`) and should merge after that PR lands. The base requested here is `main`, so the diff includes the stack context until #192 is merged.
§11 Audit / §5 Cleanup Gate
Audit commands searched the repo for `relay://`, `/v1/ws`, resource subscriptions, and per-agent stream consumers.
Findings:
- `relay://` resource definitions were found to delete in this repo.
- `packages/mcp/src/resources/definitions.ts`, `packages/mcp/src/resources/ws-bridge.ts`, `packages/mcp/src/resources/subscriptions.ts`, and MCP resource/subscription tests still register/read/subscribe `relay://...` resources.
- `/v1/ws` is still consumed by the TypeScript SDK (`AgentClient.connect` / subscription helpers), SDK docs/examples, and other SDK surfaces.
Result: the PTY-agent resource/subscription surface is kept in this PR. Actual deletes remain gated behind that audit because consumers still exist.
Verification
- `npm run typecheck --workspace=@relaycast/engine`
- `npx vitest run src/__tests__/conformance/delivery.test.ts src/__tests__/conformance/node.test.ts src/__tests__/atomicity.test.ts` from `packages/engine`
- `npm run test --workspace=@relaycast/types`
- `npm run test --workspace=@relaycast/engine`
- `npm run test --workspace=@relaycast/sdk`
- `npm run build`
- `npm run test`
Root `package.json` does not define a repo-level `typecheck` script; engine workspace typecheck is the explicit typecheck run.
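The mailbox lifecycle in the summary can be sketched as a small transition table plus a cumulative ack. This is an illustration under assumptions: the table models only the transitions this thread confirms (`queued -> delivered -> acked`, and TTL dead-lettering from `queued` or `delivered`), transitions into `failed` are omitted because they are not described here, and `canTransition` and `applyCumulativeAck` are hypothetical helpers.

```typescript
type DeliveryStatus = 'queued' | 'delivered' | 'acked' | 'failed' | 'dead_lettered';

// Allowed next states; terminal states have no outgoing transitions.
const NEXT: Record<DeliveryStatus, readonly DeliveryStatus[]> = {
  queued: ['delivered', 'dead_lettered'],
  delivered: ['acked', 'dead_lettered'],
  acked: [],
  failed: [],
  dead_lettered: [],
};

function canTransition(from: DeliveryStatus, to: DeliveryStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

interface MailboxRow {
  seq: number;
  status: DeliveryStatus;
}

// Applies a node `delivery.ack { up_to_seq }`: every in-flight row at or below
// the given seq becomes acked. Returns how many rows changed.
function applyCumulativeAck(rows: MailboxRow[], upToSeq: number): number {
  let acked = 0;
  for (const row of rows) {
    if (row.seq <= upToSeq && canTransition(row.status, 'acked')) {
      row.status = 'acked';
      acked += 1;
    }
  }
  return acked;
}

const mailbox: MailboxRow[] = [
  { seq: 1, status: 'delivered' },
  { seq: 2, status: 'delivered' },
  { seq: 3, status: 'queued' },
  { seq: 4, status: 'delivered' },
];
const ackedCount = applyCumulativeAck(mailbox, 3);
```

In this run two rows are acked; the queued row at seq 3 is left alone because it was never sent, and seq 4 stays in flight for replay after a reconnect.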