Skip to content

feat(engine): per-workspace fleet rollout flag + migration single-delivery guarantees (Phase 6)#194

Merged
willwashburn merged 9 commits into
mainfrom
feat/fleet-rollout-flag
Jun 15, 2026
Merged

feat(engine): per-workspace fleet rollout flag + migration single-delivery guarantees (Phase 6)#194
willwashburn merged 9 commits into
mainfrom
feat/fleet-rollout-flag

Conversation

@willwashburn

@willwashburn willwashburn commented Jun 14, 2026

Copy link
Copy Markdown
Member

Phase 6 — per-workspace fleet rollout flag + E2E-surfaced compat fixes (RFC AgentWorkforce/relay#1056)

1. Per-workspace fleet rollout flag

Gates the entire fleet node control surface behind a per-workspace
fleet_nodes_enabled flag (default OFF) so fleet can ship dark and roll out
workspace-by-workspace. Legacy per-agent WS delivery is unaffected either way.

The flag is checked once at each genuine boundary (no scattered checks):

  • node control WS (/v1/node/ws) → fleet_nodes_disabled (404) when off
  • node roster routes (/v1/nodes*) → flat 404 via requireFleetNodes
  • declarative trigger evaluation → skipped at the message hook
  • spawn placement + node-handler dispatch → refuse in invokeAction
    (agent-handler actions stay available)

KV override with a short cache, defaulting to EngineConfig.fleetNodesEnabled.
GET/PUT /v1/workspace/fleet-nodes toggles the per-workspace override. The
relaycast-engine serve bin also reads RELAYCAST_FLEET_NODES_ENABLED and
RELAYCAST_MAILBOX_TTL_MS / RELAYCAST_MAILBOX_DEPTH_CAP for self-host tuning.

Migration single-delivery: a test proves a legacy self-connected agent is
never also delivered via a node when the flag flips mid-stream (one active
location per agent; a node's agent.register for an already-connected agent is
rejected agent_location_conflict).

2. Two cross-repo compat fixes surfaced by the Phase 6 E2E

The engine runtime deviated from the frozen wire contract (#191) in two
places despite the golden fixtures, so a real broker could never bring a node
online or complete a spawn. Both are fixed engine-side here (the broker is
contract-correct; no Rust release). Neither reopens frozen #192.

  • Node-token transport (engine read-side feat(engine): node registry, node-native actions, placement, triggers (fleet Phase 1) #192 ↔ broker send-side #1107): the
    Node HTTP-server upgrade handler in entrypoints/node.ts read the token only
    from ?token=, but the broker's node_control client sends
    Authorization: Bearer <nt_live_…>. It also had no fleet-flag gate for
    /v1/node/ws. Fixed: read the token from query or Bearer header, and gate
    the upgrade behind isFleetNodesEnabled.
  • agent.register reply frame (token authority): the broker awaits a canonical
    reply frame {v, id, ok:true, data:{agent_id, token, name?}} keyed by request
    id (deny_unknown_fields); the engine emitted a bare {type:'agent.registered', …},
    so register_fleet_agent_token hung to its 30s timeout and spawn never
    completed
    . Fixed: emit the canonical reply frame.

Regression guards

  • nodeUpgradeAuth.test.ts: a real WS client authenticates via both the Bearer
    header and the query param; flag-off → 404; missing/malformed token → 401.
  • The relay two-node E2E (relay#1131) asserts a real broker boots online (Bearer)
    and a spawn completes end-to-end (token mint+inject) — these regressions
    cannot reship.

Version skew

Old v8.x brokers keep working (fleet flag off). Fleet nodes require the new
broker + engine + the per-workspace flag on. The hosted (Cloudflare DO)
equivalent of these fixes is handled in PR 5.

Tests

Full engine suite green (111). typecheck + targeted eslint clean.

🤖 Generated with Claude Code

@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@willwashburn, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 32 minutes and 13 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e4787f13-a22d-427c-856a-d2eff87a7e6f

📥 Commits

Reviewing files that changed from the base of the PR and between c7a32b4 and fe11a4e.

📒 Files selected for processing (7)
  • memory/workspace/.relay/state.json
  • openapi.yaml
  • packages/engine/src/__tests__/conformance/delivery.test.ts
  • packages/engine/src/db/migrations/0019_fleet_mailbox.sql
  • packages/engine/src/engine/delivery.ts
  • packages/engine/src/engine/deliveryWrites.ts
  • packages/types/src/__tests__/sdk-openapi-sync.test.ts
📝 Walkthrough

Walkthrough

Introduces a per-workspace fleetNodesEnabled feature flag backed by a KV store with TTL caching. All node HTTP endpoints, action node-handler dispatch, message trigger firing, and WebSocket upgrades are gated behind this flag. New GET/PUT /workspace/fleet-nodes endpoints manage per-workspace overrides. The self-host binary reads the flag from an environment variable. Three new test suites cover the rollout behavior.

Changes

Fleet Nodes Feature Flag Rollout

Layer / File(s) Summary
Feature flag library and EngineConfig contract
packages/engine/src/ports/index.ts, packages/engine/src/lib/fleetNodes.ts
Adds fleetNodesEnabled?: boolean to EngineConfig and introduces getFleetNodesConfig, isFleetNodesEnabled, and setFleetNodesOverride backed by a KV store with in-memory TTL cache.
requireFleetNodes middleware and HTTP route gating
packages/engine/src/middleware/fleetNodes.ts, packages/engine/src/routes/node.ts, packages/engine/src/routes/action.ts, packages/engine/src/routes/message.ts
Adds the requireFleetNodes Hono middleware returning 404 fleet_nodes_disabled when the flag is off; inserts it into node enrollment/roster route chains; gates invokeAction's node-handler dispatch and spawn; wraps fireMessageTriggers in a fleet-flag check.
WS upgrade: Bearer auth + fleet flag gate + node_control reply fix
packages/engine/src/engine.ts, packages/engine/src/entrypoints/node.ts, packages/engine/src/engine/node.ts
Extracts node tokens from Authorization: Bearer or ?token=; adds per-workspace fleet-nodes gate on /v1/node/ws upgrades (404 when disabled); trims agent.register control reply to { agent_id, token, name }.
Workspace fleet-nodes API endpoints and OpenAPI spec
packages/engine/src/routes/workspace.ts, openapi.yaml
Adds GET /workspace/fleet-nodes and PUT /workspace/fleet-nodes handlers with Zod validation, setFleetNodesOverride persistence, telemetry, and error handling; documents both in the OpenAPI spec.
Self-host binary env-var wiring
packages/engine/src/bin/serve.ts
Parses RELAYCAST_FLEET_NODES_ENABLED, RELAYCAST_MAILBOX_TTL_MS, and RELAYCAST_MAILBOX_DEPTH_CAP from environment variables and passes them into startServer's config object.
Conformance tests, WS upgrade regression tests, and harness update
packages/engine/src/__tests__/conformance/harness.ts, packages/engine/src/__tests__/conformance/delivery.test.ts, packages/engine/src/__tests__/conformance/fleetRolloutFlag.test.ts, packages/engine/src/__tests__/nodeUpgradeAuth.test.ts
Extends makeNodeStack with fleetNodesEnabled option; adds full flag-off/on/migration conformance suite and WS upgrade auth regression suite; fixes registerViaNode to assert the trimmed reply shape.
Package changelogs
packages/types/CHANGELOG.md, packages/sdk-typescript/CHANGELOG.md
Documents breaking DeliveryStatus rework, new fleet-wire types, 3.1.1 baseline, fleet node roster additions, and durable delivery status semantic changes.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  rect rgba(70, 130, 180, 0.5)
    note over Client,workspaceRoutes: Workspace fleet-nodes management
  end
  Client->>workspaceRoutes: PUT /workspace/fleet-nodes { enabled: true }
  workspaceRoutes->>setFleetNodesOverride: (kv, workspaceId, true)
  setFleetNodesOverride->>KVStore: write 'true'
  setFleetNodesOverride->>TTLCache: invalidate workspaceId entry
  workspaceRoutes->>getFleetNodesConfig: read updated state
  workspaceRoutes-->>Client: { ok: true, enabled, default_enabled, override }

  rect rgba(60, 179, 113, 0.5)
    note over NodeClient,isFleetNodesEnabled: Node WS upgrade with flag check
  end
  NodeClient->>UpgradeHandler: HTTP Upgrade /v1/node/ws (Bearer token)
  UpgradeHandler->>UpgradeHandler: resolve token, validate nt_live_, hash, load node
  UpgradeHandler->>isFleetNodesEnabled: (kv, node.workspaceId, config.fleetNodesEnabled)
  isFleetNodesEnabled->>KVStore: read override
  KVStore-->>isFleetNodesEnabled: 'true'
  isFleetNodesEnabled-->>UpgradeHandler: true
  UpgradeHandler-->>NodeClient: 101 Switching Protocols

  rect rgba(220, 80, 80, 0.5)
    note over NodeClient,isFleetNodesEnabled: Flag off — upgrade rejected
  end
  NodeClient->>UpgradeHandler: HTTP Upgrade /v1/node/ws
  UpgradeHandler->>isFleetNodesEnabled: (kv, node.workspaceId, false)
  isFleetNodesEnabled-->>UpgradeHandler: false
  UpgradeHandler-->>NodeClient: 404 fleet_nodes_disabled
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AgentWorkforce/relaycast#191: Defines the fleet-wire agent.register reply-data Zod schema and fixtures that match the trimmed { agent_id, token, name } payload shape introduced in this PR's handleNodeControlMessage fix.

Poem

🐇 A flag in the KV store, so small yet so grand,
Each workspace may opt in with a wave of its hand.
The Bearer token hops through auth gates with care,
Fleet nodes wake or sleep — per workspace, it's fair.
The relay broker gets just the shape it expects,
No stale full objects, no protocol defects!
✨ Hop hop, the rollout proceeds with delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary changes: per-workspace fleet rollout flag (Phase 6) and migration single-delivery guarantees, matching the main objectives.
Description check ✅ Passed The description provides comprehensive details about the fleet rollout flag implementation, compatibility fixes, and regression guards, directly addressing the changeset objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/fleet-rollout-flag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

agent-relay-code Bot added a commit that referenced this pull request Jun 14, 2026
agent-relay-code Bot added a commit that referenced this pull request Jun 14, 2026

@willwashburn willwashburn left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RevA1 — adversarial review, round 1. Built the head branch in a clean worktree: typecheck ✅, targeted eslint ✅, full engine suite 111/111 ✅ (the 2 new files, 9 tests, execute and pass). Verified the four claimed boundaries are genuinely gated; the migration guarantee and the Bearer/query compat fix both hold under test. No blockers or majors. Findings below are all minor/nit — none block merge, but a couple touch the "every node surface inert when OFF" claim.


1. [minor] Token precedence is query-first, but the design calls for header precedence — and the both-present case is untested.
engine.ts:501 and entrypoints/node.ts:113 both do url.searchParams.get('token') ?? bearer, i.e. query wins when both are present. The Phase-6 brief specifies header precedence. In practice the broker sends only the header and SDK/Pear only the query, so no shipping transport sends both — impact is low — but the precedence is a silent, untested contract. nodeUpgradeAuth.test.ts only covers header-only and query-only in isolation. Pick a precedence deliberately, document it in the comment, and add a both-present assertion so it can't drift.

2. [minor] The invokeAction fleet gate is fail-OPEN while every other gate is fail-CLOSED.
engine/action.ts:357 defaults const fleetNodesEnabled = options.fleetNodesEnabled ?? true. The WS gate, requireFleetNodes, the trigger hook, and the workspace routes all default config.fleetNodesEnabled ?? false (fail-closed). Today there are only two callers — routes/action.ts threads the resolved flag, and engine/trigger.ts:180 relies on the upstream message.ts gate (so it passes nothing → defaults true). That works only because the message hook gates trigger evaluation. Any future caller that forgets the option silently re-opens spawn + node-handler dispatch with the flag off. Make the option required, or default it false, so the placement gate fails closed like the others.

3. [minor] The declarative-trigger management surface is not gated — only evaluation is.
routes/trigger.ts:22-76 (POST/GET/PATCH/DELETE /v1/triggers) carry only requireAuth, no requireFleetNodes. The PR body says "flag OFF → every node surface inert," and declarative triggers are explicitly part of the fleet control surface; fleetRolloutFlag.test.ts:188 even asserts POST /v1/triggers → 201 while OFF. So a flag-off workspace can fully create/edit/list triggers (they just never fire). That's a deliberate choice but it contradicts the "every surface" wording and is the second ungated node boundary. Either add requireFleetNodes to the trigger CRUD routes for consistency, or document the intentional exception and soften the claim.

4. [minor] Disabling the flag does not tear down live node connections or re-home via_node agents.
The WS gate guards the upgrade (new connections) and agent.register only. A node already connected when a workspace flips OFF keeps receiving deliver frames until it reconnects — the ON→OFF kill-switch direction is unguarded and untested (the migration test only exercises OFF→ON, which is the right primary direction). Fine for a forward dark-launch flag, but please either document the flag as forward-rollout-only or close live node sockets on disable so "inert when OFF" is true for in-flight nodes too.

5. [nit] OpenAPI under-specifies /workspace/fleet-nodes. GET/PUT advertise only a generic SuccessResponse; the real payload is { enabled, default_enabled, override }, and the PUT body's valid shapes are { enabled: boolean } XOR { mode: "inherit" }. Tighten the schema so generated clients see the real contract. (openapi.yaml:18-61)

6. [nit] Ambiguous PUT body precedence. routes/workspace.ts lets { enabled: true, mode: "inherit" } silently clear the override (mode wins) rather than 400-ing. Low risk; consider rejecting bodies that set both.


Verified correct (not rubber-stamping):

  • agent_location_conflict (engine/node.ts:349) genuinely fires for an active self_connected agent — the onConflictDoUpdate.setWhere predicate (node.ts:341-344) only permits the update when the row is inactive or already via-node on the same node. The migration test is non-vacuous: a broken guard would relocate legacy to via_node, flipping both legacySock == 2 and nodeDeliveries == 0.
  • Bearer header AND ?token= query each upgrade to 101 when the workspace flag is ON; 404 when OFF; 401 on missing/malformed — exercised against a real booted Node server.
  • Flag default OFF confirmed: DEFAULT_ENABLED = false, every call site ?? false.

Round-1 verdict: GO on substance — no blockers/majors. Recommend addressing #1#4 before flag flip.

@willwashburn willwashburn left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial review — round 3 (head b673dfb) — GO

Built head + ran the engine suite myself: build green, tsc --noEmit clean, eslint clean on all changed files, 111/111 tests pass (the two new files — nodeUpgradeAuth.test.ts, fleetRolloutFlag.test.ts — confirmed to actually execute via vitest list). All three round-3 items verify:

(1) Rollout flag — all 4 boundaries gated, default OFF, flip sane ✅

  • WS upgrade: gated in both engine.ts:204 (hosted/Hono) and entrypoints/node.ts:133 (self-host). Node roster: requireFleetNodes on POST/GET/GET:name (routes/node.ts:22,36,49). Triggers: routes/message.ts:158. Spawn placement + node-handler dispatch: engine/action.ts:373,397.
  • Default OFF: DEFAULT_ENABLED=false; every call site passes config.fleetNodesEnabled ?? false. KV override + 10s cache invalidated on setFleetNodesOverride (lib/fleetNodes.ts:794). Flip-on/flip-off behavior exercised by both new test files. Survived the rebase intact.

(2) Bearer-OR-query node-WS auth ✅

  • Both upgrade paths read query('token') ?? bearer, case-insensitive ^bearer\s+, trimmed; missing→401, non-nt_live_→401, unknown hash→401, flag-off→404. Token is only hashed for lookup, never forwarded. nodeUpgradeAuth.test.ts exercises header-only and query-only against a real ws client.

(3) Canonical agent.register reply frame ✅ (code correct)

  • engine/node.ts:519 emits {v:1, id:requestId(message), type:'reply', ok:true, data:{agent_id, token, name}}. Matches FleetReplyMessageSchema (.strict()) exactly, and data carries only {agent_id, token, name}invocation_id/session_ref correctly excluded, so the broker's deny_unknown_fields parse of AgentRegisterReplyData {agent_id, token, name?} is satisfied. id echoes message.id when present → id-correlated.

Non-blocking findings (minor / nit)

  1. [minor] Success-path id-correlation is untested. engine/node.ts:520 is the entire point of the fix (broker matches pending_agent_registrations by reply.id), yet no test asserts a successful reply's id equals the request id. delivery.test.ts:110 sends agent.register with no id and never reads reply.id; only the conflict/error path (fleetRolloutFlag.test.ts:236, id claim-legacy) checks correlation. A regression to id: generateId() on the success branch would pass the whole suite — so the PR's "these regressions cannot reship" claim doesn't actually hold for this property. Fix: in delivery.test.ts registerViaNode, send id: 'reg-<name>' and expect(reply.id).toBe('reg-<name>').

  2. [minor/informational] Frozen contract guard eroded (originated in base, not this PR). git diff ee2c001 -- packages/types is not empty: AgentRegisterReplyDataSchema and the golden fixtures/fleet-wire/reply.agent_register.json were deleted, and reply.json's data was reshaped to {agent_id,name,token,invocation_id,session_ref}. These come from base-branch commits (8518e1a, 33ede3c, ddffa76), not #194#194 touches zero files under packages/types, so the "engine-only, no Rust release" claim holds. But the cross-repo parity for the register-reply data shape now rests solely on the engine's hand-built object + the broker's Rust struct, with no types-level schema or golden fixture pinning it. Worth restoring a fixture/schema guard in a follow-up so finding #1's correlation + shape can't silently drift again.

  3. [nit] requireFleetNodes runs before rateLimit on all three node routes (routes/node.ts:22,36,49), so flag-off requests return 404 without passing the rate limiter. Each call is a cached KV read, so impact is negligible, but ordering is inverted vs. the usual …, rateLimit, handler convention.

  4. [nit] Precedence is query-over-header, not header-over-query. query('token') ?? bearer means a ?token= wins over the Authorization: Bearer header if both are sent. No practical impact (broker sends header only, SDK query only), but it's the opposite of "header precedence" — pick one and note it.

Verdict: all three round-3 items verified, nothing new at blocker/major. GO.

@willwashburn willwashburn left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 4 (adversarial) — GO ✅ clean

Independent third completed pass on head b673dfb. Built and tested from a fresh worktree of origin/feat/fleet-rollout-flag. No blockers or majors found; confirming rounds 1 & 3.

Independent verification

  • Typecheck (tsc --noEmit, after building @relaycast/types+@relaycast/a2a): clean
  • Engine suite (vitest run): 111/111 passing, 13 files
  • ESLint on all 11 changed engine files: clean

Round-4 confirmation criteria

  1. Flag gates all 4 node boundaries, default OFF
    • WS upgrade: engine.ts:202 (Hono/in-process) + entrypoints/node.ts:131 (self-host) → 404 fleet_nodes_disabled
    • Roster routes /v1/nodes*: requireFleetNodes on POST/GET/GET:name (routes/node.ts:22,36,49) → flat 404
    • Trigger eval: skipped at the message hook (routes/message.ts:153-163)
    • Spawn placement + node-handler dispatch: refuse in invokeAction (engine/action.ts:362,397); agent-handler actions unaffected (asserted in test)
    • Default OFF: DEFAULT_ENABLED=false (lib/fleetNodes.ts:22) + config.fleetNodesEnabled ?? false at every call site.
  2. /v1/node/ws Bearer-OR-query auth ✅ — Authorization: Bearer OR ?token= (engine.ts:185-189, entrypoints/node.ts:109-113); malformed/missing → 401 (nodeUpgradeAuth.test.ts); flag-off → 404; nt_live_ prefix enforced before hash lookup.
  3. Canonical reply frame ✅ — agent.register emits {v:1, id:requestId(message), type:'reply', ok:true, data:{agent_id, token, name}} (engine/node.ts:522), id-correlated, not agent.registered. Minimal 3-field data matches the broker's deny_unknown_fields + #191 AgentRegisterReplyData. Conformance test updated to assert the reply frame (delivery.test.ts). Error path is symmetric: {v:1, id, type:'error', ok:false, code, message} matches the strict FleetErrorMessageSchema.
  4. Frozen contract ✅ (with one clarification, below) — PR #194's diff vs its base (feat/fleet-mailbox) touches zero packages/types files; the wire contract is untouched by this PR.

Minor / informational (non-blocking)

  1. (nit, pre-existing, not #194) git diff ee2c001 -- packages/types is not empty — but those changes (reply.json gaining invocation_id/session_ref, deletion of reply.agent_register.json) belong to earlier PRs in the stack, not #194. The runtime engine emits only the 3-field data, so the broker deny_unknown_fields contract is satisfied. Note for the stack owner: the reply.json fixture's data now carries 5 fields while the engine emits 3; harmless because the engine's reply data is freeform JSON, but a broker decoding that fixture verbatim with deny_unknown_fields would reject the extra fields. Worth a glance in the broker repo's fixture-parity check.
  2. (nit) nodeUpgradeAuth.test.ts tryUpgrade resolves only on open/unexpected-response; a bare socket error (no HTTP response) would hang the promise. Fine against a live localhost server, but slightly fragile.
  3. (nit) openapi.yaml documents the GET/PUT /workspace/fleet-nodes responses as generic SuccessResponse rather than the actual {enabled, default_enabled, override} shape — consistent with the existing /workspace/stream docs, so cosmetic only.

Verdict: GO. Phase 6 rollout flag + the two cross-repo compat fixes are correct and well-guarded.

@willwashburn willwashburn force-pushed the feat/fleet-rollout-flag branch from b673dfb to 87f898b Compare June 15, 2026 16:13
Base automatically changed from feat/fleet-mailbox to main June 15, 2026 16:21
@willwashburn willwashburn marked this pull request as ready for review June 15, 2026 16:21
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

willwashburn and others added 4 commits June 15, 2026 12:22
…ivery guarantees (Phase 6)

Gate the entire fleet node control surface behind a per-workspace
`fleet_nodes_enabled` flag (default OFF), so fleet can ship dark and roll
out workspace-by-workspace. Legacy per-agent WS delivery is unaffected
either way.

The flag is checked once at each genuine boundary (no scattered checks):
- node control WS (`/v1/node/ws`) rejects with `fleet_nodes_disabled` (404)
- node roster routes (`/v1/nodes*`) return a flat 404 via `requireFleetNodes`
- declarative trigger evaluation is skipped at the message hook
- spawn placement + node-handler dispatch refuse in `invokeAction`
  (agent-handler actions stay available)

Flag source mirrors the workspace-stream pattern: a KV override with a
short in-memory cache, defaulting to `EngineConfig.fleetNodesEnabled`.
GET/PUT `/v1/workspace/fleet-nodes` toggles the per-workspace override.

Tests:
- flag OFF -> every node surface inert (roster, spawn, WS gate, triggers)
- per-workspace override flips the surface on/off; WS gate follows the flag
- migration single-delivery: a legacy self-connected agent is never also
  delivered via a node when the flag flips mid-stream (exclusive location;
  a node's `agent.register` for it is rejected `agent_location_conflict`)

The conformance harness defaults the flag ON, so existing node integration
tests pass unchanged. Full engine suite green (108 tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…WS upgrade behind fleet flag (Phase 6)

Cross-repo compat fix surfaced by the Phase 6 two-node E2E: a real relay
broker could never bring a node online against a self-hosted engine.

Root cause: the node-control read-side is the Node HTTP-server `upgrade`
handler in `entrypoints/node.ts` (the Hono `/v1/node/ws` route only answers
the 426 — Node owns the 101). That handler read the token ONLY from the
`?token=` query param, but the relay Rust broker's node_control client sends
it as `Authorization: Bearer <nt_live_…>`. It also had NO fleet-flag gate for
`/v1/node/ws` (only the rk_live workspace-stream path was gated), so the
Phase 6 rollout flag did not actually cover the node control surface on the
self-host adapter.

Fix, both in the upgrade handler:
- read the node token from `?token=` query OR `Authorization: Bearer` header
  (query stays for SDK/Pear; header unblocks the shipped broker — no Rust
  release needed)
- gate the `/v1/node/ws` upgrade behind `isFleetNodesEnabled` (404 when off),
  mirroring the existing stream gate

Also mirrored the dual-transport read in the Hono `/v1/node/ws` route for any
adapter that routes upgrades through it.

Accepted-stack PRs involved: engine read-side #192, broker send-side #1107.
The hosted (Cloudflare DO) equivalent is handled in PR 5.

Test: `nodeUpgradeAuth.test.ts` boots the real Node server and asserts a WS
client authenticates via BOTH the Bearer header and the query param, that the
upgrade is rejected while the workspace flag is off (404), and that a
missing/malformed token is rejected (401). Full engine suite green (111).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…me (Phase 6 token authority)

Third cross-repo compat fix surfaced by the Phase 6 E2E (spawn scenarios):
spawn never completes end-to-end against a real broker. The relay broker's
node_control client awaits a `reply` frame keyed by the request id — it matches
`pending_agent_registrations` by `reply.id` and parses `data` as
`{agent_id, token, name}` with `deny_unknown_fields`. The engine instead
answered `agent.register` with a bare `{type:'agent.registered', ...}` carrying
the full object (incl. invocation_id/session_ref), which the broker never
matches → `register_fleet_agent_token` hangs to its 30s timeout → the spawn
action fails. This blocked every spawn-dependent path (placement completion,
mailbox delivery to via-node agents, resume).

Reply in the shape the shipped broker consumes; the broker already holds the
invocation_id/session_ref it sent, so only the minted identity is echoed. Same
root pattern as the node-token transport mismatch (#192 read-side ↔ #1107
broker send-side); no Rust release needed.

Updated the one conformance helper that asserted the old frame. Engine suite
green (111).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…L/depth-cap

The `relaycast-engine` serve bin gains optional env tuning so operators (and the
Phase 6 fleet E2E) can configure the bounded mailbox and the fleet rollout
default without code changes:
- RELAYCAST_FLEET_NODES_ENABLED=1 → EngineConfig.fleetNodesEnabled
- RELAYCAST_MAILBOX_TTL_MS / RELAYCAST_MAILBOX_DEPTH_CAP → mailbox tuning

Unset env leaves the existing defaults untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@willwashburn willwashburn force-pushed the feat/fleet-rollout-flag branch from 87f898b to a15580e Compare June 15, 2026 16:23
#192 (merged) owns 0017_spawn_reservation_and_retry_state and 0018; the Phase 2
mailbox migration was authored as 0017 on an older base, colliding on the 0017
prefix once #192/#193 landed in main. Renumber to 0019 (after 0018) so the D1
migration sequence is unique and ordered. Pure file rename — no code references
the filename, and the migration has not been applied to any environment yet
(engine unpublished), so there is no D1 re-apply risk.
@chatgpt-codex-connector

Copy link
Copy Markdown

💡 Codex Review

const [updated] = await ackRows(db, workspaceId, agentId, [existing], existing.seq);

P2 Badge Do not advance the cumulative cursor for a single ack

When a client uses the per-delivery REST ack out of order, for example acking seq 2 while seq 1 is still queued/deferred, this call passes only that row to ackRows but advances agents.delivery_ack_seq to existing.seq. deliverPendingToNode later filters on seq > delivery_ack_seq, so if the agent is moved to or reconnects through a node, the lower unacked delivery is skipped forever instead of replayed.


SELECT MAX(seq)
FROM deliveries
WHERE deliveries.agent_id = agents.id
AND deliveries.status = 'acked'

P2 Badge Compute the migrated ack cursor from a contiguous prefix

For migrated data where an older accepted/deferred delivery remains active and a newer delivered row exists, this initializes delivery_ack_seq to the maximum acked seq. Since node replay only sends rows with seq > delivery_ack_seq, that older queued row with a lower seq will never be delivered after the agent migrates to fleet nodes.


const eventType = row.dmType
? (row.dmType === 'group' ? 'group_dm.received' : 'dm.received')
: (row.threadId ? 'thread.reply' : 'message.created');

P2 Badge Preserve thread replies when replaying DM backlog

On broker reconnect, any queued message whose channel belongs to a DM conversation is classified as dm.received/group_dm.received before checking threadId. Live thread replies are routed as thread.reply, so a missed thread reply inside a DM or group DM is redelivered with the wrong event type and payload shape.


AND d.status IN ('queued', 'delivered')

P2 Badge Exclude expired rows from the mailbox depth cap

When an agent is at the depth cap and its queued/delivered rows pass expires_at, this count still treats them as active; expiry is only swept from GET /deliveries or /inbox, not on send or node replay. An idle/offline recipient can therefore keep rejecting new messages as depth_cap after TTL instead of dead-lettering stale rows.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…yStatus remap

Changelogs here are hand-curated (no CI generation), and the fleet stack
(#191-#194) was missing from them. Add the user-facing entries:

- @relaycast/types: new CHANGELOG; document the breaking DeliveryStatus enum
  remap (accepted/deferred removed, acked/dead_lettered added, delivered
  re-meaning) with old->new mapping + flag-independent migration note, the new
  Delivery location/lifecycle fields, and the fleet-wire protocol module.
- @relaycast/sdk-typescript: node roster API (nodes.list/get, triggers.list),
  capability objects, handler/dispatch node fields, JsonValue export, and the
  breaking action-output widening + delivery status value change.

These confirm the next @relaycast/types + sdk-typescript publish is a MAJOR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
packages/engine/src/bin/serve.ts (1)

63-80: Use a Zod schema for environment variable parsing.

The inline ad-hoc numeric and boolean parsing in serve.ts (lines 63–80) introduces manual validation logic that diverges from the codebase's established pattern. Throughout the engine, Zod schemas are used consistently to validate request bodies and other inputs (e.g., in routes and engine logic). Environment variables should follow the same pattern with a schema to standardize parsing, error handling, and type safety.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/bin/serve.ts` around lines 63 - 80, The inline ad-hoc
parsing in the `num` helper function and conditional spreading for environment
variables diverges from the codebase's established Zod schema pattern. Create a
Zod schema to define and validate environment variables including
RELAYCAST_MAILBOX_TTL_MS, RELAYCAST_MAILBOX_DEPTH_CAP, and
RELAYCAST_FLEET_NODES_ENABLED, then replace the `num` function and the
conditional spreading logic with calls to this schema's parse or parseAsync
method to handle environment variable parsing in a consistent, standardized way
that aligns with the rest of the engine codebase.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi.yaml`:
- Around line 2149-2163: The PUT `/workspace/fleet-nodes` endpoint schema
currently allows both `enabled` and `mode` properties to be optional, but the
runtime rejects empty payloads with a 400 error. Add a `required` array to the
object schema containing at least one of the property names (either "enabled" or
"mode") to enforce that the payload cannot be empty. Additionally, add a missing
`400` response definition after the `200` response to document the
invalid_request error that is returned when validation fails at runtime.

In `@packages/engine/src/engine.ts`:
- Around line 182-186: The authHeader variable assignment on line 182 violates
repo guidelines by using a mixed-case fallback pattern with both 'Authorization'
and 'authorization' headers. Replace this mixed-case fallback with a single
canonical header lookup that normalizes the header name to lowercase when
retrieving it, so that c.req.header() is called only once with a consistent
case-normalized approach rather than trying multiple case variations as a
fallback.

In `@packages/engine/src/middleware/fleetNodes.ts`:
- Line 15: The condition at line 15 in fleetNodes.ts accesses
config.fleetNodesEnabled without guarding against config being undefined, since
config is optional in EngineDeps. This will throw a runtime TypeError if config
is not provided. Fix this by using optional chaining (config?.fleetNodesEnabled)
when accessing the fleetNodesEnabled property in the isFleetNodesEnabled call,
ensuring the property access is safe when config is undefined or null.

In `@packages/engine/src/routes/workspace.ts`:
- Around line 41-44: Refactor the updateFleetNodesSchema to use a discriminated
union with Zod instead of relying on manual validation in the handler. Replace
the current optional fields schema (with enabled and mode as optional
strings/booleans and passthrough) with a discriminated union pattern that
defines two distinct payloads: one with mode set to 'inherit' and another with
enabled as a boolean. Apply .strict() to the schema to enforce strict validation
and eliminate the need for manual runtime checks. Then update the handler logic
to remove the ad-hoc branching validation that currently checks which field is
provided and validates the types, since the Zod schema will now enforce this
contract at parse time.

---

Nitpick comments:
In `@packages/engine/src/bin/serve.ts`:
- Around line 63-80: The inline ad-hoc parsing in the `num` helper function and
conditional spreading for environment variables diverges from the codebase's
established Zod schema pattern. Create a Zod schema to define and validate
environment variables including RELAYCAST_MAILBOX_TTL_MS,
RELAYCAST_MAILBOX_DEPTH_CAP, and RELAYCAST_FLEET_NODES_ENABLED, then replace the
`num` function and the conditional spreading logic with calls to this schema's
parse or parseAsync method to handle environment variable parsing in a
consistent, standardized way that aligns with the rest of the engine codebase.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8a03e497-b67a-46c2-8d31-9e5b677a0181

📥 Commits

Reviewing files that changed from the base of the PR and between 31caef8 and c7a32b4.

📒 Files selected for processing (20)
  • openapi.yaml
  • packages/engine/src/__tests__/conformance/delivery.test.ts
  • packages/engine/src/__tests__/conformance/fleetRolloutFlag.test.ts
  • packages/engine/src/__tests__/conformance/harness.ts
  • packages/engine/src/__tests__/nodeUpgradeAuth.test.ts
  • packages/engine/src/bin/serve.ts
  • packages/engine/src/db/migrations/0019_fleet_mailbox.sql
  • packages/engine/src/engine.ts
  • packages/engine/src/engine/action.ts
  • packages/engine/src/engine/node.ts
  • packages/engine/src/entrypoints/node.ts
  • packages/engine/src/lib/fleetNodes.ts
  • packages/engine/src/middleware/fleetNodes.ts
  • packages/engine/src/ports/index.ts
  • packages/engine/src/routes/action.ts
  • packages/engine/src/routes/message.ts
  • packages/engine/src/routes/node.ts
  • packages/engine/src/routes/workspace.ts
  • packages/sdk-typescript/CHANGELOG.md
  • packages/types/CHANGELOG.md

Comment thread openapi.yaml
Comment on lines +182 to +186
const authHeader = c.req.header('Authorization') ?? c.req.header('authorization');
const bearer = authHeader && /^bearer\s+/i.test(authHeader)
? authHeader.replace(/^bearer\s+/i, '').trim()
: undefined;
const token = c.req.query('token') ?? bearer;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the mixed-case header fallback in node token extraction.

Line 182 introduces a mixed-case fallback pattern that the repo guidelines forbid. Normalize through a single canonical header lookup.

Suggested fix
-    const authHeader = c.req.header('Authorization') ?? c.req.header('authorization');
+    const authHeader = c.req.raw.headers.get('authorization') ?? undefined;

As per coding guidelines, **/*.{js,ts,tsx,jsx}: Do not introduce mixed-case field fallbacks.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const authHeader = c.req.header('Authorization') ?? c.req.header('authorization');
const bearer = authHeader && /^bearer\s+/i.test(authHeader)
? authHeader.replace(/^bearer\s+/i, '').trim()
: undefined;
const token = c.req.query('token') ?? bearer;
const authHeader = c.req.header('authorization');
const bearer = authHeader && /^bearer\s+/i.test(authHeader)
? authHeader.replace(/^bearer\s+/i, '').trim()
: undefined;
const token = c.req.query('token') ?? bearer;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine.ts` around lines 182 - 186, The authHeader
variable assignment on line 182 violates repo guidelines by using a mixed-case
fallback pattern with both 'Authorization' and 'authorization' headers. Replace
this mixed-case fallback with a single canonical header lookup that normalizes
the header name to lowercase when retrieving it, so that c.req.header() is
called only once with a consistent case-normalized approach rather than trying
multiple case variations as a fallback.

Source: Coding guidelines

export const requireFleetNodes = createMiddleware<AppEnv>(async (c, next) => {
const workspace = c.get('workspace');
const { kv, config } = c.get('engine');
if (!workspace || !(await isFleetNodesEnabled(kv, workspace.id, config.fleetNodesEnabled ?? false))) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard optional engine config before dereferencing.

At Line 15, config.fleetNodesEnabled is accessed even though config is optional in EngineDeps, which can throw a runtime TypeError and convert this gate into a 500 path.

Proposed fix
 export const requireFleetNodes = createMiddleware<AppEnv>(async (c, next) => {
   const workspace = c.get('workspace');
   const { kv, config } = c.get('engine');
-  if (!workspace || !(await isFleetNodesEnabled(kv, workspace.id, config.fleetNodesEnabled ?? false))) {
+  const defaultEnabled = config?.fleetNodesEnabled ?? false;
+  if (!workspace || !(await isFleetNodesEnabled(kv, workspace.id, defaultEnabled))) {
     return c.json(
       { ok: false, error: { code: 'fleet_nodes_disabled', message: 'Fleet nodes are disabled for this workspace' } },
       404,
     );
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!workspace || !(await isFleetNodesEnabled(kv, workspace.id, config.fleetNodesEnabled ?? false))) {
export const requireFleetNodes = createMiddleware<AppEnv>(async (c, next) => {
const workspace = c.get('workspace');
const { kv, config } = c.get('engine');
const defaultEnabled = config?.fleetNodesEnabled ?? false;
if (!workspace || !(await isFleetNodesEnabled(kv, workspace.id, defaultEnabled))) {
return c.json(
{ ok: false, error: { code: 'fleet_nodes_disabled', message: 'Fleet nodes are disabled for this workspace' } },
404,
);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/middleware/fleetNodes.ts` at line 15, The condition at
line 15 in fleetNodes.ts accesses config.fleetNodesEnabled without guarding
against config being undefined, since config is optional in EngineDeps. This
will throw a runtime TypeError if config is not provided. Fix this by using
optional chaining (config?.fleetNodesEnabled) when accessing the
fleetNodesEnabled property in the isFleetNodesEnabled call, ensuring the
property access is safe when config is undefined or null.

Comment on lines +41 to +44
const updateFleetNodesSchema = z.object({
enabled: z.boolean().optional(),
mode: z.string().optional(),
}).passthrough();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd packages/engine/src/routes && wc -l workspace.ts

Repository: AgentWorkforce/relaycast

Length of output: 85


🏁 Script executed:

cd packages/engine/src/routes && sed -n '30,60p' workspace.ts

Repository: AgentWorkforce/relaycast

Length of output: 990


🏁 Script executed:

cd packages/engine/src/routes && sed -n '410,450p' workspace.ts

Repository: AgentWorkforce/relaycast

Length of output: 1565


Encode the PUT payload contract in Zod instead of post-parse branching.

The schema accepts both enabled and mode as optional fields with .passthrough(), but the handler manually validates that exactly one is provided and has the correct type. This violates the guideline to prefer Zod schemas over ad-hoc manual checks. Define a discriminated union schema ({ mode: 'inherit' } | { enabled: boolean }) with .strict() to enforce the contract at parse time and eliminate the runtime branching logic.

Suggested refactor
-const updateFleetNodesSchema = z.object({
-  enabled: z.boolean().optional(),
-  mode: z.string().optional(),
-}).passthrough();
+const updateFleetNodesSchema = z.union([
+  z.object({ mode: z.literal('inherit') }).strict(),
+  z.object({ enabled: z.boolean() }).strict(),
+]);
...
-    const body = parsed.data;
-
-    let override: boolean | null;
-    if (body?.mode === 'inherit') {
-      override = null;
-    } else if (typeof body?.enabled === 'boolean') {
-      override = body.enabled;
-    } else {
-      return c.json({
-        ok: false,
-        error: { code: 'invalid_request', message: 'Provide { enabled: boolean } or { mode: "inherit" }' },
-      }, 400);
-    }
+    const body = parsed.data;
+    const override: boolean | null = 'mode' in body ? null : body.enabled;

Applies to lines 41–44 and the handler at 423–433.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/workspace.ts` around lines 41 - 44, Refactor the
updateFleetNodesSchema to use a discriminated union with Zod instead of relying
on manual validation in the handler. Replace the current optional fields schema
(with enabled and mode as optional strings/booleans and passthrough) with a
discriminated union pattern that defines two distinct payloads: one with mode
set to 'inherit' and another with enabled as a boolean. Apply .strict() to the
schema to enforce strict validation and eliminate the need for manual runtime
checks. Then update the handler logic to remove the ad-hoc branching validation
that currently checks which field is provided and validates the types, since the
Zod schema will now enforce this contract at parse time.

Source: Coding guidelines

agent-relay-code Bot and others added 3 commits June 15, 2026 16:45
…view)

Address P2 findings from Codex review of the fleet mailbox delivery path:

1. ackDelivery (single per-delivery REST ack) advanced the cumulative cursor to
   the row's own seq, so acking seq 2 while seq 1 is queued moved the cursor past
   seq 1; deliverPendingToNode (seq > delivery_ack_seq) then skipped it forever on
   node replay. Make the cursor advance opt-in (ackRows advanceCursorTo?) — single
   acks no longer advance it; the row's acked status already excludes it from replay.
   The node delivery.ack {up_to_seq} path still advances cumulatively. Regression test.

2. Migration 0019 seeded delivery_ack_seq = MAX(acked seq), skipping an older
   still-queued row below a newer acked one. Seed from the contiguous acked prefix
   (lowest active seq - 1; max seq when nothing is active).

3. Node-replay event classification checked dmType before threadId, so a thread
   reply inside a DM/group DM would replay as dm.received instead of thread.reply
   (the live routes/thread.ts routing). Check threadId first to mirror live.

4. Mailbox depth-cap count included expired-but-unswept rows, so an idle recipient
   kept rejecting new sends as depth_cap after TTL instead of dead-lettering.
   Exclude expired rows from the count (matches the replay query). Regression test.

Also classify the operator-only /v1/workspace/fleet-nodes flag route as non-SDK in
sdk-openapi-sync (pre-existing #194 gap that turbo test caching had masked).
…e/fleet-nodes

The PUT handler rejects a payload lacking both `enabled` (boolean) and
`mode: inherit` with a 400 invalid_request, but the schema marked both optional
and documented only a 200. Add anyOf[required: enabled | required: mode] to
reflect the runtime constraint, and document the 400 (ErrorResponse).
@willwashburn willwashburn merged commit 64d8a93 into main Jun 15, 2026
5 checks passed
@willwashburn willwashburn deleted the feat/fleet-rollout-flag branch June 15, 2026 17:06
willwashburn added a commit that referenced this pull request Jun 15, 2026
Rebased feature/engine-retention onto main, which now has the fleet stack
(#191-#194). Three adaptations:

1. Renumber migration 0016_workspace_retention -> 0020_workspace_retention; #192
   took 0016 (fleet_nodes) and #193 took 0019 (fleet_mailbox).
2. Delivery status model: #193 reworked the enum, so SETTLED_DELIVERY_STATUSES is
   now ['acked','failed','dead_lettered'] (was ['delivered','failed']). 'delivered'
   is now IN-FLIGHT (sent, awaiting cumulative ack), so retention must never prune
   it; 'acked' is terminal success. Updated tests to the new status names.
3. insertDelivery test helper assigns a distinct seq per agent — the mailbox
   migration added UNIQUE(workspace_id, agent_id, seq), so same-agent rows can no
   longer share the default seq 0.

Note: turbo build/tsc is currently red on main itself (engine.ts:212 uses
originInfo.origin_surface, which #188 removed from the telemetry contract) — a
pre-existing #188/#192 collision unrelated to this PR. Engine vitest is green
(132/132).
willwashburn added a commit that referenced this pull request Jun 15, 2026
Rebased feature/engine-retention onto main, which now has the fleet stack
(#191-#194). Three adaptations:

1. Renumber migration 0016_workspace_retention -> 0020_workspace_retention; #192
   took 0016 (fleet_nodes) and #193 took 0019 (fleet_mailbox).
2. Delivery status model: #193 reworked the enum, so SETTLED_DELIVERY_STATUSES is
   now ['acked','failed','dead_lettered'] (was ['delivered','failed']). 'delivered'
   is now IN-FLIGHT (sent, awaiting cumulative ack), so retention must never prune
   it; 'acked' is terminal success. Updated tests to the new status names.
3. insertDelivery test helper assigns a distinct seq per agent — the mailbox
   migration added UNIQUE(workspace_id, agent_id, seq), so same-agent rows can no
   longer share the default seq 0.

Note: turbo build/tsc is currently red on main itself (engine.ts:212 uses
originInfo.origin_surface, which #188 removed from the telemetry contract) — a
pre-existing #188/#192 collision unrelated to this PR. Engine vitest is green
(132/132).
willwashburn added a commit that referenced this pull request Jun 15, 2026
…llow-ups (#189)

* feat(engine): retention pruning with per-workspace TTLs and outbox follow-ups

Add pruneExpired: bounded-batch deletion of expired messages (leaf-first
across thread parents), settled deliveries, message logs, and orphaned
read receipts, with per-workspace TTLs in a new nullable
workspaces.retention column. Message retention is opt-in; settled
deliveries and message logs default to 90 days as operational logs.
Runs on the Node adapter's outbox cleanup cadence and is exported for
queue-backed scheduled handlers.

cleanupOldEvents now settles exhausted pending_events rows as failed so
they become prunable instead of lingering unclaimable, and
sendWebhookEvent skips the outbox insert and queue send entirely (with
a per-request memoized existence probe) for workspaces with no active
event subscription.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore(engine): adapt retention to merged fleet engine (rebase onto main)

Rebased feature/engine-retention onto main, which now has the fleet stack
(#191-#194). Three adaptations:

1. Renumber migration 0016_workspace_retention -> 0020_workspace_retention; #192
   took 0016 (fleet_nodes) and #193 took 0019 (fleet_mailbox).
2. Delivery status model: #193 reworked the enum, so SETTLED_DELIVERY_STATUSES is
   now ['acked','failed','dead_lettered'] (was ['delivered','failed']). 'delivered'
   is now IN-FLIGHT (sent, awaiting cumulative ack), so retention must never prune
   it; 'acked' is terminal success. Updated tests to the new status names.
3. insertDelivery test helper assigns a distinct seq per agent — the mailbox
   migration added UNIQUE(workspace_id, agent_id, seq), so same-agent rows can no
   longer share the default seq 0.

Note: turbo build/tsc is currently red on main itself (engine.ts:212 uses
originInfo.origin_surface, which #188 removed from the telemetry contract) — a
pre-existing #188/#192 collision unrelated to this PR. Engine vitest is green
(132/132).

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant