feat(engine): accept node roster snapshot in node.heartbeat #197
The node.heartbeat schema was .strict() and accepted only load/active_agents/handlers_live. The relay broker wants to carry the node roster (name/node_id/capabilities/max_agents/version) on the steady-state heartbeat so the engine can keep a node's descriptor fresh between, or in the absence of, a fresh node.register (e.g. after an engine restart where the broker keeps heartbeating an already-registered node).

Extend FleetNodeHeartbeatMessageSchema with optional roster fields and have heartbeatNode() adopt them: refresh name/capabilities/max_agents/version on the node row and register newly-advertised capability actions via ensureCapabilityActions. A minimal heartbeat (no roster) remains valid and preserves the existing roster.

last_heartbeat_at is NOT part of the wire: receipt time is stamped server-side here as the single source of truth for liveness; the broker does not send it. node_id, when present, is validated against the authenticated node token (node_id_mismatch) like node.register.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
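The strict-but-extended acceptance rule described in the commit message can be illustrated with a small, dependency-free sketch. It is not the actual `FleetNodeHeartbeatMessageSchema`: `validateHeartbeat`, the key lists, the assumed roster value types, and the sample values are all hypothetical stand-ins chosen for illustration.

```typescript
// Keys the heartbeat accepted before this change. Their value types are not
// stated in the PR, so this sketch does not check them.
const LEGACY_KEYS: string[] = ["load", "active_agents", "handlers_live"];
// Optional roster keys added by this change.
const ROSTER_KEYS: string[] = ["name", "node_id", "capabilities", "max_agents", "version"];

function isStringArray(value: unknown): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Strict validation: returns a list of problems; an empty list means the
// message is accepted. Any key outside the allow-list is reported.
function validateHeartbeat(message: Record<string, unknown>, acceptRoster: boolean): string[] {
  const allowed = acceptRoster ? LEGACY_KEYS.concat(ROSTER_KEYS) : LEGACY_KEYS;
  const problems: string[] = [];
  for (const key of Object.keys(message)) {
    if (allowed.indexOf(key) === -1) problems.push("unrecognized_key:" + key);
  }
  if (!acceptRoster) return problems;
  // Assumed roster value types (hypothetical, not confirmed by the PR).
  for (const key of ["name", "node_id", "version"]) {
    const value = message[key];
    if (value !== undefined && typeof value !== "string") problems.push("expected_string:" + key);
  }
  const capabilities = message["capabilities"];
  if (capabilities !== undefined && !isStringArray(capabilities)) {
    problems.push("expected_string_array:capabilities");
  }
  const maxAgents = message["max_agents"];
  if (maxAgents !== undefined && typeof maxAgents !== "number") {
    problems.push("expected_number:max_agents");
  }
  return problems;
}

// Placeholder payloads; the values are illustrative only.
const minimalHeartbeat: Record<string, unknown> = {
  load: 0.25,
  active_agents: 1,
  handlers_live: true,
};
const rosterHeartbeat: Record<string, unknown> = {
  load: 0.25,
  active_agents: 1,
  handlers_live: true,
  name: "node-a",
  node_id: "node_123",
  capabilities: ["build"],
  max_agents: 4,
  version: "1.0.0",
};
```

With `acceptRoster` set to `false` the roster payload is rejected key by key, which is the failure mode the change removes; with it set to `true` both payloads pass, while a key such as `last_heartbeat_at` is still refused.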
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2f685fa99a
    if (updated && message.capabilities !== undefined) {
      await ensureCapabilityActions(db, workspaceId, updated.id, message.capabilities);
Reconcile stale actions when heartbeat drops capabilities
When a roster heartbeat sends a capabilities list that no longer includes an action this node previously advertised, this block only creates/reactivates actions for the new list and never clears the old node-handled action rows. Since invokeAction dispatches by the stored handlerNodeId without rechecking the node's current capabilities, /v1/actions/<removed>/invoke can keep sending unsupported work to this node after its heartbeat roster says it no longer supports that capability. Please deactivate or reassign stale node actions when adopting the heartbeat roster.
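One way the requested reconciliation could look is sketched here over an in-memory list. `ActionRow`, `deactivateStaleActions`, and the assumption that an action's name equals the capability name are hypothetical; the real action rows and database calls are not shown in this thread.

```typescript
// Hypothetical shape of a stored node-handled action row.
interface ActionRow {
  name: string;
  handlerNodeId: string;
  active: boolean;
}

// Returns a new list in which actions handled by `nodeId` that are missing
// from the advertised capabilities are marked inactive. Rows owned by other
// nodes, and rows that are still advertised, are returned unchanged.
function deactivateStaleActions(
  rows: ActionRow[],
  nodeId: string,
  capabilities: string[],
): ActionRow[] {
  return rows.map((row) => {
    if (row.handlerNodeId !== nodeId) return row;
    // Assumption: an action is named after the capability that backs it.
    if (capabilities.indexOf(row.name) !== -1) return row;
    return { name: row.name, handlerNodeId: row.handlerNodeId, active: false };
  });
}

// Example: node "n1" used to advertise build and deploy, and now only build.
const storedActions: ActionRow[] = [
  { name: "build", handlerNodeId: "n1", active: true },
  { name: "deploy", handlerNodeId: "n1", active: true },
  { name: "deploy-eu", handlerNodeId: "n2", active: true },
];
const reconciledActions = deactivateStaleActions(storedActions, "n1", ["build"]);
```

Running this step alongside `ensureCapabilityActions` would leave dispatch by `handlerNodeId` with no active row for the dropped capability. Whether stale actions should be deactivated or reassigned to another node is left open, as in the comment above.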
What

Extends the `node.heartbeat` fleet-wire schema so the broker can carry the node roster snapshot (`name`, `node_id`, `capabilities`, `max_agents`, `version`) on the steady-state heartbeat, and has the engine adopt it.

Previously `FleetNodeHeartbeatMessageSchema` was `.strict()` and accepted only `load`/`active_agents`/`handlers_live`. The relay broker (relay#1139, branch `ricky/factory-p11-broker-heartbeat`) wants the heartbeat to carry the roster for liveness so the engine keeps a node's descriptor fresh between, or in the absence of, a fresh `node.register` (e.g. after an engine restart where the broker keeps heartbeating an already-registered node). With the strict schema, every roster-carrying heartbeat was rejected, so node `load`/`active_agents` never updated and the Fleet E2E spawn/load scenarios timed out.

Changes

- `packages/types/src/fleet-wire.ts`: add optional `name`/`node_id`/`capabilities`/`max_agents`/`version` to `FleetNodeHeartbeatMessageSchema` (still `.strict()`, still backward-compatible with a minimal heartbeat).
- `packages/engine/src/engine/node.ts` (`heartbeatNode`): when roster fields are present, refresh `name`/`capabilities`/`maxAgents`/`version` on the node row and register newly-advertised capability actions via `ensureCapabilityActions`. Validate `node_id` against the authenticated token (`node_id_mismatch`) like `node.register`.
- `packages/types/fixtures/fleet-wire/node.heartbeat.json`: fixture now carries the roster (round-tripped by the wire-contract test).
- … `node_id` mismatch rejection, and minimal-heartbeat preservation.

Single source of truth: `last_heartbeat_at`

`last_heartbeat_at` is intentionally not part of the wire. Receipt time is stamped server-side in `heartbeatNode` as the single source of truth for liveness; the broker does not send it.

Verification (local)

- `npm run build --workspace @relaycast/types` / `@relaycast/engine`: clean.
- … `fleetRolloutFlag`).
- `tsc --noEmit`: clean; `lint`: clean (one pre-existing unrelated warning in `delivery.ts`).

Dependency note
relay #1139's Fleet E2E is being repointed at this branch's commit so CI runs the broker's roster-in-heartbeat feature against an engine that accepts it. Do not merge until that cross-repo verification is reviewed.
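For reviewers doing that cross-repo check, the adoption rules described in this PR (optional roster fields, `node_id_mismatch`, server-side receipt time) can be summarised in a minimal sketch. `NodeRow`, `RosterFields`, and `adoptHeartbeat` are hypothetical names operating on a plain object; the real `heartbeatNode` works against the database and is not reproduced here.

```typescript
// Hypothetical in-memory view of a node row.
interface NodeRow {
  id: string;
  name: string;
  capabilities: string[];
  maxAgents: number;
  version: string;
  lastHeartbeatAt: number; // epoch milliseconds, stamped by the server
}

// Optional roster fields a heartbeat may carry (value types are assumed).
interface RosterFields {
  node_id?: string;
  name?: string;
  capabilities?: string[];
  max_agents?: number;
  version?: string;
}

// Applies a heartbeat to a node row and returns the updated copy.
// `receivedAt` is supplied by the caller on the server side; the message
// itself has no timestamp field.
function adoptHeartbeat(
  node: NodeRow,
  authenticatedNodeId: string,
  message: RosterFields,
  receivedAt: number,
): NodeRow {
  // A node_id, when present, must match the authenticated node.
  if (message.node_id !== undefined && message.node_id !== authenticatedNodeId) {
    throw new Error("node_id_mismatch");
  }
  return {
    id: node.id,
    // Each roster field overrides the stored value only when it is sent.
    name: message.name ?? node.name,
    capabilities: message.capabilities ?? node.capabilities,
    maxAgents: message.max_agents ?? node.maxAgents,
    version: message.version ?? node.version,
    lastHeartbeatAt: receivedAt,
  };
}

const existingNode: NodeRow = {
  id: "node_123",
  name: "node-a",
  capabilities: ["build"],
  maxAgents: 2,
  version: "1.0.0",
  lastHeartbeatAt: 1000,
};

// A minimal heartbeat only refreshes liveness; a roster heartbeat also
// refreshes the descriptor.
const afterMinimal = adoptHeartbeat(existingNode, "node_123", {}, 2000);
const afterRoster = adoptHeartbeat(
  existingNode,
  "node_123",
  { node_id: "node_123", capabilities: ["build", "deploy"], max_agents: 4 },
  3000,
);
```

Passing the receipt time in as an argument keeps the sketch deterministic; in a server the caller would typically pass `Date.now()`.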
🤖 Generated with Claude Code