
feat(sdk): resync missed events on WebSocket reconnect#176

Merged
willwashburn merged 4 commits into main from feature/sdk-ws-resync on Jun 10, 2026

@willwashburn commented Jun 9, 2026 (Member)

Summary

The server side already has a complete reconnect-replay protocol: every event pushed to an agent is stamped with a monotonic agent_seq, each agent has a 500-event resync ring, and gaps beyond the ring get a DB-backed replay (engine/src/adapters/node/realtime.ts, engine/src/engine/resyncQuery.ts, mirrored by the cloud AgentDO). No shipping client used it, so any events pushed while an SDK client's WebSocket was disconnected were silently lost. This PR wires the TypeScript SDK into that protocol.
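A rough sketch of the server-side decision summarized above, assuming a plain in-memory array as the per-agent ring: each pushed event gets the next agent_seq, the ring keeps the newest 500, and a resync either replays from the ring or flags that the gap reaches past it (the case that would fall back to the DB-backed replay). ResyncRing and replaySince are hypothetical names; the engine's actual logic lives in the files cited above and is not reproduced here.

```typescript
// Hypothetical sketch of a per-agent resync ring: stamps each pushed event
// with a monotonic agent_seq, keeps the newest RING_SIZE events, and reports
// when a resync request reaches past what the ring still holds.

interface SeqEvent {
  agent_seq: number;
  id: string;
}

const RING_SIZE = 500; // ring size stated in the PR

class ResyncRing {
  private events: SeqEvent[] = [];
  private nextSeq = 1;

  // Stamp an outgoing event with the next agent_seq and retain it.
  push(id: string): SeqEvent {
    const event: SeqEvent = { agent_seq: this.nextSeq++, id };
    this.events.push(event);
    if (this.events.length > RING_SIZE) this.events.shift(); // drop the oldest
    return event;
  }

  get currentSeq(): number {
    return this.nextSeq - 1;
  }

  // Events after lastSeenSeq, plus whether the first missed seq was already
  // evicted (in which case a DB-backed replay would be needed as well).
  replaySince(lastSeenSeq: number): { events: SeqEvent[]; gapDetected: boolean } {
    const oldestSeq = this.events.length > 0 ? this.events[0].agent_seq : this.nextSeq;
    const gapDetected = lastSeenSeq + 1 < oldestSeq;
    const events = this.events.filter((e) => e.agent_seq > lastSeenSeq);
    return { events, gapDetected };
  }
}
```

A production ring would more likely use a fixed-size circular buffer, since Array.prototype.shift is O(n); the array just keeps the sketch short.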

Changes

packages/sdk-typescript

  • WsClient tracks the highest agent_seq seen across incoming events. The seq is read from the raw frame because the zod event schemas strip unknown keys.
  • After each reconnect — once open handlers have re-subscribed — the client sends {type: "resync", last_seen_seq, since}. First connections (no seq yet) send nothing and behave exactly as before.
  • Replayed events go through the normal dispatch path, deduplicated by stable event id (LRU of 2048, sized above the 500-event ring plus DB-replay batches), so handlers never see duplicates. Dedupe is id-based rather than seq-based on purpose: DB-fallback replay events carry no agent_seq, and seq comparison would silently drop live events after a server counter reset.
  • The server's resync_ack surfaces as a new resynced lifecycle event (alongside reconnecting / permanently_disconnected), exposed as on.resynced(({ replayed, gapDetected }) => ...) on both RelayCast and AgentClient.
  • New package README: install, RelayCast vs AgentClient quickstart, reconnect/resync behavior, self-hosting.
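A minimal client-side sketch of the cursor behavior described in the list above, assuming frames arrive as JSON strings. SeqCursor, observeRaw, and buildResyncFrame are hypothetical names, not WsClient's actual methods. agent_seq is read from the raw parsed object before any schema validation, and no frame is produced until at least one seq has been seen, mirroring the rule that first connections send nothing.

```typescript
// Hypothetical sketch of the reconnect cursor; not the actual WsClient code.

interface ResyncFrame {
  type: 'resync';
  last_seen_seq: number;
  since: string; // ISO timestamp of when the last seq was observed
}

class SeqCursor {
  private lastSeenSeq: number | null = null;
  private lastEventAt: string | null = null;

  // Inspect the raw frame before schema parsing (which would strip unknown
  // keys). Frames of any type, including unrecognized ones, are considered.
  observeRaw(raw: string, now: Date = new Date()): void {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return; // not JSON, nothing to track
    }
    if (typeof parsed !== 'object' || parsed === null) return;
    const seq = (parsed as { agent_seq?: unknown }).agent_seq;
    // DB-fallback replays carry no agent_seq and leave the cursor untouched.
    // agent_seq is monotonic per agent, so the latest value is also the highest.
    if (typeof seq === 'number' && Number.isFinite(seq)) {
      this.lastSeenSeq = seq;
      this.lastEventAt = now.toISOString();
    }
  }

  // Call after a reconnect, once open handlers have re-subscribed.
  // Returns null when no seq has been seen yet, so nothing is sent.
  buildResyncFrame(): ResyncFrame | null {
    if (this.lastSeenSeq === null || this.lastEventAt === null) return null;
    return { type: 'resync', last_seen_seq: this.lastSeenSeq, since: this.lastEventAt };
  }
}
```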

packages/types (missing wire frame schemas only)

  • ResyncEventSchema (client → server) added to ClientEventSchema.
  • ResyncAckEventSchema (server → client) added to ServerEventSchema.
  • WsResyncedEventSchema (client-emitted) added to WsClientEventSchema.
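The package defines these frames as zod schemas. As a dependency-free approximation, the shapes can be written as plain TypeScript interfaces with a hand-rolled guard for the server's ack and a mapping to the client-emitted event. Field names follow the wire contract in this PR; the exact field types (numbers for seqs and counts, a boolean for gap_detected), the optional since, and the 'resynced' type tag are assumptions, and none of these names are the @relaycast/types exports.

```typescript
// Illustrative stand-ins for the three new frame shapes (not zod schemas).

// Client -> server.
interface ResyncEvent {
  type: 'resync';
  last_seen_seq: number;
  since?: string; // ISO timestamp (assumed optional)
}

// Server -> client.
interface ResyncAckEvent {
  type: 'resync_ack';
  last_seen_seq: number;
  current_seq: number;
  replayed: number;
  gap_detected: boolean;
}

// Emitted by the client itself; never sent over the wire.
interface WsResyncedEvent {
  type: 'resynced';
  replayed: number;
  gapDetected: boolean;
}

// Runtime check standing in for ResyncAckEventSchema.safeParse.
function isResyncAck(value: unknown): value is ResyncAckEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === 'resync_ack' &&
    typeof v.last_seen_seq === 'number' &&
    typeof v.current_seq === 'number' &&
    typeof v.replayed === 'number' &&
    typeof v.gap_detected === 'boolean'
  );
}

// snake_case wire fields -> camelCase payload handed to on.resynced.
function toResynced(ack: ResyncAckEvent): WsResyncedEvent {
  return { type: 'resynced', replayed: ack.replayed, gapDetected: ack.gap_detected };
}
```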

Wire contract (confirmed against Node adapter and cloud AgentDO)

  • Client → server: {"type": "resync", "last_seen_seq": <number>, "since": <ISO timestamp>}
  • Server → client: ring replay (original payloads, each carrying agent_seq), then DB-fallback replay when the gap exceeds the ring (events tagged "replayed": true, no agent_seq), then {"type": "resync_ack", "last_seen_seq", "current_seq", "replayed", "gap_detected"}
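To make the ordering concrete, here is a hypothetical server-side helper that assembles reply frames in the order listed above: ring events as-is, DB-fallback events tagged replayed: true with no agent_seq, and a closing resync_ack. buildResyncReply and DbRow are illustrative names, and counting replayed as ring plus DB frames is an assumption about how the server fills that field.

```typescript
// Hypothetical sketch of the server's reply ordering for one resync request.

type Frame = Record<string, unknown> & { type: string };

interface DbRow {
  id: string;
  type: string;
}

function buildResyncReply(
  lastSeenSeq: number,
  currentSeq: number,
  ringEvents: Frame[], // original payloads, each carrying agent_seq
  dbRows: DbRow[], // only used when the ring cannot cover the gap
  gapDetected: boolean,
): Frame[] {
  // 1. Ring replay: original payloads, unchanged.
  const frames: Frame[] = [...ringEvents];
  // 2. DB-fallback replay: tagged replayed: true, no agent_seq.
  if (gapDetected) {
    for (const row of dbRows) frames.push({ type: row.type, id: row.id, replayed: true });
  }
  // 3. Ack last; frames.length is read before the ack itself is pushed.
  frames.push({
    type: 'resync_ack',
    last_seen_seq: lastSeenSeq,
    current_seq: currentSeq,
    replayed: frames.length,
    gap_detected: gapDetected,
  });
  return frames;
}
```

Because the ack always arrives after every replayed event, a client can treat resync_ack as the end marker and emit resynced only once all replays have been dispatched.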

Tests

New unit tests in ws.test.ts (mock-WS harness) and agent-ws.test.ts:

  • no resync frame on first connection, and none on reconnect when no events were seen
  • agent_seq tracked (including from unrecognized event types) and resync sent with the correct last_seen_seq + ISO since
  • resync frame ordered after open-handler resubscription
  • replayed events dispatched once (stable-id dedupe), new replayed events delivered
  • resynced emitted with {replayed, gapDetected} from resync_ack; AgentClient.on.resynced end to end
  • seq cursor advances across multiple reconnects

Full-workspace npx turbo build, test, and lint all pass (SDK: 19 files / 364 tests; engine: 55; types: 44).

🤖 Generated with Claude Code

The engine already stamps every delivered event with a monotonic
agent_seq, keeps a 500-event resync ring per agent, and falls back to a
DB-backed replay for larger gaps — but no shipping client used it, so
every disconnect window was silent event loss.

WsClient now tracks the highest agent_seq seen (read from the raw frame,
since schema parsing strips unknown keys) and, after each reconnect once
open handlers have re-subscribed, sends
{type: "resync", last_seen_seq, since}. Replayed events flow through the
normal dispatch path, deduplicated by stable event id, and the server's
resync_ack surfaces as a new "resynced" lifecycle event — exposed as
on.resynced(({replayed, gapDetected}) => ...) on RelayCast and
AgentClient. First connections behave exactly as before (no seq, no
resync frame).
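The id-based dedupe can be sketched with a Map, whose iteration order is insertion order, acting as a bounded LRU. The 2048 capacity comes from the PR; SeenIds and dispatchOnce are hypothetical names. Keying on the stable event id rather than agent_seq is what lets DB-fallback events, which carry no seq, be deduplicated too.

```typescript
// Hypothetical sketch of stable-id replay dedupe with a bounded LRU.

const SEEN_EVENT_IDS_MAX = 2048; // capacity stated in the PR

class SeenIds {
  private readonly ids = new Map<string, true>();
  private readonly max: number;

  constructor(max: number = SEEN_EVENT_IDS_MAX) {
    this.max = max;
  }

  // Returns true the first time an id is seen, false for duplicates.
  markNew(id: string): boolean {
    if (this.ids.has(id)) {
      // Re-insert to mark the id as most recently seen.
      this.ids.delete(id);
      this.ids.set(id, true);
      return false;
    }
    this.ids.set(id, true);
    if (this.ids.size > this.max) {
      // Evict the least recently seen id (first key in insertion order).
      const oldest = this.ids.keys().next().value;
      if (oldest !== undefined) this.ids.delete(oldest);
    }
    return true;
  }
}

// Forward an event to the handler only if its id has not been delivered yet.
function dispatchOnce(seen: SeenIds, event: { id: string }, handler: (e: { id: string }) => void): void {
  if (seen.markNew(event.id)) handler(event);
}
```

The cache only has to outlast the largest replay burst, which is why a capacity above the 500-event ring (plus DB-replay batches) is enough.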

@relaycast/types gains the missing wire frame schemas: resync (client),
resync_ack (server), and the client-only resynced event.

Also adds the package README (install, RelayCast vs AgentClient
quickstart, reconnect/resync behavior, self-hosting).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai Bot commented Jun 9, 2026

Caution: Review failed. The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 311149a and 5b88e4b.

📒 Files selected for processing (12)
  • .agentworkforce/trajectories/completed/2026-06/traj_6jkl93zpg343/summary.md
  • .agentworkforce/trajectories/completed/2026-06/traj_6jkl93zpg343/trajectory.json
  • memory/workspace/.relay/state.json
  • packages/sdk-typescript/CHANGELOG.md
  • packages/sdk-typescript/README.md
  • packages/sdk-typescript/src/__tests__/agent-ws.test.ts
  • packages/sdk-typescript/src/__tests__/ws.test.ts
  • packages/sdk-typescript/src/agent.ts
  • packages/sdk-typescript/src/relay.ts
  • packages/sdk-typescript/src/types.ts
  • packages/sdk-typescript/src/ws.ts
  • packages/types/src/events.ts

📝 Walkthrough

Walkthrough

This PR implements a complete WebSocket reconnect resynchronization protocol for the TypeScript SDK. Clients now track server-issued sequence numbers, send resync requests after reconnects to replay missed events, deduplicate replays by stable event id, and emit a resynced lifecycle event. The feature is tested comprehensively and documented in README and changelog.

Changes

SDK WebSocket Reconnect Resync Protocol

Layer / File(s) Summary
Event schemas and type contracts
packages/types/src/events.ts, packages/sdk-typescript/src/types.ts
New Zod schemas define ResyncEvent (client-to-server), ResyncAckEvent (server-to-client), and WsResyncedEvent (client-emitted). These are wired into discriminated unions for ClientEventSchema, ServerEventSchema, and WsClientEventSchema. TypeScript type aliases expose ResyncAckEvent and WsResyncedEvent in the SDK.
WsClient resync tracking and deduplication
packages/sdk-typescript/src/ws.ts
WsClient now tracks agent_seq from incoming events, maintains a bounded LRU cache (SEEN_EVENT_IDS_MAX=2048) of seen event ids, sends resync frames after reconnect with last_seen_seq and optional since timestamp (no-op on initial connect), intercepts resync_ack to emit resynced events, and drops replayed events already in the dedup cache.
RelayCast and AgentClient on.resynced handlers
packages/sdk-typescript/src/relay.ts, packages/sdk-typescript/src/agent.ts
Both client classes expose an on.resynced(handler) method that wires the underlying WebSocket resynced event to a user callback normalized to { replayed: number, gapDetected: boolean }.
Resync behavior test suite
packages/sdk-typescript/src/__tests__/ws.test.ts, packages/sdk-typescript/src/__tests__/agent-ws.test.ts
Test suite validates no resync on initial connect, seq tracking with last_seen_seq in resync frames, resync timing (after open handlers resubscribe), deduplication by stable event id, resynced event emission with replay statistics, multi-reconnect seq advancement, and integration with AgentClient handlers.
Changelog, README, and trajectory records
packages/sdk-typescript/CHANGELOG.md, packages/sdk-typescript/README.md, .agentworkforce/trajectories/completed/2026-06/traj_6jkl93zpg343/*
Changelog documents new reconnect/resync mechanics and resynced event. README covers SDK installation, quickstart, event handling, lifecycle callbacks, reconnect/resync semantics, and self-hosting. Trajectory records task completion with decisions (seq capture before validation, id-based dedup vs seq comparison, resync after open event for handler ordering) and implementation coverage.

Sequence Diagram

sequenceDiagram
  participant Client as WsClient
  participant Server
  participant App as Application
  
  rect rgba(100, 200, 150, 0.5)
  Note over Client,App: Initial Connection
  Client->>Server: WebSocket connect
  Server-->>Client: socket open
  Client->>App: emit open event
  App->>Client: subscribe handlers
  Note over Client: No resync sent (no agent_seq seen yet)
  end
  
  rect rgba(100, 150, 200, 0.5)
  Note over Client,App: Event Delivery & Disconnect
  Server->>Client: message.created (agent_seq: 42)
  Client->>Client: track agent_seq: 42
  Client->>App: emit message event
  Server--xClient: connection drops
  Note over Client: Socket closed, reconnect scheduled
  end
  
  rect rgba(200, 150, 100, 0.5)
  Note over Client,App: Reconnect Resync Flow
  Client->>Server: WebSocket reconnect
  Server-->>Client: socket open
  Client->>App: emit open event
  App->>Client: re-subscribe handlers
  Client->>Server: send resync {last_seen_seq: 42, since: timestamp}
  Note over Server: Replay missed events after seq 42 (ring, then DB fallback if needed)
  Server->>Client: event_1 (replay, id: msg_123)
  Server->>Client: event_2 (replay, id: msg_456)
  Client->>Client: deduplicate msg_123 (seen before)
  Client->>App: emit only msg_456
  Server->>Client: resync_ack {replayed: 2, gap_detected: false}
  Client->>Client: emit resynced event
  Client->>App: on.resynced({replayed: 2, gapDetected: false})
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A WebSocket springs back to life,

Reconnecting through strife,

Tracking sequences with care,

Replayed events handled fair,

No duplicates to cause strife! 🌟


@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9e2dcee08


// `agent_seq` (stripped by schema parsing, so read it raw here).
if (typeof parsed.agent_seq === 'number' && Number.isFinite(parsed.agent_seq)) {
  this.lastSeenSeq = parsed.agent_seq;
  this.lastEventAt = new Date().toISOString();


P2: Use a server-side cursor for DB replay

When the resync gap exceeds the 500-event ring, the server falls back to replayMissedEvents, which filters persisted messages/reactions with created_at > floor(since). The code above sets since from the SDK host's current receive time rather than from a server event timestamp or a sequence-safe lower bound. As a result, a client clock that runs ahead of the server, or a last event received near the end of a second followed by more than 500 missed events in that same second, can cause the DB fallback to skip missed rows outside the ring while still reporting the resync as complete. Use a server-derived timestamp or cursor, or otherwise bias since safely before the last seen event.
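One way to act on this suggestion, sketched under assumptions: prefer a server-issued timestamp from the last seen event (assumed here to be a created_at ISO string, which may not exist on every event type) over the local receive time, and subtract a safety margin. computeResyncSince and the 5-second margin are hypothetical. Erring early costs little because the client already drops replays it has seen, by stable id.

```typescript
// Hypothetical mitigation: derive `since` from the earlier of the server's
// created_at (if present and parseable) and the local receive time, then
// step back by a margin so clock skew and the server's floor(since) filter
// do not exclude missed rows.

const SINCE_SAFETY_MARGIN_MS = 5_000; // assumed margin, tune as needed

function computeResyncSince(
  lastEvent: { created_at?: unknown },
  receivedAt: Date,
  marginMs: number = SINCE_SAFETY_MARGIN_MS,
): string {
  let baseMs = receivedAt.getTime();
  if (typeof lastEvent.created_at === 'string') {
    const serverMs = Date.parse(lastEvent.created_at);
    // Taking the minimum stays safe whichever clock is ahead.
    if (Number.isFinite(serverMs)) baseMs = Math.min(baseMs, serverMs);
  }
  return new Date(baseMs - marginMs).toISOString();
}
```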


@willwashburn merged commit 2011a9e into main on Jun 10, 2026
4 of 5 checks passed
@willwashburn deleted the feature/sdk-ws-resync branch on June 10, 2026 23:16