Skip to content

Validate broker events at ingress with zod schemas#214

Merged
willwashburn merged 3 commits into
mainfrom
audit/zod-event-ingress
Jun 10, 2026
Merged

Validate broker events at ingress with zod schemas#214
willwashburn merged 3 commits into
mainfrom
audit/zod-event-ingress

Conversation

@willwashburn

Copy link
Copy Markdown
Member

M2.1 — zod validation of broker events at ingress

Validates broker events once, at the boundary where they enter BrokerManager (HarnessDriverClient.onEvent), using a zod discriminated union that mirrors the SDK's BrokerEvent type.

Schema design (src/shared/schemas/broker-events.ts)

  • Discriminator is kind. The audit brief said "discriminated union on name", but the SDK union — and every consumer in broker.ts / agent-store.ts — switches on kind (name is a payload field on most variants, absent on others like relay_inbound/relaycast_*). Discriminating on kind is what the code actually reads.
  • Membership = the full known SDK union. Covers every kind the app forwards (PTY/worker_stream chunks, agent lifecycle/spawn/exit, relay_inbound, delivery confirmations/retries/drops, idle/blocked status, channel sub/unsub, restarts, etc.). publishBrokerEvent forwards all events to the renderer's broker:event stream (not just the subset handleBrokerEvent acts on), so the "known" set must be the whole union — otherwise valid SDK events would trip the unknown-kind telemetry.
  • .passthrough() on every payload. The @agent-relay/sdk adds fields between minors, and the duplicate-suppression logic reads fields dynamically (seq, event_id, id, chunk). Stripping would break dedupe and drop forward-compatible data.
  • Enum-ish fields stay loose z.string() (runtime/provider/delivery mode) so a new enum value shipped in an SDK minor can't drop an otherwise-valid event.

Drop-vs-forward policy (classifyBrokerEvent)

  • Known kind, payload OK → valid: forwarded as the parsed (passthrough) payload.
  • Known kind, payload bad (or no usable kind) → malformed: dropped, with a per-kind throttled warning (60s). The event stream keeps running.
  • Unknown kind → forwarded unchanged, with a once-per-kind warning. We never drop unknown names (forward-compat: a future SDK minor may add kinds this build doesn't model).

Throttling/once-per-kind follows the AGENTS.md low-noise telemetry doctrine.

Casts removed

  • broker.ts: the event as unknown as BrokerEventRecordPayload cast at the publish boundary is gone — validateIngressBrokerEvent returns the typed parsed payload, which is assignable to BrokerEventRecordPayload (passthrough output is record-compatible). The general-utility dynamic accessors (brokerEventString et al., the PR Fix all node-side type errors and make typecheck:node a blocking gate #204 pattern) are left untouched.
  • agent-store.ts (renderer): event.name! / event.parent! non-null assertions replaced with const-narrowed locals destructured at the top of handleBrokerEvent (a && name guard then narrows them to string through the set(...) closures). Renderer change kept minimal — main is the validation point.

Performance

Valid worker_stream chunks keep their existing cheap inline-typed fast path (typeof name/chunk === 'string'); the zod parse runs only on the general path, so typing latency never pays a discriminated-union parse per keystroke. A malformed worker_stream falls through the inline guard to the zod boundary and is dropped there.

Tests (broker.test.ts)

  • Malformed known event (relay_inbound missing body) is dropped + logged, and a subsequent valid event still flows (stream not torn down).
  • Malformed warnings are throttled to once per kind.
  • Unknown kind is forwarded (both copies reach broker:event) and logged exactly once.
  • Valid events of each major shape (agent_spawned, relay_inbound, agent_idle, delivery_queued, worker_streambroker:pty-chunk) parse and flow.
  • classifyBrokerEvent unit tests: passthrough field preservation, wrong-type rejection, unknown-vs-malformed, no-kind handling, known-kind membership.

Verification (all blocking gates green)

  • npm run typecheck
  • npm run lint ✅ (0 errors; pre-existing baseline warnings only)
  • npm run test:all ✅ (115 node tests, 298 Vitest tests)
  • npm run build

🤖 Generated with Claude Code

Add a zod discriminated union (keyed on `kind`, matching the SDK's
`BrokerEvent` type) for every broker event the app forwards, and validate
events ONCE where they enter `BrokerManager` via `HarnessDriverClient.onEvent`.

- src/shared/schemas/broker-events.ts: discriminated union of all known event
  kinds. Every payload uses `.passthrough()` so SDK-minor additions and the
  dedupe logic's dynamic field reads (`seq`, `event_id`, `id`, `chunk`) survive.
  Enum-ish fields (runtime/provider/mode) stay loose `z.string()` so a new enum
  value can't drop an otherwise-valid event. `classifyBrokerEvent` returns
  valid / unknown / malformed.
- broker.ts ingress: known kind that fails the schema is dropped with a
  per-kind throttled warning; unknown kind is forwarded unchanged with a
  once-per-kind warning (forward-compat). Valid worker_stream chunks keep their
  cheap inline-typed fast path, so we never run a discriminated-union parse per
  keystroke. Removes the `as unknown as BrokerEventRecordPayload` cast at the
  publish boundary in favor of the typed parsed payload.
- agent-store.ts: replace `event.name!` / `event.parent!` non-null assertions
  with const-narrowed locals (main is the validation point; renderer change is
  minimal).
- broker.test.ts: malformed known event dropped + logged without killing the
  stream; warning throttled per kind; unknown kind forwarded + logged once;
  valid events of each major shape flow; classifyBrokerEvent unit tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@willwashburn, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 4 minutes and 22 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1d92b376-f79c-481a-be49-b374cdd4bdaf

📥 Commits

Reviewing files that changed from the base of the PR and between a98440b and bb6b8ca.

📒 Files selected for processing (4)
  • src/main/broker.test.ts
  • src/main/broker.ts
  • src/renderer/src/stores/agent-store.ts
  • src/shared/schemas/broker-events.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch audit/zod-event-ingress

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Zod-based schema validation for broker events at the main-process ingress, dropping malformed known events while preserving forward-compatibility for unknown event kinds. It also refactors the renderer's agent store to clean up event destructuring and avoid non-null assertions. The review feedback suggests improving the Zod error reporting in describeIssue to aggregate and display all validation errors instead of just the first one.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/shared/schemas/broker-events.ts Outdated
Comment on lines +406 to +409
const issue = error.issues[0]
if (!issue) return 'invalid event shape'
const path = issue.path.join('.')
return path ? `${path}: ${issue.message}` : issue.message

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The describeIssue function currently only reports the first validation error from Zod. This can make debugging more difficult if an event payload has multiple issues.

To provide more comprehensive error details, consider modifying this function to report all validation issues. This will be more informative when a malformed event is dropped.

Suggested change
const issue = error.issues[0]
if (!issue) return 'invalid event shape'
const path = issue.path.join('.')
return path ? `${path}: ${issue.message}` : issue.message
if (error.issues.length === 0) return 'invalid event shape'
return error.issues
.map((issue) => (issue.path.length ? `${issue.path.join('.')}: ${issue.message}` : issue.message))
.join('; ')

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in f3dbdf3 — all issues joined with '; ', path-prefixed. broker.test.ts 74/74 green.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45f245b7c0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/shared/schemas/broker-events.ts Outdated
.object({
kind: z.literal('delivery_injected'),
name: z.string(),
delivery_id: z.string(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept delivery events without delivery_id

When the broker emits delivery confirmations with only event_id and name (the shape already handled by sendMessageAndWaitForInjected/sendMessageAndWaitForDelivery and used by the MockClient's delivery_injected events), this required delivery_id makes classifyBrokerEvent mark the event malformed, so attachClient drops it before broker:event reaches the renderer. In that scenario the UI never observes delivery_injected/delivery_ack and pending-delivery/activity state is not cleared; make delivery_id optional for these delivery status events or otherwise align the schema with the event shape the rest of the broker code accepts.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid finding — fixed in bb6b8ca. delivery_id is now optional across all delivery variants (injected/verified/failed/ack/active/confirmed); the SDK protocol declares it required, but the app's own delivery logic keys on event_id + name only, so the schema being stricter than every consumer added drop risk with zero benefit (consistent with this PR's loose-enum principle). Regression test added covering each delivery kind without delivery_id — 75/75 green.

willwashburn and others added 2 commits June 10, 2026 10:21
Gemini review: describeIssue only surfaced the first issue, hiding
multi-field payload problems from the malformed-event telemetry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The app's delivery logic (isDeliveryEventForMessage) keys on event_id +
name only; requiring delivery_id was stricter than any consumer and
would silently drop confirmations from brokers that omit it, leaving
pending-delivery UI state stuck. Regression test covers all delivery
kinds without delivery_id.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@willwashburn willwashburn merged commit f4930cb into main Jun 10, 2026
5 checks passed
@willwashburn willwashburn deleted the audit/zod-event-ingress branch June 10, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant