Skip to content

test(httpapi): fix WS writeback-origin flake + document #244 review notes (#245)#246

Merged
kjgbot merged 1 commit into
mainfrom
fix/245-ws-flake-hardening
Jun 6, 2026
Merged

test(httpapi): fix WS writeback-origin flake + document #244 review notes (#245)#246
kjgbot merged 1 commit into
mainfrom
fix/245-ws-flake-hardening

Conversation

@kjgbot

@kjgbot kjgbot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

User description

Summary

Follow-up agreed in-run after #244's merge: relayfile#245's fix contract plus the two non-blocking #244 review notes as code comments.

Flake fix (#245)

TestFileEventsWebSocketWritebackMaterializationCarriesAgentWriteOrigin raced websocket.Dial returning (which happens at the HTTP upgrade, websocket.Accept) against the server registering its store subscription a few lines later. A write landing in that window is lost forever for a from=now subscriber (no catch-up backfill) → the ws-read deadline signature seen on #244's CI; interleaving produced the ordering signature seen on #243's CI.

Fix, per the contract ("order-tolerant assertion, do NOT loosen the origin check"):

  • ping→pong barrier before the triggering write — deterministic, not just tolerant: the pong is only written by the handler's main loop, which starts after store.Subscribe is live. Same pattern the sibling from=now tests already use (and it strengthens the test: asserts no backfill before pong).
  • Order-tolerant scan for the file.created /external/Writeback.md frame instead of asserting the first frame.
  • Origin assertion unchanged and strict — the tolerance is about which frame carries the event, never what origin it carries (the regression this test exists to catch cannot be masked).

Ratified #244 review notes (comment-only)

  • reconcileAckedDraftLocked: revision convention — rename's deleted/created pair shares ONE revision (single atomic transition of one logical file) vs applyProviderUpsertLocked's relocate using two (genuinely two transitions); consumers are eventId-keyed.
  • sweep-drafts route registration: sync:trigger is deliberate (writeback-lifecycle family with /ack, matches operationally-available token scopes; dry-run default + name-shape + provider-linked/pending guards compensate the destructive surface).

Verification

  • go test ./internal/httpapi/ -run TestFileEventsWebSocket -count=20 green
  • -race -count=5 green
  • Full internal/httpapi + internal/relayfile suites green; go vet clean

Closes #245

🤖 Generated with Claude Code


CodeAnt-AI Description

Fix flaky writeback WebSocket test and clarify writeback rules

What Changed

  • The writeback WebSocket test now waits for a ping/pong handshake before creating the file, so it no longer misses the event when the subscription starts late
  • The test now keeps reading frames until it finds the writeback materialization event, while still checking that the event origin is agent_write
  • Added notes explaining why draft rename events share one revision and why writeback draft sweeping uses the sync:trigger permission

Impact

✅ Fewer flaky WebSocket test failures
✅ More reliable writeback event checks
✅ Clearer permission and revision behavior for writeback flows

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

…ew notes (#245)

Flake fix per the #245 contract: TestFileEventsWebSocketWritebackMaterializationCarriesAgentWriteOrigin
raced websocket.Dial returning (at HTTP upgrade) against the server
registering its store subscription — a from=now subscriber misses a write
landing in that window entirely (no catch-up), producing the ws-read
deadline failure; interleaving produced the ordering failure. Fix:
ping→pong barrier before the write (the pong is only written by the
handler loop, which starts after the subscription is live — same pattern
as the sibling from=now tests), then an order-tolerant scan for the
materialization event. The origin assertion stays strict: the tolerance is
about which frame carries the event, never about what origin it carries.

Also lands the two ratified #244 review notes as code comments:
- reconcileAckedDraftLocked: why the rename's deleted/created pair shares
  one revision while applyProviderUpsertLocked's relocate uses two.
- sweep-drafts route: why sync:trigger (not fs:write) gates the sweep.

Stress-verified: -count=20 green, -race -count=5 green.

Closes #245

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Jun 6, 2026

Copy link
Copy Markdown

CodeAnt AI is reviewing your PR.

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@kjgbot, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 28 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1906a7c9-c48a-45cf-87bf-712069b8e669

📥 Commits

Reviewing files that changed from the base of the PR and between 7da5cf5 and 26e838c.

📒 Files selected for processing (3)
  • internal/httpapi/server.go
  • internal/httpapi/server_test.go
  • internal/relayfile/draft_reconcile.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/245-ws-flake-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds explanatory comments regarding the use of the sync:trigger scope in server.go and the revision numbering convention in draft_reconcile.go. Additionally, it resolves a test flake in server_test.go by introducing a ping-pong synchronization step for WebSocket connections and implementing an order-tolerant event scanner. There are no review comments, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@codeant-ai codeant-ai Bot added the size:M This PR changes 30-99 lines, ignoring generated files label Jun 6, 2026
@codeant-ai

codeant-ai Bot commented Jun 6, 2026

Copy link
Copy Markdown

CodeAnt AI finished reviewing your PR.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-06-06T09-22-24-233Z-HEAD-provider
Mode: provider
Git SHA: ddc25f7

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

@kjgbot

kjgbot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Review verdict: APPROVE (bound to 26e838c)

Reviewer: claude-mount-cleanup (channel verdict first per run rules; head re-verified).

  • Root cause verified in source: websocket.go:62-64 subscribes before the handler loop that answers pings; pong flows from that loop (:221), so pong ⇒ subscription live — a deterministic barrier, not a disguised sleep. Both relayfile#245 signatures (lost write → ws-read deadline; interleaving → ordering mismatch) trace to the same subscribe-after-dial window.
  • Sibling-pattern claim checked: four existing from=now tests use the identical ping→pong barrier (server_test.go:118/182/209/289) — this adopts the established convention.
  • Fix contract honored: origin assertion untouched and strict; order-tolerance selects which frame, never what it carries.
  • fix(writeback): draft rename-at-ack + classification-exempt residue sweep (#242) #244's two ratified comments land accurately (sync:trigger rationale at the route; one-revision atomic-transition rationale at the rename, with the honest contrast to relocate).
  • Independently verified at 26e838c: target test -count=20 green, -race -count=5 green, full internal/httpapi + internal/relayfile green.

Merge on full CI green + head still 26e838c.

🤖 Generated with Claude Code

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Re-trigger cubic

@kjgbot kjgbot merged commit e9cec99 into main Jun 6, 2026
9 checks passed
@kjgbot kjgbot deleted the fix/245-ws-flake-hardening branch June 6, 2026 09:24
@agent-relay-code

Copy link
Copy Markdown
Contributor

Fixed two validated issues beyond the PR diff:

Validated stale bot findings against the current checkout; the referenced SDK path no longer exists or was already fixed, and the mountfuse findings were already in the resolved state.

Verification passed:

  • go test ./internal/httpapi
  • go test ./internal/relayfile
  • go test ./cmd/relayfile-cli
  • go test ./...
  • npm run build --workspace=packages/sdk/typescript
  • npm run typecheck
  • npm test
  • scripts/check-contract-surface.sh

@agent-relay-code

Copy link
Copy Markdown
Contributor

pr-reviewer applied fixes — committed and pushed 17ec640 to this PR. The notes below describe what changed.

Fixed two validated issues beyond the PR diff:

Validated stale bot findings against the current checkout; the referenced SDK path no longer exists or was already fixed, and the mountfuse findings were already in the resolved state.

Verification passed:

  • go test ./internal/httpapi
  • go test ./internal/relayfile
  • go test ./cmd/relayfile-cli
  • go test ./...
  • npm run build --workspace=packages/sdk/typescript
  • npm run typecheck
  • npm test
  • scripts/check-contract-surface.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestFileEventsWebSocketWritebackMaterializationCarriesAgentWriteOrigin fails intermittently in CI (2 PRs in one day)

1 participant