test: terminal output fidelity regression net (no duplication under load) #176

Merged: kjgbot merged 2 commits into main from claude/duplication-fidelity on Jun 9, 2026

@miyaontherelay miyaontherelay commented Jun 8, 2026

Summary

Adds a renderer fidelity regression net for the reported "when a lot of messages come in quickly it starts to repeat" issue.

  • Adds window.__pearMock.getTerminalBufferText(projectId, name) so Playwright can inspect the live xterm buffer text through the mock harness.
  • Adds npm run test:fidelity, backed by a dedicated Playwright config on port 4175.
  • Adds six final-state PTY fidelity variants that assert canonical injected markers match rendered terminal markers exactly: count, no duplicates, strict order, first marker present, final marker present.
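
The exact-match assertion described in the last bullet can be sketched as a pure function over the buffer text. This is a minimal illustration, not the suite's actual `readMarkerStats` / `expectFidelity` code (which is not shown in this PR description); the `MarkerStats` shape and the `computeMarkerStats` name are hypothetical, and the marker regex is the one quoted in the review thread.

```typescript
// Marker shape as quoted in the review thread, e.g. "[chunk-0001]".
const MARKER_PATTERN = /\[[a-z0-9-]+-\d{4}\]/g;

// Hypothetical result shape; the real helpers may report different fields.
interface MarkerStats {
  canonicalCount: number;
  renderedCount: number;
  duplicates: string[];    // markers seen more than once in the rendered text
  missing: string[];       // canonical markers never seen in the rendered text
  exactSequence: boolean;  // rendered markers equal canonical markers, in order
}

function computeMarkerStats(canonical: string[], renderedText: string): MarkerStats {
  const rendered: string[] = renderedText.match(MARKER_PATTERN) ?? [];
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const marker of rendered) {
    if (seen.has(marker)) duplicates.push(marker);
    seen.add(marker);
  }
  const missing = canonical.filter((marker) => !seen.has(marker));
  const exactSequence =
    rendered.length === canonical.length &&
    rendered.every((marker, i) => marker === canonical[i]);
  return {
    canonicalCount: canonical.length,
    renderedCount: rendered.length,
    duplicates,
    missing,
    exactSequence,
  };
}

// Usage: a repeated marker shows up in `duplicates`, a dropped one in `missing`.
const stats = computeMarkerStats(
  ["[chunk-0001]", "[chunk-0002]", "[chunk-0003]"],
  "x [chunk-0001]\ny [chunk-0001]\nz [chunk-0003]\n",
);
console.log(stats.duplicates, stats.missing, stats.exactSequence);
```

Checking the full sequence for equality covers count, order, first marker, and final marker in one comparison; the separate `duplicates` and `missing` lists exist only to make a failure readable.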

Automated Fidelity Suite Results

npm run test:fidelity passed: 6/6.

| Variant | Result | What It Would Surface |
|---|---|---|
| High-rate stream | Pass | Direct renderer PTY duplication/loss/order issues under 1,000 unique chunks. |
| Tab-switch during stream | Pass | Snapshot replay race or stale tab listener replay when returning to the perturbed agent. |
| Split-pane transition during stream | Pass | Split/layout remount disturbance, second-pane attach missed chunks, or cross-pane duplicated tails. Both split-0001 and split-0002 are asserted independently. |
| Window resize during stream | Pass | Resize-triggered xterm/runtime replay or reordered PTY drain. |
| Window focus events during stream | Pass | Focus/visibility paths that could re-trigger DECSET/focus-redraw-style duplicated output. |
| Runtime remount mid-stream | Pass | Stale mount-token/runtime replay after graph/terminal remount. |

The suite uses a finite 30s final-state drain deadline. Failure diagnostics include elapsed time, deadline, canonical count, rendered count, duplicate samples, and first missing marker samples.
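
A bounded final-state wait of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the suite's `waitForFinalMarkerStats`: the `waitForDrain` name, the `DrainSample` shape, and the 100 ms poll interval are all hypothetical; only the 30s deadline and the diagnostic fields (elapsed time, deadline, canonical count, rendered count) come from the description above.

```typescript
// Hypothetical sample shape returned by whatever reads the terminal buffer.
interface DrainSample {
  canonicalCount: number;
  renderedCount: number;
}

interface DrainResult extends DrainSample {
  settled: boolean;   // true once rendered markers have caught up with canonical
  elapsedMs: number;
  deadlineMs: number;
}

// Poll `read` until the rendered count reaches the canonical count or the
// deadline passes. The result always carries the diagnostics needed for a
// useful failure message, whether or not the buffer settled in time.
async function waitForDrain(
  read: () => DrainSample | Promise<DrainSample>,
  deadlineMs: number = 30_000,
  pollMs: number = 100, // assumed interval, not taken from the PR
): Promise<DrainResult> {
  const start = Date.now();
  for (;;) {
    const sample = await read();
    const elapsedMs = Date.now() - start;
    // ">=" rather than "===" so an over-count (duplication) stops the wait
    // immediately and is left for the exact-equality assertion to reject.
    if (sample.renderedCount >= sample.canonicalCount) {
      return { ...sample, settled: true, elapsedMs, deadlineMs };
    }
    if (elapsedMs >= deadlineMs) {
      return { ...sample, settled: false, elapsedMs, deadlineMs };
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

The wait only decides when to stop looking; the fidelity verdict itself should still come from a separate exact comparison of canonical and rendered markers.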

Findings

Renderer-layer final-state duplication did not reproduce in the automated fidelity suite.

During calibration, high-rate synthetic bursts showed xterm drain lag but not duplication, loss, or reordering:

  • Baseline high-rate case: 1,000 canonical markers / 762 rendered markers after a 10s settle, contiguous tail missing, no duplicates.
  • Split-pane calibration run: 700 canonical markers / 604 rendered markers after a 15s bounded retry for split-0001, contiguous tail missing, no duplicates.
  • With the final-state drain deadline, these cases fully drain and pass exact equality.
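
The distinction drawn above between drain lag and a fidelity failure can be stated precisely: when the canonical markers are unique, the rendered markers being a strict prefix of the canonical list is the "contiguous tail missing, no duplicates" case. A minimal sketch, with a hypothetical `classifyRender` helper that is not part of this PR:

```typescript
type Verdict = "exact" | "drain-lag" | "fidelity-failure";

// Assumes canonical markers are unique, as the suite's injected markers are
// described to be. Any duplicate, gap, or reordering breaks the prefix check.
function classifyRender(canonical: string[], rendered: string[]): Verdict {
  const isPrefix =
    rendered.length <= canonical.length &&
    rendered.every((marker, i) => marker === canonical[i]);
  if (!isPrefix) return "fidelity-failure";
  return rendered.length === canonical.length ? "exact" : "drain-lag";
}

// Build markers such as "[chunk-0001]" for illustration.
function makeMarkers(prefix: string, count: number): string[] {
  const out: string[] = [];
  for (let i = 1; i <= count; i++) {
    out.push("[" + prefix + "-" + ("000" + i).slice(-4) + "]");
  }
  return out;
}

// Usage, mirroring the baseline calibration numbers (1,000 canonical, 762 rendered).
const canonicalMarkers = makeMarkers("chunk", 1000);
console.log(classifyRender(canonicalMarkers, canonicalMarkers.slice(0, 762)));
```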

This PR does not claim to fix drain throughput. The drain-lag measurements are documented as follow-up perf characterization, not a fidelity failure.

Exploratory Coverage Matrix

dup-repro independently ran a mock/web exploratory probe on current main with zero duplicate IDs observed:

| Area | Variation | Parameters | Expected | Observed |
|---|---|---|---|---|
| PTY | Visible no-yield burst | 10k unique lines, one injectPtyChunk | no duplicate line IDs in retained xterm scrollback | pass; 3044 retained IDs, dup=false |
| PTY | Tab switch away/back | 1k lines, 20/batch, 1ms, switch agent-0002 then back | final 1k IDs, in order/no dup | pass; 1000 IDs, dup=false |
| PTY | Graph remount | 1k lines while toggling graph on/off | final 1k IDs, no dup | pass |
| PTY | Split remount | 1k lines while toggling split on/off | final 1k IDs, no dup | pass |
| PTY | Hidden background before first mount | 1k lines to agent-0003 before selecting it | final 1k IDs after mount, no dup | pass |
| PTY | Background during graph remount | 1k lines to agent-0004 while graph toggles | final 1k IDs after mount, no dup | pass |
| PTY | Multi-agent tab switching | 3 agents x 1k lines, switch active tab midstream | sampled active agent final 1k IDs, no dup | pass |
| PTY | Focus/blur/click-out | 1k lines plus terminal click, window blur/focus events, click out | final 1k IDs, no dup | pass |
| PTY | Viewport resize | 1k lines during 2 viewport size changes | final 1k IDs, no dup | pass |
| PTY | Late agent spawn | 1k lines while spawning 6 more agents | final 1k IDs, no dup | pass |
| PTY | Wide-ish payloads | 1k lines with suffix text | final 1k IDs, no dup | pass |
| PTY | Rapid tab remount loop | 2k lines, 10 away/back cycles | final 2k IDs, no dup | pass |
| PTY | Graph remount loop | 2k lines, 5 graph on/off cycles | final 2k IDs, no dup | pass |
| PTY | Split remount loop | 2k lines, 5 split on/off cycles | final 2k IDs, no dup | pass |
| PTY | Background mounted while live | 2k lines to hidden agent, select it midstream | final 2k IDs, no dup | pass |
| PTY | Hidden no-yield burst before mount | 10k unique lines to hidden agent-0004, then mount | no duplicate retained IDs | pass; 3020 retained IDs, dup=false |
| PTY | Long wrapped lines | 1k lines with long wrapping suffix | final 1k IDs, no dup | pass |
| PTY | ANSI colored lines | 1k synchronous \x1b[31m...\x1b[0m lines | final 1k IDs, no dup | pass |
| PTY | Resize churn | 2k lines during 8 rapid viewport resizes | final 2k IDs, no dup | pass |
| PTY | 4 split panes simultaneous | 4 agents x 2k lines in split mode | each pane final 2k IDs, no dup | pass for all 4 panes |
| Chat | Single-agent inbound burst | 1k tight relay_inbound events from agent-0001 to #general, unique event_id/seq | source IDs unique; visible chat DOM IDs unique and known | pass; sourceIds=1000, visibleIds=21, duplicateVisibleIds=0, unknownVisibleIds=0 |
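
For the Chat row, the "unique and known" check reduces to two set operations. The sketch below is an assumption about how such a probe could be written, not dup-repro's actual script; the `checkVisibleIds` name is hypothetical, and the returned field names simply mirror the counters reported in the Observed column.

```typescript
interface ChatIdentityReport {
  sourceIds: number;
  sourceIdsUnique: boolean;
  visibleIds: number;
  duplicateVisibleIds: number; // visible IDs that appear more than once in the DOM
  unknownVisibleIds: number;   // visible IDs that were never injected
}

function checkVisibleIds(sourceIds: string[], visibleIds: string[]): ChatIdentityReport {
  const known = new Set(sourceIds);
  const seen = new Set<string>();
  let duplicateVisibleIds = 0;
  let unknownVisibleIds = 0;
  for (const id of visibleIds) {
    if (seen.has(id)) duplicateVisibleIds += 1;
    seen.add(id);
    if (!known.has(id)) unknownVisibleIds += 1;
  }
  return {
    sourceIds: sourceIds.length,
    sourceIdsUnique: known.size === sourceIds.length,
    visibleIds: visibleIds.length,
    duplicateVisibleIds,
    unknownVisibleIds,
  };
}
```

Note that this check does not require every source ID to be visible, which is consistent with the observed result passing with 21 visible IDs out of 1000 source IDs.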

Harness Boundary

Covered by mock/web harness:

  • Renderer PTY final fidelity after drain.
  • Remount/layout/focus/resize interactions.
  • Background-before-mount buffering.
  • Split-pane final fidelity.
  • Visible chat DOM identity under a 1k inbound burst.

Not covered by mock/web harness:

  • Real Electron main-process worker_stream -> broker:pty-chunk delivery.
  • Upstream SDK replay behavior.
  • Real PTY/TUI focus-redraw behavior.
  • Production broker events that lack stable identity.
  • Idle-dispose remount path in the lazy-mount registry. The runtime-remount variant exercises layout-driven unmount/remount via graph/tabs toggle; the 5-minute idle-dispose timer would need a test-only override and is deferred.

dup-repro also code-read src/main/broker.ts and found the real main dispatch path already has generation-gated listeners, reconnect resume from lastEventSeq, and isDuplicatePtyChunk() dedupe by event_id / id / seq with content fallback.
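
For readers unfamiliar with the first two mechanisms named above, here is a generic sketch of generation gating and sequence-based resume. It is not the code in src/main/broker.ts, which this PR does not quote; the `GenerationGate` class, the `resumeAfter` function, and the `SeqEvent` shape are hypothetical illustrations of the general techniques only.

```typescript
type Listener<T> = (event: T) => void;

// Each (re)connect bumps the generation. A listener bound under an older
// generation becomes inert, so a stale subscription cannot replay output.
class GenerationGate {
  private generation = 0;

  bump(): number {
    this.generation += 1;
    return this.generation;
  }

  bind<T>(listener: Listener<T>): Listener<T> {
    const boundGeneration = this.generation;
    return (event: T) => {
      if (boundGeneration === this.generation) {
        listener(event);
      }
    };
  }
}

// Hypothetical event shape carrying a monotonically increasing sequence number.
interface SeqEvent {
  seq: number;
  data: string;
}

// On reconnect, deliver only events newer than the last one already handled.
function resumeAfter(events: SeqEvent[], lastEventSeq: number): SeqEvent[] {
  return events.filter((event) => event.seq > lastEventSeq);
}
```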

Follow-Up

If the user reproduces duplication again in the real app path, add low-noise production counters in isDuplicatePtyChunk():

  • suppressedByIdentity
  • suppressedByContent
  • missingIdentityCount
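
One way the three counters could hang off an identity-then-content dedupe is sketched below. This is not the real `isDuplicatePtyChunk()` from src/main/broker.ts, whose body is not shown here; the `PtyChunk` shape, the `ChunkDeduper` class, and in particular the content fallback rule (compare against the immediately preceding chunk) are assumptions for illustration.

```typescript
// Hypothetical chunk shape; the field names follow the identities listed in
// the description (event_id / id / seq), the rest is assumed.
interface PtyChunk {
  event_id?: string;
  id?: string;
  seq?: number;
  data: string;
}

class ChunkDeduper {
  readonly counters = {
    suppressedByIdentity: 0,
    suppressedByContent: 0,
    missingIdentityCount: 0,
  };
  private seenIdentities = new Set<string>(); // unbounded here; a real one would cap it
  private lastContent: string | null = null;

  isDuplicate(chunk: PtyChunk): boolean {
    const identity = this.identityOf(chunk);
    if (identity !== undefined) {
      if (this.seenIdentities.has(identity)) {
        this.counters.suppressedByIdentity += 1;
        return true;
      }
      this.seenIdentities.add(identity);
      this.lastContent = chunk.data;
      return false;
    }
    // No stable identity: count it, then fall back to comparing content.
    this.counters.missingIdentityCount += 1;
    if (chunk.data === this.lastContent) {
      this.counters.suppressedByContent += 1;
      return true;
    }
    this.lastContent = chunk.data;
    return false;
  }

  // Namespace each identity kind so an event_id cannot collide with an id or seq.
  private identityOf(chunk: PtyChunk): string | undefined {
    if (chunk.event_id !== undefined) return "event:" + chunk.event_id;
    if (chunk.id !== undefined) return "id:" + chunk.id;
    if (chunk.seq !== undefined) return "seq:" + chunk.seq;
    return undefined;
  }
}
```

A high `missingIdentityCount` alongside a nonzero `suppressedByContent` would point at the identity-less production events listed under "Not covered", since content comparison can also suppress legitimately repeated output.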

Separately, the xterm drain-lag measurements above are a candidate perf follow-up. A runtime join-write batching experiment may be worth evaluating later, but it is intentionally not part of this final-state fidelity PR.
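
As a rough picture of what "join-write batching" could mean, the sketch below queues incoming chunks and hands them to the terminal as one joined string per flush. It is only an assumption about the shape of such an experiment; the `WriteBatcher` class is hypothetical and nothing like it is part of this PR.

```typescript
class WriteBatcher {
  private pending: string[] = [];
  private readonly write: (data: string) => void;

  constructor(write: (data: string) => void) {
    this.write = write;
  }

  // Queue a chunk without touching the terminal.
  push(chunk: string): void {
    this.pending.push(chunk);
  }

  // Join everything queued so far into one write call, preserving arrival
  // order. Returns how many chunks were coalesced.
  flush(): number {
    if (this.pending.length === 0) return 0;
    const count = this.pending.length;
    const joined = this.pending.join("");
    this.pending = [];
    this.write(joined);
    return count;
  }
}

// Usage: three chunks become a single write.
const writes: string[] = [];
const batcher = new WriteBatcher((data) => { writes.push(data); });
batcher.push("[chunk-0001]\r\n");
batcher.push("[chunk-0002]\r\n");
batcher.push("[chunk-0003]\r\n");
batcher.flush();
console.log(writes.length);
```

In a real runtime the `write` callback would presumably be the xterm terminal's `write` method and `flush` would be driven by a timer or animation frame; both are assumptions. Because joining preserves order and content, the fidelity suite in this PR would be the natural guard for such a change.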

Validation

  • npx tsc --noEmit -p tsconfig.web.json
  • npx vitest run (17 files, 236 tests)
  • npm run build
  • npm run test:fidelity (6 passed)
  • npm run test:stress (2 passed; pty-heavy stress min FPS ~46-47, no console errors)

coderabbitai Bot commented Jun 8, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c57f81d5-73b1-4475-bc87-7044db0762e0

📥 Commits

Reviewing files that changed from the base of the PR and between d511a22 and bac3eaa.

📒 Files selected for processing (5)
  • package.json
  • playwright.fidelity.config.ts
  • src/renderer/src/lib/ipc-mock.ts
  • src/renderer/src/lib/terminal-runtime-registry.ts
  • tests/playwright/fidelity-no-duplication.spec.ts

Walkthrough

This PR introduces a Playwright-based fidelity test suite that validates terminal buffer rendering correctness under high-rate PTY streaming. It adds test infrastructure (test:fidelity script, playwright.fidelity.config.ts), mock harness APIs to read live buffer state, and a comprehensive test suite with stress scenarios involving UI/runtime disruptions.

Changes

Terminal fidelity testing infrastructure

| Layer / File(s) | Summary |
|---|---|
| Test script and Playwright configuration: package.json, playwright.fidelity.config.ts | Registers test:fidelity npm script chaining build:web and Playwright execution; configures Playwright with timeouts, Vite preview server, Chrome desktop, trace-on-failure, and list reporter. |
| Mock harness terminal buffer API: src/renderer/src/lib/terminal-runtime-registry.ts, src/renderer/src/lib/ipc-mock.ts | Exports getTerminalRuntime(key) to resolve active TerminalRuntime instances by agent key; adds getTerminalBufferText(projectId, name) method to PearMockHarness that converts live terminal buffer lines to newline-delimited text. |
| Fidelity test suite and helpers: tests/playwright/fidelity-no-duplication.spec.ts | Implements test suite with helpers (bootWithAgents, startStream, readMarkerStats, waitForFinalMarkerStats, expectFidelity) that inject deterministic marker chunks, poll buffer state until expected counts match, and assert no missing or duplicate markers under tab switches, split-pane transitions, viewport resizes, focus/visibility events, and runtime remounts. |
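
The buffer-to-text conversion summarized for getTerminalBufferText can be sketched against the shape of xterm.js's buffer API (`length`, `getLine`, `isWrapped`, `translateToString`). The structural interfaces below mirror only those members so the example runs without xterm; the `bufferToText` name and the decision to rejoin soft-wrapped rows are assumptions, since the actual implementation is not shown in this thread.

```typescript
// Minimal structural stand-ins for the xterm.js buffer types.
interface BufferLineLike {
  isWrapped: boolean; // true when this row continues the previous row
  translateToString(trimRight?: boolean): string;
}

interface BufferLike {
  length: number;
  getLine(y: number): BufferLineLike | undefined;
}

// Convert buffer rows to newline-delimited text. Soft-wrapped rows are glued
// back onto their logical line so a marker split by wrapping stays matchable.
function bufferToText(buffer: BufferLike): string {
  const lines: string[] = [];
  for (let y = 0; y < buffer.length; y++) {
    const line = buffer.getLine(y);
    if (!line) continue;
    const next = buffer.getLine(y + 1);
    const continuesOnNextRow = next !== undefined && next.isWrapped;
    // Only trim trailing blanks on the last row of a logical line; trimming a
    // row that continues would drop spaces from the middle of the line.
    const text = line.translateToString(!continuesOnNextRow);
    if (line.isWrapped && lines.length > 0) {
      lines[lines.length - 1] += text;
    } else {
      lines.push(text);
    }
  }
  return lines.join("\n");
}
```

With a real terminal, the argument would presumably be `terminal.buffer.active`, which covers scrollback as well as the viewport.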

Sequence Diagram

sequenceDiagram
  participant TestCase
  participant bootWithAgents
  participant startStream
  participant PearMockHarness
  participant expectFidelity
  TestCase->>bootWithAgents: configure terminal layout, spawn agents
  bootWithAgents-->>TestCase: agents running
  TestCase->>startStream: inject marker PTY chunks for streams
  startStream->>PearMockHarness: write chunks at specified indices
  startStream-->>TestCase: stream injection complete
  TestCase->>expectFidelity: verify marker counts and ordering
  expectFidelity->>PearMockHarness: readMarkerStats (buffer + canonical)
  PearMockHarness-->>expectFidelity: missing, duplicates, counts
  expectFidelity-->>TestCase: assert no missing/duplicates, order valid

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Markers march through PTY streams,
Racing fast in fidelity dreams,
No duplicates hiding, no losses in sight,
The buffer renders crisp and right!
Through tabs and splits and resize spree,
The terminal stands true, wild and free! 🎪

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a regression test suite (fidelity net) for terminal output duplication under high message load. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing the fidelity test suite, mock harness additions, test results, and findings. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/playwright/fidelity-no-duplication.spec.ts (1)

105-105: 💤 Low value

Consider case-insensitive marker pattern for robustness.

The regex pattern /\[[a-z0-9-]+-\d{4}\]/g only matches lowercase alphanumeric characters in marker prefixes. All current test markers use lowercase prefixes (chunk, tab-a, split-a, etc.), so this works today.

However, if future tests use uppercase characters in marker prefixes, they won't be detected. Consider using /\[[a-zA-Z0-9-]+-\d{4}\]/gi for forward compatibility.
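
A quick illustration of the difference, using only standard `RegExp` behavior. The sample string and the `countMarkers` helper are made up for the example; note that once the `i` flag is present, spelling out `A-Z` in the character class is redundant.

```typescript
const lowercaseOnly = /\[[a-z0-9-]+-\d{4}\]/g;
const caseInsensitive = /\[[a-z0-9-]+-\d{4}\]/gi;

// With the g flag, String.prototype.match returns every match (or null).
function countMarkers(text: string, pattern: RegExp): number {
  return (text.match(pattern) ?? []).length;
}

const sample = "[chunk-0001] [TAB-A-0002] [Split-a-0003]";
console.log(countMarkers(sample, lowercaseOnly));   // 1
console.log(countMarkers(sample, caseInsensitive)); // 3
```

The `g` flag matters as much as the `i` flag here: without it, `match` returns only the first occurrence, so a count-based assertion would silently see a single marker.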

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/playwright/fidelity-no-duplication.spec.ts` at line 105, the regex
stored in markerPattern only matches lowercase prefixes; update the
markerPattern constant so that markers like "Chunk" or "TAB-A" are matched
regardless of casing. Keep the existing g flag and add the i flag, e.g.
/\[[a-z0-9-]+-\d{4}\]/gi; with the i flag, adding A-Z to the character class
is redundant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 55ba7202-2f70-4b70-bb6c-2b007f2620d0

📥 Commits

Reviewing files that changed from the base of the PR and between 122d49d and d511a22.

📒 Files selected for processing (5)
  • package.json
  • playwright.fidelity.config.ts
  • src/renderer/src/lib/ipc-mock.ts
  • src/renderer/src/lib/terminal-runtime-registry.ts
  • tests/playwright/fidelity-no-duplication.spec.ts

@kjgbot kjgbot force-pushed the claude/duplication-fidelity branch from d511a22 to bac3eaa Compare June 9, 2026 08:36

@kjgbot kjgbot merged commit 901d508 into main Jun 9, 2026
3 checks passed
@kjgbot kjgbot deleted the claude/duplication-fidelity branch June 9, 2026 08:41