test: terminal output fidelity regression net (no duplication under load) by miyaontherelay · Pull Request #176 · AgentWorkforce/pear

miyaontherelay · 2026-06-08T20:34:04Z

Summary

Adds a renderer fidelity regression net for the reported "when a lot of messages come in quickly it starts to repeat" issue.

- Adds `window.__pearMock.getTerminalBufferText(projectId, name)` so Playwright can inspect the live xterm buffer text through the mock harness.
- Adds `npm run test:fidelity`, backed by a dedicated Playwright config on port 4175.
- Adds six final-state PTY fidelity variants that assert canonical injected markers match rendered terminal markers exactly: count, no duplicates, strict order, first marker present, final marker present.
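The five assertions listed above (count, no duplicates, strict order, first marker present, final marker present) can be sketched as one pure comparison. This is a minimal illustration, not the suite's `expectFidelity` helper: `FidelityReport` and `checkFidelity` are hypothetical names, and markers are assumed to be plain strings already extracted from the buffer text.

```typescript
// Hypothetical report shape; field names are illustrative only.
interface FidelityReport {
  canonicalCount: number;
  renderedCount: number;
  duplicates: string[];
  missing: string[];
  inOrder: boolean;
  firstPresent: boolean;
  lastPresent: boolean;
  ok: boolean;
}

function checkFidelity(canonical: string[], rendered: string[]): FidelityReport {
  // Index of each canonical marker, used for the strict-order check.
  const position = new Map<string, number>();
  canonical.forEach((marker, index) => position.set(marker, index));

  const seen = new Set<string>();
  const duplicates: string[] = [];
  let inOrder = true;
  let lastIndex = -1;
  for (const marker of rendered) {
    if (seen.has(marker)) duplicates.push(marker);
    seen.add(marker);
    const index = position.get(marker);
    // Unknown, repeated, or out-of-sequence markers all break strict order.
    if (index === undefined || index <= lastIndex) inOrder = false;
    else lastIndex = index;
  }

  const missing = canonical.filter((marker) => !seen.has(marker));
  const first = canonical[0];
  const last = canonical[canonical.length - 1];
  const firstPresent = first !== undefined && seen.has(first);
  const lastPresent = last !== undefined && seen.has(last);
  const ok =
    rendered.length === canonical.length &&
    duplicates.length === 0 &&
    missing.length === 0 &&
    inOrder &&
    firstPresent &&
    lastPresent;

  return {
    canonicalCount: canonical.length,
    renderedCount: rendered.length,
    duplicates,
    missing,
    inOrder,
    firstPresent,
    lastPresent,
    ok,
  };
}
```

Keeping the individual fields alongside `ok` lets a failing run say which property broke rather than only that equality failed.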

Automated Fidelity Suite Results

npm run test:fidelity passed: 6/6.

| Variant | Result | What It Would Surface |
| --- | --- | --- |
| High-rate stream | Pass | Direct renderer PTY duplication/loss/order issues under 1,000 unique chunks. |
| Tab-switch during stream | Pass | Snapshot replay race or stale tab listener replay when returning to the perturbed agent. |
| Split-pane transition during stream | Pass | Split/layout remount disturbance, second-pane attach missed chunks, or cross-pane duplicated tails. Both `split-0001` and `split-0002` are asserted independently. |
| Window resize during stream | Pass | Resize-triggered xterm/runtime replay or reordered PTY drain. |
| Window focus events during stream | Pass | Focus/visibility paths that could re-trigger DECSET/focus-redraw-style duplicated output. |
| Runtime remount mid-stream | Pass | Stale mount-token/runtime replay after graph/terminal remount. |
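Each variant above streams uniquely numbered markers so that any duplicate, gap, or reorder is attributable to one specific chunk. A minimal sketch of how such a canonical stream might be generated and batched follows. The `[prefix-0001]` shape is inferred from the marker regex quoted in the review on this PR; `makeMarkers` and `toBatches` are hypothetical helpers, not functions from the spec file.

```typescript
// Builds markers such as "[chunk-0001]". The four-digit suffix is an
// assumption taken from the \d{4} pattern discussed in the review.
function makeMarkers(prefix: string, count: number): string[] {
  if (count < 0 || count > 9999) {
    throw new RangeError("count must fit in four digits");
  }
  const markers: string[] = [];
  for (let i = 1; i <= count; i++) {
    markers.push("[" + prefix + "-" + ("0000" + i).slice(-4) + "]");
  }
  return markers;
}

// Groups lines into PTY-style chunks of `size` lines, each line CRLF-terminated.
function toBatches(lines: string[], size: number): string[] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: string[] = [];
  for (let start = 0; start < lines.length; start += size) {
    batches.push(lines.slice(start, start + size).join("\r\n") + "\r\n");
  }
  return batches;
}
```

For example, `toBatches(makeMarkers("tab-a", 1000), 20)` would yield 50 chunks, matching the "20/batch" shape used in the exploratory runs.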

The suite uses a finite 30s final-state drain deadline. Failure diagnostics include elapsed time, deadline, canonical count, rendered count, duplicate samples, and first missing marker samples.
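A bounded final-state wait of this kind can be sketched as a polling loop that either returns once the buffer has caught up or throws with the diagnostics listed above. This is an illustration under assumptions: `MarkerStats`, `waitForFinalState`, the 100 ms poll interval, and the fail-fast behaviour on duplicates are hypothetical and are not taken from `waitForFinalMarkerStats`.

```typescript
// Hypothetical stats shape, mirroring the diagnostics named in the text.
interface MarkerStats {
  canonicalCount: number;
  renderedCount: number;
  duplicates: string[];
  missing: string[];
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function waitForFinalState(
  read: () => MarkerStats | Promise<MarkerStats>,
  deadlineMs: number = 30000,
  intervalMs: number = 100,
): Promise<MarkerStats> {
  const start = Date.now();
  for (;;) {
    const stats = await read();
    // Duplicates never drain away, so stop polling as soon as one appears.
    if (stats.duplicates.length > 0 || stats.renderedCount >= stats.canonicalCount) {
      return stats;
    }
    const elapsed = Date.now() - start;
    if (elapsed >= deadlineMs) {
      throw new Error(
        "drain deadline exceeded: elapsed=" + elapsed + "ms deadline=" + deadlineMs + "ms" +
          " canonical=" + stats.canonicalCount +
          " rendered=" + stats.renderedCount +
          " duplicates=" + JSON.stringify(stats.duplicates.slice(0, 5)) +
          " firstMissing=" + JSON.stringify(stats.missing.slice(0, 5)),
      );
    }
    await sleep(intervalMs);
  }
}
```

The caller still runs the exact-equality assertions on the returned stats; the wait only decides when the buffer is worth judging.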

Findings

Renderer-layer final-state duplication did not reproduce in the automated fidelity suite.

During calibration, high-rate synthetic bursts showed xterm drain lag but not duplication, loss, or reordering:

- Baseline high-rate case: 1,000 canonical markers / 762 rendered markers after a 10s settle, contiguous tail missing, no duplicates.
- Split-pane calibration run: 700 canonical markers / 604 rendered markers after a 15s bounded retry for split-0001, contiguous tail missing, no duplicates.
- With the final-state drain deadline, these cases fully drain and pass exact equality.
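The distinction drawn here, drain lag versus a fidelity failure, comes down to whether the rendered markers form an exact prefix of the canonical sequence (only a contiguous tail is missing). A minimal sketch of that classification, with hypothetical names and assuming canonical markers are unique:

```typescript
type DrainVerdict = "complete" | "draining" | "fidelity-failure";

// A snapshot is "draining" when rendered is a strict prefix of canonical:
// nothing duplicated, nothing reordered, only a contiguous tail not yet shown.
function classifySnapshot(canonical: string[], rendered: string[]): DrainVerdict {
  if (rendered.length > canonical.length) return "fidelity-failure";
  for (let i = 0; i < rendered.length; i++) {
    if (rendered[i] !== canonical[i]) return "fidelity-failure";
  }
  return rendered.length === canonical.length ? "complete" : "draining";
}
```

Under this sketch, the 1,000 / 762 baseline snapshot would classify as `"draining"`, while a hole in the middle or a repeated marker would classify as `"fidelity-failure"` regardless of how long the run waits.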

This PR does not claim to fix drain throughput. The drain-lag measurements are documented as follow-up perf characterization, not a fidelity failure.

Exploratory Coverage Matrix

dup-repro independently ran a mock/web exploratory probe on current main with zero duplicate IDs observed:

| Area | Variation | Parameters | Expected | Observed |
| --- | --- | --- | --- | --- |
| PTY | Visible no-yield burst | 10k unique lines, one `injectPtyChunk` | no duplicate line IDs in retained xterm scrollback | pass; 3044 retained IDs, dup=false |
| PTY | Tab switch away/back | 1k lines, 20/batch, 1ms, switch agent-0002 then back | final 1k IDs, in order/no dup | pass; 1000 IDs, dup=false |
| PTY | Graph remount | 1k lines while toggling graph on/off | final 1k IDs, no dup | pass |
| PTY | Split remount | 1k lines while toggling split on/off | final 1k IDs, no dup | pass |
| PTY | Hidden background before first mount | 1k lines to agent-0003 before selecting it | final 1k IDs after mount, no dup | pass |
| PTY | Background during graph remount | 1k lines to agent-0004 while graph toggles | final 1k IDs after mount, no dup | pass |
| PTY | Multi-agent tab switching | 3 agents x 1k lines, switch active tab midstream | sampled active agent final 1k IDs, no dup | pass |
| PTY | Focus/blur/click-out | 1k lines plus terminal click, window blur/focus events, click out | final 1k IDs, no dup | pass |
| PTY | Viewport resize | 1k lines during 2 viewport size changes | final 1k IDs, no dup | pass |
| PTY | Late agent spawn | 1k lines while spawning 6 more agents | final 1k IDs, no dup | pass |
| PTY | Wide-ish payloads | 1k lines with suffix text | final 1k IDs, no dup | pass |
| PTY | Rapid tab remount loop | 2k lines, 10 away/back cycles | final 2k IDs, no dup | pass |
| PTY | Graph remount loop | 2k lines, 5 graph on/off cycles | final 2k IDs, no dup | pass |
| PTY | Split remount loop | 2k lines, 5 split on/off cycles | final 2k IDs, no dup | pass |
| PTY | Background mounted while live | 2k lines to hidden agent, select it midstream | final 2k IDs, no dup | pass |
| PTY | Hidden no-yield burst before mount | 10k unique lines to hidden agent-0004, then mount | no duplicate retained IDs | pass; 3020 retained IDs, dup=false |
| PTY | Long wrapped lines | 1k lines with long wrapping suffix | final 1k IDs, no dup | pass |
| PTY | ANSI colored lines | 1k synchronous `\x1b[31m...\x1b[0m` lines | final 1k IDs, no dup | pass |
| PTY | Resize churn | 2k lines during 8 rapid viewport resizes | final 2k IDs, no dup | pass |
| PTY | 4 split panes simultaneous | 4 agents x 2k lines in split mode | each pane final 2k IDs, no dup | pass for all 4 panes |
| Chat | Single-agent inbound burst | 1k tight `relay_inbound` events from agent-0001 to `#general`, unique event_id/seq | source IDs unique; visible chat DOM IDs unique and known | pass; sourceIds=1000, visibleIds=21, duplicateVisibleIds=0, unknownVisibleIds=0 |
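The four numbers reported in the Chat row can be derived from two ID lists: the event IDs that were injected and the IDs currently present in the chat DOM. The following is a sketch under assumptions; `chatIdentityStats` is a hypothetical helper, and how the probe actually collected DOM IDs is not described in this PR.

```typescript
interface ChatIdentityStats {
  sourceIds: number;           // distinct injected event IDs
  visibleIds: number;          // IDs currently present in the chat DOM
  duplicateVisibleIds: number; // visible IDs that appear more than once
  unknownVisibleIds: number;   // visible IDs that were never injected
}

function chatIdentityStats(sourceEventIds: string[], visibleDomIds: string[]): ChatIdentityStats {
  const known = new Set<string>(sourceEventIds);
  const seenVisible = new Set<string>();
  let duplicateVisibleIds = 0;
  let unknownVisibleIds = 0;
  for (const id of visibleDomIds) {
    if (seenVisible.has(id)) duplicateVisibleIds += 1;
    seenVisible.add(id);
    if (!known.has(id)) unknownVisibleIds += 1;
  }
  return {
    sourceIds: known.size,
    visibleIds: visibleDomIds.length,
    duplicateVisibleIds,
    unknownVisibleIds,
  };
}
```

Because `sourceIds` counts distinct values, comparing it against the number of injected events doubles as the "source IDs unique" check.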

Harness Boundary

Covered by mock/web harness:

- Renderer PTY final fidelity after drain.
- Remount/layout/focus/resize interactions.
- Background-before-mount buffering.
- Split-pane final fidelity.
- Visible chat DOM identity under a 1k inbound burst.

Not covered by mock/web harness:

- Real Electron main-process `worker_stream` -> `broker:pty-chunk` delivery.
- Upstream SDK replay behavior.
- Real PTY/TUI focus-redraw behavior.
- Production broker events that lack stable identity.
- Idle-dispose remount path in the lazy-mount registry. The runtime-remount variant exercises layout-driven unmount/remount via graph/tabs toggle; the 5-minute idle-dispose timer would need a test-only override and is deferred.

dup-repro also read through `src/main/broker.ts` and found that the real main dispatch path already has generation-gated listeners, reconnect resume from `lastEventSeq`, and `isDuplicatePtyChunk()` dedupe by `event_id` / `id` / `seq` with a content fallback.
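To make the dedupe description concrete, here is a minimal sketch of identity-first deduplication with a content fallback. It is not the code in `src/main/broker.ts`: only the field names `event_id`, `id`, and `seq` come from the text above, while `PtyChunk`, `ChunkDeduper`, the key precedence, and the "same content as the previous identity-less chunk" fallback rule are assumptions made for illustration.

```typescript
// Assumed chunk shape; only the identity field names come from the PR text.
interface PtyChunk {
  event_id?: string;
  id?: string;
  seq?: number;
  data: string;
}

// Prefer the strongest identity available; undefined means no stable identity.
function identityKey(chunk: PtyChunk): string | undefined {
  if (chunk.event_id !== undefined) return "event_id:" + chunk.event_id;
  if (chunk.id !== undefined) return "id:" + chunk.id;
  if (chunk.seq !== undefined) return "seq:" + chunk.seq;
  return undefined;
}

class ChunkDeduper {
  private seenKeys = new Set<string>();
  private lastUnidentifiedData: string | undefined = undefined;

  isDuplicate(chunk: PtyChunk): boolean {
    const key = identityKey(chunk);
    if (key !== undefined) {
      if (this.seenKeys.has(key)) return true;
      this.seenKeys.add(key);
      return false;
    }
    // Content fallback (assumed rule): an identity-less chunk is treated as a
    // duplicate only if it repeats the previous identity-less chunk exactly.
    const repeated = chunk.data === this.lastUnidentifiedData;
    this.lastUnidentifiedData = chunk.data;
    return repeated;
  }
}
```

A content fallback like this can suppress legitimately repeated output, which is one reason the harness boundary above calls out production events that lack stable identity. A long-lived process would also need to bound `seenKeys`.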

Follow-Up

If the user reproduces duplication again in the real app path, add low-noise production counters in isDuplicatePtyChunk():

- `suppressedByIdentity`
- `suppressedByContent`
- `missingIdentityCount`
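One way these three counters might be maintained is sketched below. The counter names come from the list above; their exact semantics, the `DedupeOutcome` shape, and the `formatCounters` helper are assumptions, since the PR only proposes the counters and does not define them.

```typescript
interface DedupeCounters {
  suppressedByIdentity: number; // duplicates caught via event_id / id / seq
  suppressedByContent: number;  // duplicates caught via the content fallback
  missingIdentityCount: number; // chunks that arrived with no stable identity
}

// Hypothetical description of a single dedupe decision.
interface DedupeOutcome {
  duplicate: boolean;
  hadIdentity: boolean;
}

function newCounters(): DedupeCounters {
  return { suppressedByIdentity: 0, suppressedByContent: 0, missingIdentityCount: 0 };
}

function recordOutcome(counters: DedupeCounters, outcome: DedupeOutcome): void {
  if (!outcome.hadIdentity) counters.missingIdentityCount += 1;
  if (!outcome.duplicate) return;
  if (outcome.hadIdentity) counters.suppressedByIdentity += 1;
  else counters.suppressedByContent += 1;
}

// Low-noise reporting: produce a line only when something was actually counted.
function formatCounters(counters: DedupeCounters): string | undefined {
  const total =
    counters.suppressedByIdentity + counters.suppressedByContent + counters.missingIdentityCount;
  if (total === 0) return undefined;
  return (
    "pty-dedupe suppressedByIdentity=" + counters.suppressedByIdentity +
    " suppressedByContent=" + counters.suppressedByContent +
    " missingIdentityCount=" + counters.missingIdentityCount
  );
}
```

A nonzero `suppressedByIdentity` in production would suggest upstream replay is happening and being caught, while a high `missingIdentityCount` would point at the identity gap listed under the harness boundary.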

Separately, the xterm drain-lag measurements above are a candidate perf follow-up. A runtime join-write batching experiment may be worth evaluating later, but it is intentionally not part of this final-state fidelity PR.
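As a rough picture of what a join-write batching experiment could look like: queued PTY chunks are concatenated and handed to the terminal in a single write per flush instead of one write per chunk. This is a sketch only; `JoinWriteBatcher` is a hypothetical name, the flush trigger is left to the caller, and nothing here reflects code in this PR.

```typescript
class JoinWriteBatcher {
  private pending: string[] = [];
  private readonly write: (data: string) => void;

  constructor(write: (data: string) => void) {
    this.write = write;
  }

  // Queue a chunk without touching the terminal.
  push(chunk: string): void {
    this.pending.push(chunk);
  }

  // Join everything queued so far into one write, preserving arrival order.
  // Returns the number of chunks that were coalesced.
  flush(): number {
    const count = this.pending.length;
    if (count === 0) return 0;
    const joined = this.pending.join("");
    this.pending = [];
    this.write(joined);
    return count;
  }
}
```

In a renderer, `write` would typically wrap the terminal's write call and `flush` would be driven by a frame or timer callback. Because the join preserves order and drops nothing, the fidelity suite in this PR could serve as the regression check for such an experiment.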

Validation

- `npx tsc --noEmit -p tsconfig.web.json`
- `npx vitest run` (17 files, 236 tests)
- `npm run build`
- `npm run test:fidelity` (6 passed)
- `npm run test:stress` (2 passed; pty-heavy stress min FPS ~46-47, no console errors)

coderabbitai · 2026-06-08T20:34:23Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c57f81d5-73b1-4475-bc87-7044db0762e0

📥 Commits

Reviewing files that changed from the base of the PR and between d511a22 and bac3eaa.

📒 Files selected for processing (5)

package.json
playwright.fidelity.config.ts
src/renderer/src/lib/ipc-mock.ts
src/renderer/src/lib/terminal-runtime-registry.ts
tests/playwright/fidelity-no-duplication.spec.ts

📝 Walkthrough

This PR introduces a Playwright-based fidelity test suite that validates terminal buffer rendering correctness under high-rate PTY streaming. It adds test infrastructure (test:fidelity script, playwright.fidelity.config.ts), mock harness APIs to read live buffer state, and a comprehensive test suite with stress scenarios involving UI/runtime disruptions.

Changes

Terminal fidelity testing infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| Test script and Playwright configuration: `package.json`, `playwright.fidelity.config.ts` | Registers `test:fidelity` npm script chaining `build:web` and Playwright execution; configures Playwright with timeouts, Vite preview server, Chrome desktop, trace-on-failure, and list reporter. |
| Mock harness terminal buffer API: `src/renderer/src/lib/terminal-runtime-registry.ts`, `src/renderer/src/lib/ipc-mock.ts` | Exports `getTerminalRuntime(key)` to resolve active `TerminalRuntime` instances by agent key; adds `getTerminalBufferText(projectId, name)` method to `PearMockHarness` that converts live terminal buffer lines to newline-delimited text. |
| Fidelity test suite and helpers: `tests/playwright/fidelity-no-duplication.spec.ts` | Implements test suite with helpers (`bootWithAgents`, `startStream`, `readMarkerStats`, `waitForFinalMarkerStats`, `expectFidelity`) that inject deterministic marker chunks, poll buffer state until expected counts match, and assert no missing or duplicate markers under tab switches, split-pane transitions, viewport resizes, focus/visibility events, and runtime remounts. |
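The "buffer lines to newline-delimited text" conversion described for `getTerminalBufferText` can be sketched against the shape of the xterm.js buffer API (`buffer.length`, `getLine(y)`, `translateToString(trimRight)`). The interfaces below are minimal structural stand-ins so the example is self-contained; `bufferToText` is a hypothetical name, and the actual harness method may differ, for instance in how it treats wrapped lines.

```typescript
// Minimal structural stand-ins for the parts of the xterm.js buffer API used here.
interface BufferLineLike {
  translateToString(trimRight?: boolean): string;
}

interface BufferLike {
  length: number;
  getLine(y: number): BufferLineLike | undefined;
}

// Walks every line in the buffer (scrollback plus viewport) and joins them
// with "\n", trimming trailing whitespace on each line.
function bufferToText(buffer: BufferLike): string {
  const lines: string[] = [];
  for (let y = 0; y < buffer.length; y++) {
    const line = buffer.getLine(y);
    if (line) lines.push(line.translateToString(true));
  }
  return lines.join("\n");
}
```

With a real terminal instance this would be called as `bufferToText(terminal.buffer.active)`. Only lines still retained in scrollback are visible to it, which is consistent with the 10k-line bursts in the exploratory matrix reporting roughly 3k retained IDs.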

Sequence Diagram

sequenceDiagram
  participant TestCase
  participant bootWithAgents
  participant startStream
  participant PearMockHarness
  participant expectFidelity
  TestCase->>bootWithAgents: configure terminal layout, spawn agents
  bootWithAgents-->>TestCase: agents running
  TestCase->>startStream: inject marker PTY chunks for streams
  startStream->>PearMockHarness: write chunks at specified indices
  startStream-->>TestCase: stream injection complete
  TestCase->>expectFidelity: verify marker counts and ordering
  expectFidelity->>PearMockHarness: readMarkerStats (buffer + canonical)
  PearMockHarness-->>expectFidelity: missing, duplicates, counts
  expectFidelity-->>TestCase: assert no missing/duplicates, order valid

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Markers march through PTY streams,
Racing fast in fidelity dreams,
No duplicates hiding, no losses in sight,
The buffer renders crisp and right!
Through tabs and splits and resize spree,
The terminal stands true, wild and free! 🎪

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a regression test suite (fidelity net) for terminal output duplication under high message load. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing the fidelity test suite, mock harness additions, test results, and findings. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (1)

tests/playwright/fidelity-no-duplication.spec.ts (1)
105-105: 💤 Low value

Consider case-insensitive marker pattern for robustness.

The regex pattern /\[[a-z0-9-]+-\d{4}\]/g only matches lowercase alphanumeric characters in marker prefixes. All current test markers use lowercase prefixes (chunk, tab-a, split-a, etc.), so this works today.

However, if future tests use uppercase characters in marker prefixes, they won't be detected. Consider using /\[[a-z0-9-]+-\d{4}\]/gi for forward compatibility; with the i flag in place, widening the class to a-zA-Z is unnecessary.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/playwright/fidelity-no-duplication.spec.ts` at line 105, the regex
stored in markerPattern only matches lowercase prefixes. Update the declaration
of markerPattern to be case-insensitive by adding the i flag while keeping the
existing g flag, e.g. /\[[a-z0-9-]+-\d{4}\]/gi, so markers like "Chunk" or
"TAB-A" are matched regardless of casing. Widening the class to a-zA-Z is an
equivalent alternative; doing both is redundant. Do not drop the g flag, since
the pattern is used to collect every marker in the buffer text.
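The effect of the flags discussed in this nitpick can be sketched as follows. The pattern is the one quoted in the review with the `i` flag added; `extractMarkers` is a hypothetical helper, not the spec's implementation.

```typescript
// Case-insensitive and global: the g flag is what makes match() return
// every marker rather than only the first one.
const markerPattern = /\[[a-z0-9-]+-\d{4}\]/gi;

function extractMarkers(text: string): string[] {
  return text.match(markerPattern) || [];
}
```

For instance, `extractMarkers("a [chunk-0001] b [TAB-A-0002]")` returns both markers, whereas the same call with a pattern that carries only the `i` flag would return just the first, which would silently undercount rendered markers.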

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 55ba7202-2f70-4b70-bb6c-2b007f2620d0

📥 Commits

Reviewing files that changed from the base of the PR and between 122d49d and d511a22.

📒 Files selected for processing (5)

package.json
playwright.fidelity.config.ts
src/renderer/src/lib/ipc-mock.ts
src/renderer/src/lib/terminal-runtime-registry.ts
tests/playwright/fidelity-no-duplication.spec.ts

coderabbitai Bot reviewed Jun 8, 2026

miyaontherelay added 2 commits June 9, 2026 10:35

test: add getTerminalBufferText to pear mock harness

09374f4

test: terminal fidelity regression for high-rate streaming variants

bac3eaa

kjgbot force-pushed the claude/duplication-fidelity branch from d511a22 to bac3eaa on June 9, 2026 08:36

kjgbot merged commit 901d508 into main Jun 9, 2026
3 checks passed

kjgbot deleted the claude/duplication-fidelity branch June 9, 2026 08:41

This was referenced Jun 10, 2026

fix(terminal): harden the rendering pipeline against dropped bytes, size drift, and flood lag #221

Merged

fix(terminal): skip already-focused refocus — stop Ink ?1004 focus-in frame stacking #342

Merged
