test: terminal output fidelity regression net (no duplication under load)#176
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Your free trial PR review limit of 300 PRs has been reached. Please upgrade your plan to continue using CodeAnt AI. |
|
Warning Review limit reached
More reviews will be available in 51 minutes and 58 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR introduces a Playwright-based fidelity test suite that validates terminal buffer rendering correctness under high-rate PTY streaming. It adds test infrastructure ( ChangesTerminal fidelity testing infrastructure
Sequence DiagramsequenceDiagram
participant TestCase
participant bootWithAgents
participant startStream
participant PearMockHarness
participant expectFidelity
TestCase->>bootWithAgents: configure terminal layout, spawn agents
bootWithAgents-->>TestCase: agents running
TestCase->>startStream: inject marker PTY chunks for streams
startStream->>PearMockHarness: write chunks at specified indices
startStream-->>TestCase: stream injection complete
TestCase->>expectFidelity: verify marker counts and ordering
expectFidelity->>PearMockHarness: readMarkerStats (buffer + canonical)
PearMockHarness-->>expectFidelity: missing, duplicates, counts
expectFidelity-->>TestCase: assert no missing/duplicates, order valid
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/playwright/fidelity-no-duplication.spec.ts (1)
105-105: 💤 Low valueConsider case-insensitive marker pattern for robustness.
The regex pattern
/\[[a-z0-9-]+-\d{4}\]/gonly matches lowercase alphanumeric characters in marker prefixes. All current test markers use lowercase prefixes (chunk, tab-a, split-a, etc.), so this works today.However, if future tests use uppercase characters in marker prefixes, they won't be detected. Consider using
/\[[a-zA-Z0-9-]+-\d{4}\]/gifor forward compatibility.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/playwright/fidelity-no-duplication.spec.ts` at line 105, The regex stored in markerPattern only matches lowercase prefixes; update the pattern used in the markerPattern constant to be case-insensitive and include uppercase letters (e.g., allow A-Z) or add the i flag so markers like "Chunk" or "TAB-A" are matched; modify the declaration of markerPattern to use a pattern such as /\[[a-zA-Z0-9-]+-\d{4}\]/i (or add the /i flag to the existing class) so tests detect markers regardless of casing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/playwright/fidelity-no-duplication.spec.ts`:
- Line 105: The regex stored in markerPattern only matches lowercase prefixes;
update the pattern used in the markerPattern constant to be case-insensitive and
include uppercase letters (e.g., allow A-Z) or add the i flag so markers like
"Chunk" or "TAB-A" are matched; modify the declaration of markerPattern to use a
pattern such as /\[[a-zA-Z0-9-]+-\d{4}\]/i (or add the /i flag to the existing
class) so tests detect markers regardless of casing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 55ba7202-2f70-4b70-bb6c-2b007f2620d0
📒 Files selected for processing (5)
package.jsonplaywright.fidelity.config.tssrc/renderer/src/lib/ipc-mock.tssrc/renderer/src/lib/terminal-runtime-registry.tstests/playwright/fidelity-no-duplication.spec.ts
d511a22 to
bac3eaa
Compare
|
Your free trial PR review limit of 300 PRs has been reached. Please upgrade your plan to continue using CodeAnt AI. |
Summary
Adds a renderer fidelity regression net for the reported "when a lot of messages come in quickly it starts to repeat" issue.
window.__pearMock.getTerminalBufferText(projectId, name)so Playwright can inspect the live xterm buffer text through the mock harness.npm run test:fidelity, backed by a dedicated Playwright config on port 4175.Automated Fidelity Suite Results
npm run test:fidelitypassed: 6/6.split-0001andsplit-0002are asserted independently.The suite uses a finite 30s final-state drain deadline. Failure diagnostics include elapsed time, deadline, canonical count, rendered count, duplicate samples, and first missing marker samples.
Findings
Renderer-layer final-state duplication did not reproduce in the automated fidelity suite.
During calibration, high-rate synthetic bursts showed xterm drain lag but not duplication, loss, or reordering:
split-0001, contiguous tail missing, no duplicates.This PR does not claim to fix drain throughput. The drain-lag measurements are documented as follow-up perf characterization, not a fidelity failure.
Exploratory Coverage Matrix
dup-reproindependently ran a mock/web exploratory probe on currentmainwith zero duplicate IDs observed:injectPtyChunk\x1b[31m...\x1b[0mlinesrelay_inboundevents from agent-0001 to#general, unique event_id/seqHarness Boundary
Covered by mock/web harness:
Not covered by mock/web harness:
worker_stream -> broker:pty-chunkdelivery.dup-reproalso code-readsrc/main/broker.tsand found the real main dispatch path already has generation-gated listeners, reconnect resume fromlastEventSeq, andisDuplicatePtyChunk()dedupe byevent_id/id/seqwith content fallback.Follow-Up
If the user reproduces duplication again in the real app path, add low-noise production counters in
isDuplicatePtyChunk():suppressedByIdentitysuppressedByContentmissingIdentityCountSeparately, the xterm drain-lag measurements above are a candidate perf follow-up. A runtime join-write batching experiment may be worth evaluating later, but it is intentionally not part of this final-state fidelity PR.
Validation
npx tsc --noEmit -p tsconfig.web.jsonnpx vitest run(17 files, 236 tests)npm run buildnpm run test:fidelity(6 passed)npm run test:stress(2 passed; pty-heavy stress min FPS ~46-47, no console errors)