fix(app, mcp): PersistGate recovery + MCP handler drain on disconnect#857
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR introduces timeout enforcement for Core JSON-RPC requests via a new environment variable, implements graceful recovery when Redux store rehydration stalls, and enhances the MCP transport layer to proactively reject pending requests upon socket disconnection or replacement. Supporting test coverage and configuration updates are included. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 34 minutes and 23 seconds.Comment |
|
there are some mergre conflcits from the Previous PR that you had raised where the .claude rules files were removed. jfyi |
callCoreRpc() had no timeout on its fetch(), so a hung core sidecar would block every caller (and the UI) indefinitely. Add a CORE_RPC_TIMEOUT_MS bound (default 30s, configurable via VITE_CORE_RPC_TIMEOUT_MS, clamped to [1s, 10min]) via AbortController, and surface the timeout as a distinct error message the UI can match. Also lands the adversarial-review plan doc at docs/reviews/2026-04-23-adversarial-fixes.md so the remaining follow-up fixes are tracked. https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
…them
When `invoke('core_rpc_url')` threw or returned empty, the client
silently fell back to the build-time CORE_RPC_URL default. That hid
port-mismatch bugs where the UI connected to a stale default while the
real core ran elsewhere — every subsequent call failed with an obscure
ECONNREFUSED and no trail back to the misconfig.
Log the underlying error (via coreRpcError) on both the throw path and
the empty-string path so the fallback is visible. Behavior otherwise
unchanged.
https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
When the MCP socket dropped while a request was in flight, the response would never arrive and the caller's promise would hang until the 30s per-request timeout fired. Under a reconnect storm these stacked up. Register a `disconnect` listener that rejects every pending handler with a synthetic JSON-RPC error so callers see `Socket disconnected: <reason>` immediately. Apply the same treatment inside `updateSocket()` — requests emitted on the old socket can't possibly complete on the new one, so rejecting them with `Socket replaced` is the honest signal. Adds a transport unit-test suite covering the happy path, single- and multi-request disconnect, socket replacement, and late-arriving response-after-disconnect (noop). https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
PersistGate has no deadline — if rehydration stalls (corrupt localStorage, slow/failing storage adapter, antivirus scan on the persist file) the user sees a permanent splash screen with no way out. Replace the plain RouteLoadingScreen `loading` prop with a wrapper that keeps the same spinner for the first 10s and then swaps in a recovery panel offering `persistor.purge()` + window reload. Rehydration that eventually succeeds still tears the component down as before, so the happy path is unchanged. Also drops an unused `vi` import in the new MCP transport test picked up by tsc. https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
…ream tinyhumansai#805) Files in .claude/rules/ auto-load into every Claude Code session in this repo, so anything inaccurate actively poisons the context rather than just sitting in a docs folder. Seven rules had drifted hard enough that they now misrepresent the project on boot: - 00 & 01: "crypto community platform / poke.com inspiration" with mobile targets — the actual product is an AI assistant for communities, desktop-only (compile_error! rejects non-desktop in the Tauri shell). - 05 & 06: Android / iOS platform setup guides — not shipped platforms. - 08: frontend guide listed Telegram MTProto, a crypto price ticker, and chat with crypto addresses. None of those exist in app/src. - 11: declared Zustand for state management; the repo uses Redux Toolkit + redux-persist. Also used wrong `src-tauri/` paths. - 16: "Outsourced - Crypto Community Platform" with a system-tray / autostart setup whose tauri.conf.json config is not present today. Rewrote 02 (development commands) to match the authoritative command list in CLAUDE.md — desktop-only, correct workspace paths, real script names (yarn compile, not yarn typecheck). Dropped Android/iOS sections from 10 (troubleshooting) and corrected `src-tauri/` path references to `app/src-tauri/`. Remaining rules (03, 04, 07, 09, 12, 13, 14, 15, 17) are either scoped to the right paths via frontmatter or general-purpose enough to stay. Also marks items 5, 6, 7, 8, 9 of the adversarial-review plan complete. https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
Trailing-comma and line-width tweaks from `yarn format` on the three files touched by this branch's fix commits. No behavior change. https://claude.ai/code/session_01T8pcNi5wDUQuxdDFe41a8i
f40858d to
4d433da
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
app/src/lib/mcp/transport.ts (1)
130-141: ConfirmdisconnectHandleris the same listener identity used at registration.
socket.off('disconnect', this.disconnectHandler)only removes the listener if it was registered with the same function reference. BecausedisconnectHandleris a class-field arrow function it's bound per-instance, sosetupEventHandlersandupdateSocketsee the same reference — this is correct as written. Just flagging because if someone later refactorsdisconnectHandlerto a method (disconnectHandler() { ... }) or wraps it (e.g..bind(this)at registration time), the reference will diverge andoff()will silently leak listeners. A short comment near the field declaration calling out "must remain an arrow-function class field" would harden this against drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/lib/mcp/transport.ts` around lines 130 - 141, Add a short comment next to the disconnectHandler class-field declaration noting that it must remain an arrow-function class field so its identity stays stable for socket.off; reference the disconnectHandler, setupEventHandlers, and updateSocket code paths that rely on the same function reference when calling socket.off('disconnect', this.disconnectHandler) to avoid silent listener leaks if someone later converts it to a method or uses .bind(this).app/src/lib/mcp/__tests__/transport.test.ts (1)
50-107: Consider adding one negative test for the "request while disconnected" precondition.The suite covers in-flight rejection on disconnect well, but there's no assertion for the synchronous
throw new Error('Socket not connected')branch inrequest()(transport.ts line 99). A one-liner test that flipssocket.connected = falsebefore callingtransport.request(...)and asserts the rejection would lock in that precondition alongside the new draining behavior. Strictly optional — the disconnect-during-flight path is the higher-value coverage and you already have it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/lib/mcp/__tests__/transport.test.ts` around lines 50 - 107, Add a unit test that asserts transport.request rejects/throws immediately when the socket is already disconnected: set socket.connected = false, then call transport.request(baseRequest(...)) and assert the synchronous error (e.g., expect(() => transport.request(baseRequest(123))).toThrow(/Socket not connected/)). Reference the existing FakeSocket, transport.request, and baseRequest helpers and name the test like "throws when requesting while disconnected".app/src/components/PersistRehydrationScreen.tsx (1)
26-52: Add a unit test for the 10s timeout → recovery CTA path.The PR test plan explicitly lists "PersistGate test covering the 10s timeout path" as unchecked, and there's no
PersistRehydrationScreen.test.tsxco-located here. Given the whole point of the component is a deadline that fires when rehydration stalls, a fake-timer test asserting the swap fromRouteLoadingScreen→ recovery panel + that the button callspersistor.purge()thenwindow.location.reload()would lock in the regression-class this PR was opened against. As per coding guidelines, place co-located unit tests as*.test.ts/*.test.tsxfiles underapp/src/**alongside the code they test.Want me to draft
app/src/components/PersistRehydrationScreen.test.tsxcovering: (a) initial render showsRouteLoadingScreen, (b) after advancing fake timers past 10s the recovery panel appears, (c) clicking "Reset local state" awaitspersistor.purge()and then triggers reload, (d)purge()rejection is swallowed and reload still happens?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/PersistRehydrationScreen.tsx` around lines 26 - 52, Add a co-located unit test file PersistRehydrationScreen.test.tsx that uses fake timers to assert the 10s timeout → recovery CTA path: render PersistRehydrationScreen and verify it initially shows RouteLoadingScreen, advance timers past REHYDRATION_WARN_TIMEOUT_MS and assert the recovery panel/button appears, mock persistor.purge to resolve and verify clicking the "Reset local state" button calls persistor.purge and then window.location.reload, and also include a test where persistor.purge rejects to ensure the error is swallowed (persistWarn) and reload still occurs; use jest.spyOn or jest.fn to stub persistor.purge and spy on window.location.reload and restore timers/mocks after each test.app/src/services/coreRpcClient.ts (2)
116-117: Nit: redundant fallback on Line 117.
resolvedCoreRpcUrlis assignedtrimmed || CORE_RPC_URLon Line 116, so it's guaranteed non-empty (assumingCORE_RPC_URLitself is non-empty). The trailing|| CORE_RPC_URLon Line 117 is dead.🧹 Proposed cleanup
resolvedCoreRpcUrl = trimmed || CORE_RPC_URL; - return resolvedCoreRpcUrl || CORE_RPC_URL; + return resolvedCoreRpcUrl;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/services/coreRpcClient.ts` around lines 116 - 117, The assignment sets resolvedCoreRpcUrl = trimmed || CORE_RPC_URL, so the subsequent return resolvedCoreRpcUrl || CORE_RPC_URL is redundant; simplify by returning resolvedCoreRpcUrl directly (or return trimmed || CORE_RPC_URL inline) and remove the trailing || CORE_RPC_URL fallback to avoid dead code—update the code using the resolvedCoreRpcUrl and trimmed/CORE_RPC_URL symbols accordingly.
170-191: Timeout pattern looks correct; body reads remain unbounded.The
AbortController+setTimeout+clearTimeoutinfinallypattern is correct, and distinguishing the abort case viacontroller.signal.abortedbefore re-throwing is the right way to surface a timeout-specific error.One small caveat to be aware of: the timer is cleared as soon as
fetchresolves the headers, soawait response.text()(Line 194) andawait response.json()(Line 198) are no longer bounded byCORE_RPC_TIMEOUT_MS. A core that flushes headers but then hangs mid-body would still stall the caller indefinitely. If that's the intended scope (bound only to "fetch returns response object"), the inline comment at Line 170-173 already documents it; otherwise consider abort-scoping the entire response read as well, e.g.:♻️ Optional: extend bound to cover body read
- const controller = new AbortController(); - const timeoutId = setTimeout(() => controller.abort(), CORE_RPC_TIMEOUT_MS); - let response: Response; - try { - response = await fetch(rpcUrl, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify(payload), - signal: controller.signal, - }); - } catch (fetchErr) { - if (controller.signal.aborted) { - throw new Error(`Core RPC ${payload.method} timed out after ${CORE_RPC_TIMEOUT_MS}ms`); - } - throw fetchErr; - } finally { - clearTimeout(timeoutId); - } + const controller = new AbortController(); + const timeoutId = setTimeout(() => controller.abort(), CORE_RPC_TIMEOUT_MS); + let response: Response; + let json: JsonRpcResponse<T>; + try { + response = await fetch(rpcUrl, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify(payload), + signal: controller.signal, + }); + + if (!response.ok) { + const text = await response.text(); + throw new Error(`Core RPC HTTP ${response.status}: ${text || response.statusText}`); + } + + json = (await response.json()) as JsonRpcResponse<T>; + } catch (fetchErr) { + if (controller.signal.aborted) { + throw new Error(`Core RPC ${payload.method} timed out after ${CORE_RPC_TIMEOUT_MS}ms`); + } + throw fetchErr; + } finally { + clearTimeout(timeoutId); + }Note: the
if (!response.ok)and JSON parsing blocks at Lines 193-210 would need to fold into thistry(or the success path) accordingly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/services/coreRpcClient.ts` around lines 170 - 191, The timeout currently only covers the fetch call because clearTimeout(timeoutId) runs as soon as headers arrive; to bound the body read as well, keep the AbortController+timeout active until after you finish reading the response body and parsing JSON/text: delay calling clearTimeout(timeoutId) (and therefore avoid clearing the controller.signal) until after the await response.text()/await response.json() work, and move the `if (!response.ok)` / parsing logic into the same success path (try block) so those awaits are still covered by controller.signal and will abort with the same timeout error; ensure you still clear the timeout in the final cleanup after body reads to avoid leaks and keep using CORE_RPC_TIMEOUT_MS and controller.signal everywhere the response is consumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reviews/2026-04-23-adversarial-fixes.md`:
- Around line 47-54: Update the ".claude/rules" removal bullet so it accurately
reflects what this branch changed versus what was already merged upstream:
remove or reword the claim "deleted 7 files" if those deletions were already
applied upstream, and instead state that this branch contains net-new edits
limited to the rewrite of "02-development-commands.md" and the trimmed
Android/iOS sections in "10-troubleshooting.md"; explicitly call out any files
truly deleted by this branch (e.g., list filenames only if this PR removed them)
and note that other deletions (like "00-project-vision.md",
"01-project-overview.md", "05-platform-setup-android.md",
"06-platform-setup-ios.md", "08-frontend-guide.md", "11-tech-stack-detailed.md",
"16-macos-background-execution.md") originated upstream to avoid future
blame/bisect confusion.
---
Nitpick comments:
In `@app/src/components/PersistRehydrationScreen.tsx`:
- Around line 26-52: Add a co-located unit test file
PersistRehydrationScreen.test.tsx that uses fake timers to assert the 10s
timeout → recovery CTA path: render PersistRehydrationScreen and verify it
initially shows RouteLoadingScreen, advance timers past
REHYDRATION_WARN_TIMEOUT_MS and assert the recovery panel/button appears, mock
persistor.purge to resolve and verify clicking the "Reset local state" button
calls persistor.purge and then window.location.reload, and also include a test
where persistor.purge rejects to ensure the error is swallowed (persistWarn) and
reload still occurs; use jest.spyOn or jest.fn to stub persistor.purge and spy
on window.location.reload and restore timers/mocks after each test.
In `@app/src/lib/mcp/__tests__/transport.test.ts`:
- Around line 50-107: Add a unit test that asserts transport.request
rejects/throws immediately when the socket is already disconnected: set
socket.connected = false, then call transport.request(baseRequest(...)) and
assert the synchronous error (e.g., expect(() =>
transport.request(baseRequest(123))).toThrow(/Socket not connected/)). Reference
the existing FakeSocket, transport.request, and baseRequest helpers and name the
test like "throws when requesting while disconnected".
In `@app/src/lib/mcp/transport.ts`:
- Around line 130-141: Add a short comment next to the disconnectHandler
class-field declaration noting that it must remain an arrow-function class field
so its identity stays stable for socket.off; reference the disconnectHandler,
setupEventHandlers, and updateSocket code paths that rely on the same function
reference when calling socket.off('disconnect', this.disconnectHandler) to avoid
silent listener leaks if someone later converts it to a method or uses
.bind(this).
In `@app/src/services/coreRpcClient.ts`:
- Around line 116-117: The assignment sets resolvedCoreRpcUrl = trimmed ||
CORE_RPC_URL, so the subsequent return resolvedCoreRpcUrl || CORE_RPC_URL is
redundant; simplify by returning resolvedCoreRpcUrl directly (or return trimmed
|| CORE_RPC_URL inline) and remove the trailing || CORE_RPC_URL fallback to
avoid dead code—update the code using the resolvedCoreRpcUrl and
trimmed/CORE_RPC_URL symbols accordingly.
- Around line 170-191: The timeout currently only covers the fetch call because
clearTimeout(timeoutId) runs as soon as headers arrive; to bound the body read
as well, keep the AbortController+timeout active until after you finish reading
the response body and parsing JSON/text: delay calling clearTimeout(timeoutId)
(and therefore avoid clearing the controller.signal) until after the await
response.text()/await response.json() work, and move the `if (!response.ok)` /
parsing logic into the same success path (try block) so those awaits are still
covered by controller.signal and will abort with the same timeout error; ensure
you still clear the timeout in the final cleanup after body reads to avoid leaks
and keep using CORE_RPC_TIMEOUT_MS and controller.signal everywhere the response
is consumed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a5784c0c-d94f-49e5-9448-262cf083d4ec
📒 Files selected for processing (10)
app/.env.exampleapp/src/App.tsxapp/src/components/PersistRehydrationScreen.tsxapp/src/lib/mcp/__tests__/transport.test.tsapp/src/lib/mcp/transport.tsapp/src/services/__tests__/coreRpcClient.test.tsapp/src/services/coreRpcClient.tsapp/src/test/setup.tsapp/src/utils/config.tsdocs/reviews/2026-04-23-adversarial-fixes.md
| 9. **[x] Stale `.claude/rules/` pass (upstream #805)** — deleted 7 files that actively contradicted CLAUDE.md (they auto-load into every Claude Code session, so their misinformation is load-bearing context, not just docs): | ||
| - `00-project-vision.md`, `01-project-overview.md` — "crypto community platform, poke.com" framing, mobile targets. | ||
| - `05-platform-setup-android.md`, `06-platform-setup-ios.md` — non-desktop targets are rejected by `compile_error!`. | ||
| - `08-frontend-guide.md` — fabricated features (Telegram MTProto, crypto price ticker, chat with crypto addresses). | ||
| - `11-tech-stack-detailed.md` — claimed Zustand for state mgmt (repo uses Redux Toolkit); wrong directory paths. | ||
| - `16-macos-background-execution.md` — "Outsourced" product name, tray config that isn't in `tauri.conf.json`. | ||
|
|
||
| Rewrote `02-development-commands.md` to match CLAUDE.md (desktop-only, workspace paths, real script names). Trimmed Android/iOS sections out of `10-troubleshooting.md`, fixed `src-tauri/` paths. |
There was a problem hiding this comment.
Reconcile the .claude/rules removal entry with the merge-conflict feedback.
The reviewer comment on this PR notes that the .claude/rules files were already removed in an earlier PR, which is producing merge conflicts here. Once that's resolved, the "deleted 7 files" bullet in this doc will either be (a) inaccurate for this branch (the deletions came from upstream, not this PR) or (b) a partial overlap (only the rewrite of 02-development-commands.md and the trim of 10-troubleshooting.md are net-new in this PR). Worth rewording so the doc tracks what this branch actually changed vs. what already landed upstream — otherwise future bisect/blame will be misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/reviews/2026-04-23-adversarial-fixes.md` around lines 47 - 54, Update
the ".claude/rules" removal bullet so it accurately reflects what this branch
changed versus what was already merged upstream: remove or reword the claim
"deleted 7 files" if those deletions were already applied upstream, and instead
state that this branch contains net-new edits limited to the rewrite of
"02-development-commands.md" and the trimmed Android/iOS sections in
"10-troubleshooting.md"; explicitly call out any files truly deleted by this
branch (e.g., list filenames only if this PR removed them) and note that other
deletions (like "00-project-vision.md", "01-project-overview.md",
"05-platform-setup-android.md", "06-platform-setup-ios.md",
"08-frontend-guide.md", "11-tech-stack-detailed.md",
"16-macos-background-execution.md") originated upstream to avoid future
blame/bisect confusion.
# Conflicts: # app/src/services/coreRpcClient.ts
Bug fixes surfaced during an issue-triage pass.
Fixes
.claude/rulestrim: drop rules that contradict CLAUDE.md (upstream [Feature] Audit and trim .claude/rules/ — most content is stale/duplicated/out-of-scope #805)Net: +458 / −1,546.
Test plan
yarn typecheck+yarn lintgreenSummary by CodeRabbit
New Features
Bug Fixes