fix: reconnect immediately on foreground after long background#7297
diegolmello wants to merge 4 commits into
Two stacked issues kept users at "Waiting for network" forever (or for many tens of seconds) after returning from a long background:

1. The foreground saga gated on meteor.connected, but that flag is exactly false when reconnect is needed. Drop the gate so checkAndReopen always runs on FOREGROUND when authenticated.
2. SDK checkAndReopen no-ops if Socket.connected returns true. The getter trusts readyState===1 && alive(), both of which can lie after iOS suspends the WebSocket: readyState often stays OPEN on a dead TCP, and lastPing can still be inside the alive window thanks to a server message right before suspend. Force-close the existing connection (with handlers detached so the orphan can't race a reopen()) and reset lastPing before calling open().
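The second issue is easier to see with a small model. What follows is a minimal sketch, not the SDK's actual code: `SocketState`, `ConnectionLike`, `forceClose`, and the `ALIVE_WINDOW_MS` constant are hypothetical names introduced for illustration (the 40s window is taken from the PR description further down). It shows how a getter of the form `readyState === 1 && alive()` can report a zombie socket as connected, and one way to tear a connection down with its handlers detached first.

```typescript
// Hypothetical model of the fields the `connected` getter reads; not the SDK's real class.
interface SocketState {
	readyState: number; // 1 means OPEN
	lastPing: number; // ms timestamp of the last message seen from the server
}

// Assumed alive window, based on the "lastPing + 40s" figure in the PR description.
const ALIVE_WINDOW_MS = 40_000;

const alive = (s: SocketState, now: number): boolean => now - s.lastPing < ALIVE_WINDOW_MS;
const connected = (s: SocketState, now: number): boolean => s.readyState === 1 && alive(s, now);

// Minimal connection shape for the teardown sketch (hypothetical).
interface ConnectionLike {
	onopen: (() => void) | null;
	onclose: (() => void) | null;
	onmessage: ((data: unknown) => void) | null;
	onerror: ((err: unknown) => void) | null;
	close(): void;
}

// Detach handlers first so a late close/error event from the orphaned
// connection cannot schedule a competing reopen, then close and reset lastPing.
function forceClose(conn: ConnectionLike, state: SocketState): void {
	conn.onopen = null;
	conn.onclose = null;
	conn.onmessage = null;
	conn.onerror = null;
	conn.close();
	state.readyState = 3; // CLOSED
	state.lastPing = 0;
}

// A socket whose last server message arrived 15s ago and whose readyState never changed:
const now = Date.now();
const zombie: SocketState = { readyState: 1, lastPing: now - 15_000 };
console.log(connected(zombie, now)); // true, even if the underlying TCP connection is dead
```

After `forceClose`, the same getter returns false, so a subsequent `open()` is no longer short-circuited by stale state.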
Walkthrough
On foreground, the app saga now gates only on
Changes
Socket Reconnection Fix
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested labels: type: bug
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
package.json (1)
143-143: 💤 Low value
zustand is pinned two patch versions behind the current latest, missing relevant bug fixes.
The exact pin to 5.0.10 prevents picking up bug fixes in 5.0.11 (persist: avoid global localStorage, immer: proper typing) and 5.0.12 (persist: post-rehydration callback, devtools: config type extension). If this pin was intentional to work around a specific issue, a comment explaining it would help future maintainers. Otherwise, consider upgrading to ^5.0.12 or the latest patch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` at line 143, the package.json pins the "zustand" dependency to 5.0.10 which misses recent bugfixes; update the "zustand" entry to a caret range such as "^5.0.12" (or the latest patch) to pick up fixes, or if the pin is intentional add a comment near the "zustand" entry explaining the reason and the specific issue being worked around; ensure any lockfile is regenerated (npm/yarn/pnpm install) after changing the "zustand" version so the new patch is installed.
⛔ Files ignored due to path filters (1)
yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (1)
package.json
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: ESLint and Test / run-eslint-and-test
- GitHub Check: format
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-05T13:55:00.974Z
Learnt from: Rohit3523
Repo: RocketChat/Rocket.Chat.ReactNative PR: 6930
File: package.json:101-101
Timestamp: 2026-02-05T13:55:00.974Z
Learning: In this repository, the dependency on react-native-image-crop-picker should reference the RocketChat fork (RocketChat/react-native-image-crop-picker) with explicit commit pins, not the upstream ivpusic/react-native-image-crop-picker. Update package.json dependencies (and any lockfile) to point to the fork URL and a specific commit, ensuring edge-to-edge Android fixes are included. This pattern should apply to all package.json files in the repo that declare this dependency.
Applied to files:
package.json
These slipped into the branch via a wt/yarn pin hook and were not part of the foreground-reconnect fix. Revert to base-branch state so the PR diff stays focused on the saga + sdk patch.
This reverts commit 7f5f358.
iOS Build Available
Rocket.Chat Experimental 4.72.0.108751
Android Build Available
Rocket.Chat Experimental 4.72.0.108750
Internal App Sharing: https://play.google.com/apps/test/RQVpXLytHNc/ahAO29uNQw5E-wtQbH40wKURLF3fCumEScfGoT2Dq6eJcAY1W5-ewPcIgSEYtJ18YMW0LcDd355ha3oEGB44xiyNIr
Foreground transitions could leave the app stuck on "Waiting for network" because:

(a) the saga gate required `meteor.connected === true` before calling `checkAndReopen()`, so the call was skipped in exactly the state where a reconnect is needed; and
(b) the SDK's `Socket.checkAndReopen` only reopened when `connected` was false, but a zombie socket (TCP dead, readyState still 1) would keep `connected` true and never reopen.

Saga gate: drop `meteor.connected` from `appHasComeBackToForeground`; keep `login.isAuthenticated` and `appRoot === ROOT_INSIDE`. The background handler is unchanged.

SDK patch (`@rocket.chat/sdk` ddp.ts):
- New `Socket.probe()`: send raw ping + listen for pong with 2s deadline. Bypasses `send()` (which awaits pong indefinitely on zombies).
- Rewrite `Socket.checkAndReopen` as a 3-bucket dispatcher keyed on `Date.now() - lastPing`: stale (> ping*2) → forceReopen, no probe; fresh (< 2s) → no-op; gray zone → probe, then forceReopen if dead.
- New `Socket.forceReopen()`: zero `lastPing`, clear timeouts, emit `'disconnected'` so in-flight `send()` promises reject visibly, clear subscriptions, detach handlers on the dying connection, close with `userDisconnectCloseCode`, drop ref, `open()`.
- Info-level log per invocation: bucket taken + elapsed ms.
- Existing media-signal / media-calls subscription hunk preserved.

Supersedes #7297 (always-teardown). Probe avoids the "Connecting" subtitle flicker on healthy foregrounds.
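The three-bucket dispatch described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the patched SDK code: `classify`, `ReopenHooks`, `withDeadline`, and `pingIntervalMs` are hypothetical names, and the real value of the SDK's `ping` setting is not given in this PR, so it is passed in as a parameter.

```typescript
type Bucket = 'stale' | 'fresh' | 'gray';

// pingIntervalMs stands in for the SDK's `ping` setting (value not stated in this PR).
function classify(elapsedMs: number, pingIntervalMs: number, freshMs = 2_000): Bucket {
	if (elapsedMs > pingIntervalMs * 2) return 'stale'; // too old to trust: reopen without probing
	if (elapsedMs < freshMs) return 'fresh'; // heard from the server very recently: do nothing
	return 'gray'; // uncertain: probe before deciding
}

// Resolves true if `pong` settles successfully before the deadline, false otherwise.
function withDeadline(pong: Promise<void>, deadlineMs: number): Promise<boolean> {
	return new Promise(resolve => {
		const timer = setTimeout(() => resolve(false), deadlineMs);
		pong.then(
			() => { clearTimeout(timer); resolve(true); },
			() => { clearTimeout(timer); resolve(false); }
		);
	});
}

// Hypothetical hooks standing in for the socket's own state and methods.
interface ReopenHooks {
	lastPing: number;
	pingIntervalMs: number;
	probe(deadlineMs: number): Promise<boolean>;
	forceReopen(): Promise<void>;
}

async function checkAndReopen(hooks: ReopenHooks, now: number = Date.now()): Promise<Bucket> {
	const elapsed = now - hooks.lastPing;
	const bucket = classify(elapsed, hooks.pingIntervalMs);
	// One info-level log per invocation: bucket taken + elapsed ms.
	console.info(`checkAndReopen: bucket=${bucket} elapsed=${elapsed}ms`);
	if (bucket === 'stale') {
		await hooks.forceReopen();
	} else if (bucket === 'gray') {
		const isAlive = await hooks.probe(2_000);
		if (!isAlive) await hooks.forceReopen();
	}
	return bucket;
}
```

A `probe` implementation could wrap a one-shot pong listener in `withDeadline`, which keeps the 2s bound independent of `send()` and its indefinite wait. Keeping `classify` pure also makes the bucket boundaries easy to unit test without a socket.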
Proposed changes
Fixes a class of reconnection bugs where users see "Waiting for network" forever (or for many tens of seconds) after the app returns from a long background — a common report on iOS especially.
Two stacked root causes:
1. Saga gate paradox (`app/sagas/state.js`). `appHasComeBackToForeground` bailed when `meteor.connected === false`, but that flag is `false` precisely when reconnect is needed. The handler was silently skipping `checkAndReopen()` in the very state that required it. Recovery fell entirely on the SDK's internal `setTimeout` `reopen` loop, which iOS can starve while suspended.

Fix: gate only on `login.isAuthenticated`. `checkAndReopen` is now always invoked on FOREGROUND.

2. Zombie socket fools `checkAndReopen` (SDK patch `@rocket.chat/sdk` `lib/drivers/ddp.ts`). `Socket.checkAndReopen` no-ops when `connected === true`. The `connected` getter is `readyState === 1 && alive()`. After iOS suspend:
- `readyState` can stay `1` even though the underlying TCP is dead.
- `alive()` window is `lastPing + 40s`; a server message just before suspend keeps `alive()` true on resume.

Result: a dead-but-still-OPEN WebSocket. `checkAndReopen` saw "connected" and did nothing. The next `send({msg:'ping'})` hit the zombie: the `try/catch` only logs, the awaited `'pong'` never fires, no `reopen()` ever scheduled. Forever waiting.

Fix: `checkAndReopen` now unconditionally tears down the existing connection (handlers detached first so the orphan can't race a reopen), zeroes `lastPing`, and calls `open()`. Worst case it churns one extra socket on healthy resumes; best case it heals the stuck-forever cases.

Issue(s)
Internal report: app sometimes never reconnects after long background; "Waiting for network" persists indefinitely.
How to test or reproduce
Repeat on iOS and Android. Try via PushKit/CallKit wake paths if available.
Screenshots
N/A — behavior change only.
Types of changes
Checklist
Further comments
Both fixes are minimal and tightly scoped. Additional follow-ups (not in this PR) that came out of the diagnosis and would further harden the recovery path:
- `open()` handshake timeout in the SDK so a hung WebSocket connect on suspended iOS doesn't wait for OS-level TCP timeout.
- `AppState` `change` even when `currentState === 'active'` after a long gap (PushKit wake can flip AppState invisibly, locking out future foreground events).
- `isConnected: false → true` to `checkAndReopen` for network-blip recovery without depending on AppState.

Targeted at `feat.voip-lib-new` (PR #6918) per the source of these reports.

Summary by CodeRabbit
Bug Fixes
New Features
Chores