Skip to content

fix: reconnect immediately on foreground after long background#7297

Closed
diegolmello wants to merge 4 commits into
feat.voip-lib-newfrom
fix.foreground-reconnect
Closed

fix: reconnect immediately on foreground after long background#7297
diegolmello wants to merge 4 commits into
feat.voip-lib-newfrom
fix.foreground-reconnect

Conversation

@diegolmello

@diegolmello diegolmello commented May 4, 2026

Copy link
Copy Markdown
Member

Proposed changes

Fixes a class of reconnection bugs where users see "Waiting for network" forever (or for many tens of seconds) after the app returns from a long background — a common report on iOS especially.

Two stacked root causes:

1. Saga gate paradox (app/sagas/state.js). appHasComeBackToForeground bailed when meteor.connected === false — but that flag is exactly false precisely when reconnect is needed. The handler was silently skipping checkAndReopen() in the very state that required it. Recovery fell entirely on the SDK's internal setTimeout reopen loop, which iOS can starve while suspended.

Fix: gate only on login.isAuthenticated. checkAndReopen is now always invoked on FOREGROUND.

2. Zombie socket fools checkAndReopen (SDK patch @rocket.chat/sdk lib/drivers/ddp.ts). Socket.checkAndReopen no-ops when connected === true. The connected getter is readyState === 1 && alive(). After iOS suspend:

  • readyState can stay 1 even though the underlying TCP is dead.
  • alive() window is lastPing + 40s; a server message just before suspend keeps alive() true on resume.

Result: a dead-but-still-OPEN WebSocket. checkAndReopen saw "connected" and did nothing. The next send({msg:'ping'}) hit the zombie — the try/catch only logs, the awaited 'pong' never fires, no reopen() ever scheduled. Forever waiting.

Fix: checkAndReopen now unconditionally tears down the existing connection (handlers detached first so the orphan can't race a reopen), zeroes lastPing, and calls open(). Worst case it churns one extra socket on healthy resumes; best case it heals the stuck-forever cases.

Issue(s)

Internal report: app sometimes never reconnects after long background; "Waiting for network" persists indefinitely.

How to test or reproduce

  1. Open the app, log in, confirm connected (no "Waiting for network").
  2. Send the app to background (Home button / lock screen).
  3. Wait 10+ minutes (longer if on Wi-Fi without idle disconnect; shorter on flaky cellular).
  4. Bring the app to foreground.
  5. Before the fix: subtitle frequently stays at "Waiting for network" for tens of seconds or never recovers; messages do not arrive.
  6. After the fix: socket tears down + reopens within ~1s of foreground; subtitle returns to normal; presence syncs.

Repeat on iOS and Android. Try via PushKit/CallKit wake paths if available.

Screenshots

N/A — behavior change only.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves a current function)
  • New feature (non-breaking change which adds functionality)
  • Documentation update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works (if applicable)
  • I have added necessary documentation (if applicable)
  • Any dependent changes have been merged and published in downstream modules

Further comments

Both fixes are minimal and tightly scoped. Additional follow-ups (not in this PR) that came out of the diagnosis and would further harden the recovery path:

  • Add an open() handshake timeout in the SDK so a hung WebSocket connect on suspended iOS doesn't wait for OS-level TCP timeout.
  • Re-fire FOREGROUND on AppState change even when currentState === 'active' after a long gap (PushKit wake can flip AppState invisibly, locking out future foreground events).
  • Bridge NetInfo isConnected: false → true to checkAndReopen for network-blip recovery without depending on AppState.
  • Foreground heartbeat saga as a safety net.

Targeted at feat.voip-lib-new (PR #6918) per the source of these reports.

Summary by CodeRabbit

  • Bug Fixes

    • Improved foreground reconnection so the app reliably restores sessions and reconnects sockets after returning from background.
  • New Features

    • Added handling for media signal and media call events to enhance real-time media behavior.
  • Chores

    • Pinned a dependency to a specific version to ensure consistent installs.

Two stacked issues kept users at "Waiting for network" forever (or for
many tens of seconds) after returning from a long background:

1. The foreground saga gated on meteor.connected — but that flag is
   exactly false when reconnect is needed. Drop the gate so
   checkAndReopen always runs on FOREGROUND when authenticated.

2. SDK checkAndReopen no-ops if Socket.connected returns true. The
   getter trusts readyState===1 && alive(), both of which can lie after
   iOS suspends the WebSocket: readyState often stays OPEN on a dead
   TCP, and lastPing can still be inside the alive window thanks to a
   server message right before suspend. Force-close the existing
   connection (with handlers detached so the orphan can't race a
   reopen()) and reset lastPing before calling open().
@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

On foreground, the app saga now gates only on login.isAuthenticated and always calls checkAndReopen(). DDPDriver.checkAndReopen was changed to unconditionally reset reconnection state (clear timers, close/teardown any existing connection) and call open(). Subscribed event topics add media-signal and media-calls. zustand was pinned to 5.0.10.

Changes

Socket Reconnection Fix

Layer / File(s) Summary
Core Driver Reset
patches/@rocket.chat+sdk+1.3.3-mobile.patch
DDPDriver.checkAndReopen now unconditionally sets lastPing = 0, clears openTimeout/pingTimeout, detaches handlers from and closes any existing connection, clears this.connection, then calls open() (removed conditional reopen-only-if-disconnected flow).
Event Subscription Expansion
patches/@rocket.chat+sdk+1.3.3-mobile.patch
Subscribed event topics list extended to include media-signal and media-calls.
Foreground Saga Wiring
app/sagas/state.js
appHasComeBackToForeground removed the prior combined auth+socket readiness guard and now checks only login.isAuthenticated before calling localAuthenticate(server) and unconditionally checkAndReopen(), then proceeds to pending-notification checks and setUserPresenceOnline().
Dependency Pinning
package.json
dependencies.zustand changed from ^5.0.10 to 5.0.10 (fixed version).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

type: bug

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main fix: ensuring reconnection happens immediately when the app returns to the foreground after being backgrounded, which is the primary objective across both changed files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 5/8 reviews remaining, refill in 16 minutes and 48 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
package.json (1)

143-143: 💤 Low value

zustand is pinned two patch versions behind the current latest, missing relevant bug fixes.

The exact pin to 5.0.10 prevents picking up bug fixes in 5.0.11 (persist: avoid global localStorage, immer: proper typing) and 5.0.12 (persist: post-rehydration callback, devtools: config type extension). If this pin was intentional to work around a specific issue, a comment explaining it would help future maintainers. Otherwise, consider upgrading to ^5.0.12 or the latest patch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@package.json` at line 143, The package.json pins the "zustand" dependency to
5.0.10 which misses recent bugfixes; update the "zustand" entry to a caret range
such as "^5.0.12" (or the latest patch) to pick up fixes, or if the pin is
intentional add a comment near the "zustand" entry explaining the reason and the
specific issue being worked around; ensure any lockfile is regenerated
(npm/yarn/pnpm install) after changing the "zustand" version so the new patch is
installed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@package.json`:
- Line 143: The package.json pins the "zustand" dependency to 5.0.10 which
misses recent bugfixes; update the "zustand" entry to a caret range such as
"^5.0.12" (or the latest patch) to pick up fixes, or if the pin is intentional
add a comment near the "zustand" entry explaining the reason and the specific
issue being worked around; ensure any lockfile is regenerated (npm/yarn/pnpm
install) after changing the "zustand" version so the new patch is installed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87b5cdaf-2921-4f88-a81c-ac933fa540ba

📥 Commits

Reviewing files that changed from the base of the PR and between d9f69df and 2a2ebb4.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (1)
  • package.json
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ESLint and Test / run-eslint-and-test
  • GitHub Check: format
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-05T13:55:00.974Z
Learnt from: Rohit3523
Repo: RocketChat/Rocket.Chat.ReactNative PR: 6930
File: package.json:101-101
Timestamp: 2026-02-05T13:55:00.974Z
Learning: In this repository, the dependency on react-native-image-crop-picker should reference the RocketChat fork (RocketChat/react-native-image-crop-picker) with explicit commit pins, not the upstream ivpusic/react-native-image-crop-picker. Update package.json dependencies (and any lockfile) to point to the fork URL and a specific commit, ensuring edge-to-edge Android fixes are included. This pattern should apply to all package.json files in the repo that declare this dependency.

Applied to files:

  • package.json

These slipped into the branch via a wt/yarn pin hook and were not part of the
foreground-reconnect fix. Revert to base-branch state so the PR diff stays
focused on the saga + sdk patch.
@diegolmello diegolmello had a problem deploying to approve_e2e_testing May 4, 2026 19:06 — with GitHub Actions Failure
@diegolmello diegolmello had a problem deploying to official_android_build May 4, 2026 19:09 — with GitHub Actions Failure
@diegolmello diegolmello temporarily deployed to experimental_android_build May 4, 2026 19:09 — with GitHub Actions Inactive
@diegolmello diegolmello temporarily deployed to experimental_ios_build May 4, 2026 19:09 — with GitHub Actions Inactive
@diegolmello diegolmello had a problem deploying to official_ios_build May 4, 2026 19:09 — with GitHub Actions Failure
@github-actions

github-actions Bot commented May 4, 2026

Copy link
Copy Markdown

iOS Build Available

Rocket.Chat Experimental 4.72.0.108751

@diegolmello diegolmello had a problem deploying to upload_experimental_android May 4, 2026 19:48 — with GitHub Actions Failure
@github-actions

github-actions Bot commented May 4, 2026

Copy link
Copy Markdown

Android Build Available

Rocket.Chat Experimental 4.72.0.108750

Internal App Sharing: https://play.google.com/apps/test/RQVpXLytHNc/ahAO29uNQw5E-wtQbH40wKURLF3fCumEScfGoT2Dq6eJcAY1W5-ewPcIgSEYtJ18YMW0LcDd355ha3oEGB44xiyNIr

@diegolmello diegolmello closed this May 5, 2026
diegolmello added a commit that referenced this pull request May 5, 2026
Foreground transitions could leave the app stuck on "Waiting for network"
because (a) the saga gate required `meteor.connected === true` to call
`checkAndReopen()` — exactly the state where reconnect is needed, and (b)
the SDK's `Socket.checkAndReopen` only reopened when `connected` was
false, but a zombie socket (TCP dead, readyState still 1) would keep
`connected` true and never reopen.

Saga gate: drop `meteor.connected` from `appHasComeBackToForeground`;
keep `login.isAuthenticated` and `appRoot === ROOT_INSIDE`. The
background handler is unchanged.

SDK patch (`@rocket.chat/sdk` ddp.ts):
- New `Socket.probe()`: send raw ping + listen for pong with 2s deadline.
  Bypasses `send()` (which awaits pong indefinitely on zombies).
- Rewrite `Socket.checkAndReopen` as a 3-bucket dispatcher keyed on
  `Date.now() - lastPing`: stale (> ping*2) → forceReopen, no probe;
  fresh (< 2s) → no-op; gray zone → probe, then forceReopen if dead.
- New `Socket.forceReopen()`: zero `lastPing`, clear timeouts, emit
  `'disconnected'` so in-flight `send()` promises reject visibly,
  clear subscriptions, detach handlers on the dying connection,
  close with `userDisconnectCloseCode`, drop ref, `open()`.
- Info-level log per invocation: bucket taken + elapsed ms.
- Existing media-signal / media-calls subscription hunk preserved.

Supersedes #7297 (always-teardown). Probe avoids the "Connecting"
subtitle flicker on healthy foregrounds.
diegolmello added a commit that referenced this pull request May 6, 2026
Foreground transitions could leave the app stuck on "Waiting for network"
because (a) the saga gate required `meteor.connected === true` to call
`checkAndReopen()` — exactly the state where reconnect is needed, and (b)
the SDK's `Socket.checkAndReopen` only reopened when `connected` was
false, but a zombie socket (TCP dead, readyState still 1) would keep
`connected` true and never reopen.

Saga gate: drop `meteor.connected` from `appHasComeBackToForeground`;
keep `login.isAuthenticated` and `appRoot === ROOT_INSIDE`. The
background handler is unchanged.

SDK patch (`@rocket.chat/sdk` ddp.ts):
- New `Socket.probe()`: send raw ping + listen for pong with 2s deadline.
  Bypasses `send()` (which awaits pong indefinitely on zombies).
- Rewrite `Socket.checkAndReopen` as a 3-bucket dispatcher keyed on
  `Date.now() - lastPing`: stale (> ping*2) → forceReopen, no probe;
  fresh (< 2s) → no-op; gray zone → probe, then forceReopen if dead.
- New `Socket.forceReopen()`: zero `lastPing`, clear timeouts, emit
  `'disconnected'` so in-flight `send()` promises reject visibly,
  clear subscriptions, detach handlers on the dying connection,
  close with `userDisconnectCloseCode`, drop ref, `open()`.
- Info-level log per invocation: bucket taken + elapsed ms.
- Existing media-signal / media-calls subscription hunk preserved.

Supersedes #7297 (always-teardown). Probe avoids the "Connecting"
subtitle flicker on healthy foregrounds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant