
fix(sentry): make exit telemetry lossless and stop cross-process clobbering#84

Merged: gmaclennan merged 1 commit into main from fix/exit-telemetry-review on Jun 10, 2026
Conversation

@gmaclennan


Post-merge review fixes for the Phase 6/7a exit-reason telemetry (#72). A code review after the merge found a cluster of correctness bugs in the collection pipeline — three of which undermined the headline metrics the feature exists to produce.

Android

Records were silently consumed without being reported. Three paths shared one root cause: collect() advanced the last_seen high-water mark before capture, and capture silently no-ops on an uninitialised Sentry hub.

  • The main process collected at Application.onCreate, but main-process sentry-android only comes up when JS-side Sentry.init runs (manifest sets io.sentry.auto-init=false) — so main-process exit records were dropped on essentially every cold start.
  • The FGS ran the collector even with diagnosticsEnabled=false, consuming records a user opting in later could never see.

Fix: collection is a no-op (leaving records pending) until Sentry.isEnabled() returns true, the main process waits up to 2 minutes for the JS-triggered init, and the high-water mark advances only after the capture loop completes (at-least-once delivery; duplicates are absorbed by the stable message grouping). Both callers snapshot the previous session's anchors (AnchorSnapshot) before stamping their own, so collection can run arbitrarily late without reading current-run values.
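The ordering described above can be sketched in plain Java. This is a minimal illustration, not the project's collector: `ExitCollectorSketch`, its parameters, and the use of bare timestamps as records are all assumptions made for the example. The point it shows is that the function only returns the new high-water mark, and returns the old one untouched while reporting is disabled, so the caller can persist it after every capture has been attempted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch; class and parameter names are illustrative only. */
final class ExitCollectorSketch {
    /**
     * Captures every record newer than lastSeen and returns the new high-water mark.
     * Returns lastSeen unchanged when reporting is disabled, so records stay pending.
     */
    static long collect(long[] exitTimestamps, long lastSeen,
                        BooleanSupplier reportingEnabled, LongConsumer capture) {
        if (!reportingEnabled.getAsBoolean()) {
            return lastSeen; // no-op: nothing is consumed
        }
        long newest = lastSeen;
        for (long ts : exitTimestamps) {
            if (ts > lastSeen) {
                capture.accept(ts);
                newest = Math.max(newest, ts);
            }
        }
        // The caller persists this value only after the loop has finished.
        return newest;
    }

    public static void main(String[] args) {
        List<Long> sent = new ArrayList<>();
        long mark = collect(new long[] {5L, 10L, 15L}, 5L, () -> true, ts -> sent.add(ts));
        System.out.println("mark=" + mark + " sent=" + sent);
    }
}
```

If `capture` throws partway through, the caller never receives a new mark and the same records are retried on the next run, which is the at-least-once behaviour the description relies on.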

comapeo.fgs.killed_in_background was systematically false in the dominant kill→relaunch flow: the relaunch's replayed ON_START zeroed backgrounded_at before the FGS (started only after foregrounding) read it. Fix: ON_START now stamps a paired foregrounded_at anchor instead of clearing; the decoder treats an exit as in-background when the last stamp before its timestamp was a background. Neither stamp is ever cleared, so the answer is correct however late collection runs.
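A minimal sketch of the decoder rule, assuming each anchor is stored as a single epoch timestamp and that 0 means "never stamped" (both are assumptions for illustration, as are the class and method names). It compares the most recent background and foreground stamps that are not later than the exit timestamp.

```java
/** Hypothetical sketch; names and the 0 sentinel are assumptions, not the project's API. */
final class BackgroundKillSketch {
    static final long NEVER = 0L;

    /** True when the last stamp at or before the exit was a background stamp. */
    static boolean killedInBackground(long backgroundedAt, long foregroundedAt, long exitAt) {
        // Ignore stamps written after the exit, e.g. the relaunch's own ON_START.
        long lastBg = backgroundedAt <= exitAt ? backgroundedAt : NEVER;
        long lastFg = foregroundedAt <= exitAt ? foregroundedAt : NEVER;
        return lastBg != NEVER && lastBg > lastFg;
    }

    public static void main(String[] args) {
        // Kill-then-relaunch: backgrounded at 100, killed at 150, relaunched at 200.
        System.out.println(killedInBackground(100L, 200L, 150L));
        // Returned to the foreground at 120 and died there at 150.
        System.out.println(killedInBackground(100L, 120L, 150L));
    }
}
```

In the relaunch case the foreground stamp at 200 is later than the exit, so it is ignored rather than erasing the background stamp at 100.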

Cross-process SharedPreferences clobbering. Both processes wrote the same prefs file, and SharedPreferences rewrites the whole file from a per-process cache — one process's write could revert or erase the other's anchors, high-water marks, or even the Sentry toggles. Fix: one anchors file per process (com.comapeo.core.anchors.<proc>) with exactly one writer each; the FGS reads the main file read-only. Writes use commit=true so the backgrounded_at stamp survives a SIGKILL right after backgrounding (the exact window this feature measures). Note: the file move resets the high-water mark once — the first run on this code re-initialises it and emits nothing, then reporting resumes.
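The lost-update failure can be reproduced with a simplified model: each "process" loads the file into its own cache once and rewrites the whole file from that cache on every write. `CachedPrefs` below is a hypothetical stand-in for that behaviour, not the Android class, and the key names are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a per-process cache that rewrites the whole file on write. */
final class PrefsClobberSketch {
    static final class CachedPrefs {
        private final Map<String, Long> disk;  // stands in for the file contents
        private final Map<String, Long> cache; // per-process snapshot taken at load

        CachedPrefs(Map<String, Long> disk) {
            this.disk = disk;
            this.cache = new HashMap<>(disk);
        }

        void put(String key, long value) {
            cache.put(key, value);
            disk.clear();
            disk.putAll(cache); // whole-file rewrite from a possibly stale cache
        }
    }

    public static void main(String[] args) {
        Map<String, Long> shared = new HashMap<>();
        CachedPrefs mainProc = new CachedPrefs(shared);
        CachedPrefs fgsProc = new CachedPrefs(shared);
        mainProc.put("backgrounded_at", 100L);
        fgsProc.put("last_seen", 7L); // rewrites the file without backgrounded_at
        System.out.println("shared file: " + shared);
    }
}
```

With one file per process and a single writer each, neither rewrite can touch the other's keys, which is the property the fix depends on.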

Shared record window. getHistoricalProcessExitReasons(pkg, 0, 10) returns the newest 10 records across all processes of the package, so FGS churn (START_STICKY restart loops) could evict main-process death records before they were ever seen. Fix: query without a cap (the OS bounds retention at ~16 per package) and keep the newest 10 per process after filtering.
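The "filter first, then cap per process" step might look like the following sketch. `ExitRecord` is a hypothetical two-field stand-in for the platform's exit-info object, and the process names in the example are placeholders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of keeping the newest N exit records per process. */
final class PerProcessCapSketch {
    static final class ExitRecord {
        final String processName;
        final long timestamp;

        ExitRecord(String processName, long timestamp) {
            this.processName = processName;
            this.timestamp = timestamp;
        }
    }

    /** Groups records by process and keeps at most perProcessCap of the newest in each group. */
    static Map<String, List<ExitRecord>> newestPerProcess(List<ExitRecord> all, int perProcessCap) {
        List<ExitRecord> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((ExitRecord r) -> r.timestamp).reversed());
        Map<String, List<ExitRecord>> kept = new HashMap<>();
        for (ExitRecord r : sorted) {
            List<ExitRecord> bucket = kept.computeIfAbsent(r.processName, k -> new ArrayList<>());
            if (bucket.size() < perProcessCap) {
                bucket.add(r); // newest first, because the input was sorted descending
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ExitRecord> all = new ArrayList<>();
        all.add(new ExitRecord("main", 1L)); // one old main-process death
        for (long t = 100L; t < 112L; t++) {
            all.add(new ExitRecord("fgs", t)); // twelve newer FGS restarts
        }
        Map<String, List<ExitRecord>> kept = newestPerProcess(all, 10);
        System.out.println("main=" + kept.get("main").size() + " fgs=" + kept.get("fgs").size());
    }
}
```

A package-wide "newest 10" over the same input would contain only FGS records; capping after grouping keeps the single main-process record.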

Smaller cleanups: ExitReasonTags.levelFor returns SentryLevel directly (was a string re-parsed by a second when-block that silently degraded unknown values to INFO), and the FGS process name is derived at runtime via a shared helper instead of a literal copy of the manifest's android:process — a rename can no longer silently break the record filter.

iOS

  • Double-counted foreground OOM/watchdog deaths: sentry-cocoa's watchdog-termination tracking (enabled by default) already reports these as errors, so the foreground memory_resource_limit / app_watchdog MetricKit buckets are demoted to warning — same policy the crash buckets already followed. Background buckets stay error (the SDK heuristic only covers foreground deaths).
  • Unbounded event bursts: app-usage-tier per-exit duplication is now capped at 10 events per window+bucket — counts are 24h cumulative totals and benign buckets (background normal_app_exit) reach the hundreds, burning quota in a burst on every daily payload. window_count keeps the exact count; query sum(window_count) when exactness matters.
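The duplication cap reduces to a clamp on how many events one window+bucket count may expand into. The sketch below is language-neutral logic written in Java for consistency with the other examples; the names are hypothetical, and it does not model how window_count is attached to the emitted events.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of capping per-exit event duplication per window+bucket. */
final class DuplicationCapSketch {
    static final int MAX_EVENTS_PER_BUCKET = 10;

    /** Number of events to emit for a bucket whose cumulative count is windowCount. */
    static int eventsToEmit(int windowCount) {
        return Math.max(0, Math.min(windowCount, MAX_EVENTS_PER_BUCKET));
    }

    public static void main(String[] args) {
        Map<String, Integer> windowCounts = new LinkedHashMap<>();
        windowCounts.put("normal_app_exit", 300); // benign bucket with a large count
        windowCounts.put("app_watchdog", 3);
        int total = 0;
        for (Map.Entry<String, Integer> e : windowCounts.entrySet()) {
            total += eventsToEmit(e.getValue());
        }
        System.out.println("events emitted: " + total);
    }
}
```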

Docs (docs/sentry-integration.md §7.5.1/§7.5.2) updated to match.

Tests

  • Android JVM: 27 pass, including new coverage for the high-water return-don't-persist contract, the per-process cap, and the fg/bg-stamp-vs-exit-timestamp ordering (incl. the relaunch regression case).
  • iOS package: 103 pass, including new coverage for the per-cohort level split and the duplication cap.

🤖 Generated with Claude Code

…bering

Post-merge review fixes for the Phase 6/7a exit-reason telemetry:

Android
- Gate collection on Sentry.isEnabled() and advance the high-water mark
  only after capture: records were being consumed by no-op captures —
  the main process collected before the JS-triggered Sentry.init on
  every cold start, and the FGS collected with diagnostics off.
- Main process now waits (up to 2 min) for JS-side Sentry init; anchors
  are snapshotted before the current run stamps its own, so collection
  can run late without reading this-session values.
- Replace the backgrounded_at clear-on-foreground with a paired
  foregrounded_at stamp ordered against the exit timestamp: the relaunch
  used to zero the anchor before the FGS collected, making
  comapeo.fgs.killed_in_background systematically "false" in the
  kill-then-relaunch flow.
- Per-process anchor prefs files with a single writer each (SharedPrefs
  is not multi-process safe; main and FGS were clobbering each other's
  keys in the shared file), written with commit=true so the
  backgrounded_at stamp survives an immediate SIGKILL.
- Query exit records without a maxNum cap (the pid=0 window is
  package-wide, so a per-call cap let one process's churn evict the
  other's records) and cap to the newest 10 per process after filtering.
- ExitReasonTags.levelFor returns SentryLevel directly, dropping the
  string round-trip re-parsed in capture().
- Derive the FGS process name at runtime instead of a literal copy of
  the manifest's android:process.

iOS
- Demote foreground memory_resource_limit/app_watchdog MetricKit buckets
  to warning: sentry-cocoa's watchdog-termination tracking (enabled by
  default) already captures those deaths as errors, so kill-rate
  dashboards double-counted them.
- Cap app-usage-tier per-exit duplication at 10 events per window+bucket
  (counts are 24h cumulative totals; benign buckets reach the hundreds);
  window_count keeps the exact count.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
gmaclennan enabled auto-merge (squash) June 10, 2026 11:13
gmaclennan merged commit da33aef into main Jun 10, 2026
7 checks passed
gmaclennan deleted the fix/exit-telemetry-review branch June 10, 2026 11:19
gmaclennan added a commit that referenced this pull request Jun 10, 2026
Carries the post-merge review fixes (PR #84) into the v8 metrics
migration. Conflict resolution notes:

- ExitReasonsCollector keeps the v8 metrics emission (comapeo.app.exit
  counts, exit.severity attribute, diagnostic-tier buckets) on top of
  the fix branch's structure: anchor snapshot, Sentry.isEnabled gate,
  high-water mark advanced only after capture, uncapped package-wide
  query with newest-10-per-process cap, and the
  backgrounded_at/foregrounded_at ordering logic.
- ExitReasonTags keeps v8's severityFor (string attribute); the fix
  branch's typed-SentryLevel change is superseded — metrics have no
  event level.
- iOS keeps v8's one-metric-per-bucket emission (the fix branch's
  per-event duplication cap is moot) and ports the cohort-aware
  severity demotion: foreground memory_resource_limit/app_watchdog are
  "warning" because sentry-cocoa's watchdog-termination tracking
  already reports those deaths.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
gmaclennan added a commit that referenced this pull request Jun 10, 2026
* origin/main:
  fix(sentry): make exit telemetry lossless and stop cross-process clobbering (#84)
  feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry (#72)
  chore(build): use npm list instead of custom traversal to get native module versions (#70)
gmaclennan added a commit that referenced this pull request Jun 11, 2026
* main:
  chore(e2e): add e2e tests on browserstack via Maestro (#56)
  fix(sentry): make exit telemetry lossless and stop cross-process clobbering (#84)
gmaclennan added a commit that referenced this pull request Jun 16, 2026
* origin/main:
  chore(e2e): add e2e tests on browserstack via Maestro (#56)
  fix(sentry): make exit telemetry lossless and stop cross-process clobbering (#84)
  feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry (#72)
  chore(build): use npm list instead of custom traversal to get native module versions (#70)
gmaclennan added a commit that referenced this pull request Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by Github action
[optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release
[v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-c499977757c9745e56b2)
has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR.
When you merge:

- The GitHub release will be published

- The npm package with tag pre will be published according to the
publishing rules you have configured



- No major or minor tags will be updated as configured


#### If you close the PR

- The new draft release will be deleted and nothing will change

## What's Changed
* Android Testing Infrastructure & Bug Fixes by @gmaclennan in
#3
* chore: prebuild example/android; harden instrumented tests by
@gmaclennan in
#10
* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in
#5
* chore: adjust repo setup by @achou11 in
#12
* chore: minor fixes based on expo-doctor by @achou11 in
#13
* Add iOS support & test infrastructure by @gmaclennan in
#6
* chore: add architecture docs & plans by @gmaclennan in
#11
* update some native deps used in backend by @achou11 in
#14
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by
@gmaclennan in
#15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan
in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by
@gmaclennan in
#17
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by
@gmaclennan in
#33
* android: read abiFilters from reactNativeArchitectures (#30) by
@gmaclennan in
#35
* refactor: simplify build-backend.ts; rollup writes directly to native
asset trees by @gmaclennan in
#34
* chore: fix eslint configuration by @achou11 in
#41
* android: audit 16 KB page alignment on every shipped .so by
@gmaclennan in
#43
* Add rootkey persistence and lifecycle state management by @gmaclennan
in #36
* chore: move example app into apps directory by @achou11 in
#18
* refactor: per-component lifecycle state with derived ComapeoState by
@gmaclennan in
#47
* android: fold waitForFile into connect retry loop by @gmaclennan in
#52
* chore: add e2e testing app by @achou11 in
#49
* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key
by @gmaclennan in
#57
* fix(backend): cache stopping/error frames for late joiners by
@gmaclennan in
#58
* fix(ios-tests): wait for STOPPING before signalling node exit by
@gmaclennan in
#59
* fix(android): drain JNI stdio pumps before returning from node::Start
by @gmaclennan in
#60
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in
#54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by
@gmaclennan in
#62
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan
in #64
* feat(sentry): land Phase 3 — backend loader + RPC tracing by
@gmaclennan in
#63
* fix(ios-tests): serialise STOPPING/STOPPED observers in
testFullLifecycleStateTransitions by @gmaclennan in
#71
* use npm list instead of custom traversal to get native module versions
by @achou11 in
#70
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS
MetricKit app-exit telemetry by @gmaclennan in
#72
* fix(sentry): make exit telemetry lossless and stop cross-process
clobbering by @gmaclennan in
#84
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in
#56
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as
Application Metrics by @gmaclennan in
#73
* Map server integration by @gmaclennan in
#86
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan
in #87
* chore(ci): add release workflow by @gmaclennan in
#90
* chore: fix npm script and release build script by @gmaclennan in
#91
* chore(pack): don't try to package build files by @gmaclennan in
#92
* fix: start fastify listening by @gmaclennan in
#93
* perf(backend): switch bundler from rollup to rolldown by @gmaclennan
in #94
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in
#96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts
allowlist by @gmaclennan in
#106
* feat(config): let the consuming app supply the default project config
by @gmaclennan in
#95
* chore(release): merge prerelease branch. by @gmaclennan in
#110

## New Contributors
* @achou11 made their first contribution in
#12

**Full Changelog**:
https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!--

<release-meta>{"id":342868678,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta>
-->
gmaclennan added the fix Bug fix (changelog) label Jun 22, 2026
gmaclennan added a commit that referenced this pull request Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by Github action
[optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release
[v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-352a6c41c12fd02dec37)
has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR.
When you merge:

- The GitHub release will be published

- The npm package with tag pre will be published according to the
publishing rules you have configured



- No major or minor tags will be updated as configured


#### If you close the PR

- The new draft release will be deleted and nothing will change

<!-- Release notes generated using configuration in .github/release.yml
at 7fe80b4 -->

## What's Changed
### 🚀 Features
* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in
#5
* Add iOS support & test infrastructure by @gmaclennan in
#6
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by
@gmaclennan in
#15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan
in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by
@gmaclennan in
#17
* android: read abiFilters from reactNativeArchitectures (#30) by
@gmaclennan in
#35
* Add rootkey persistence and lifecycle state management by @gmaclennan
in #36
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in
#54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by
@gmaclennan in
#62
* feat(sentry): land Phase 3 — backend loader + RPC tracing by
@gmaclennan in
#63
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS
MetricKit app-exit telemetry by @gmaclennan in
#72
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as
Application Metrics by @gmaclennan in
#73
* Map server integration by @gmaclennan in
#86
* feat(config): let the consuming app supply the default project config
by @gmaclennan in
#95
### 🐛 Bug Fixes
* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key
by @gmaclennan in
#57
* fix(backend): cache stopping/error frames for late joiners by
@gmaclennan in
#58
* fix(ios-tests): wait for STOPPING before signalling node exit by
@gmaclennan in
#59
* fix(android): drain JNI stdio pumps before returning from node::Start
by @gmaclennan in
#60
* fix(ios-tests): serialise STOPPING/STOPPED observers in
testFullLifecycleStateTransitions by @gmaclennan in
#71
* fix(sentry): make exit telemetry lossless and stop cross-process
clobbering by @gmaclennan in
#84
* fix: start fastify listening by @gmaclennan in
#93
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in
#96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts
allowlist by @gmaclennan in
#106
* fix(release): stop `npm pack --dry-run` leaking dry-run into backend
install by @gmaclennan in
#129
### ⚡ Performance
* perf(backend): switch bundler from rollup to rolldown by @gmaclennan
in #94
### ⬆️ Dependencies
* update some native deps used in backend by @achou11 in
#14
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan
in #87
### 🏗️ Maintenance
* Android Testing Infrastructure & Bug Fixes by @gmaclennan in
#3
* chore: prebuild example/android; harden instrumented tests by
@gmaclennan in
#10
* chore: adjust repo setup by @achou11 in
#12
* chore: minor fixes based on expo-doctor by @achou11 in
#13
* chore: add architecture docs & plans by @gmaclennan in
#11
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by
@gmaclennan in
#33
* refactor: simplify build-backend.ts; rollup writes directly to native
asset trees by @gmaclennan in
#34
* chore: fix eslint configuration by @achou11 in
#41
* android: audit 16 KB page alignment on every shipped .so by
@gmaclennan in
#43
* chore: move example app into apps directory by @achou11 in
#18
* refactor: per-component lifecycle state with derived ComapeoState by
@gmaclennan in
#47
* android: fold waitForFile into connect retry loop by @gmaclennan in
#52
* chore: add e2e testing app by @achou11 in
#49
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan
in #64
* use npm list instead of custom traversal to get native module versions
by @achou11 in
#70
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in
#56
* chore(ci): add release workflow by @gmaclennan in
#90
* chore: fix npm script and release build script by @gmaclennan in
#91
* chore(pack): don't try to package build files by @gmaclennan in
#92
* chore(release): merge prerelease branch. by @gmaclennan in
#110
* ci(e2e): retry BrowserStack builds on infra-class flakes by
@gmaclennan in
#113
### Other Changes
* ci: derive changelog labels from PR titles + add Dependabot by
@gmaclennan in
#114

## New Contributors
* @achou11 made their first contribution in
#12
* @optic-release-automation[bot] made their first contribution in
#112

**Full Changelog**:
https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!--

<release-meta>{"id":342970724,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta>
-->