feat(sentry): Phase 11 toggle rework, metrics layer + PII scrubbers#111
feat(sentry): Phase 11 toggle rework, metrics layer + PII scrubbers#111gmaclennan wants to merge 9 commits into
Conversation
Implements the Sentry Phase 11 workblock (plan §11, §9b.1, §9b.5): #75 toggle rework - Rename captureApplicationData -> applicationUsageData across the JS API, native bridge methods, on-device stored key, Expo plugin field, and Node CLI flags. Deprecated aliases forward for one minor release; a one-shot stored-key migration runs on first prefs open (both platforms). - New `debug` toggle (get/setDebugEnabled, sentry.debug + sentry.debugEnabledAtMs slots, --debug argv, debugDefault plugin field) with a 24h auto-off enforced in the native debug reader. - Device classification (DeviceTags.{kt,swift}): low/mid/high by RAM + cores plus <platform>.<major>, plumbed to RN via sentryConfig.deviceTags and to Node via --deviceClass/--osMajor/ --platformTag. - tracesSampleRate now derives from debug (1.0/0). #76 metrics layer - New backend/lib/metrics.js + src/sentry-metrics.ts: wrappers that inject `platform` on every metric, attach device tags only on the .by_device mirrors, no-op when Sentry is off, and drop forbidden tags. RPC hooks split: always record the metric, span only under debug (recorded while active so it links to the trace). console integration moved behind debug; 60s memory/event-loop sampler, boot-phase durations, state transitions, storage-size bucket. #77 PII scrubbers - Shared regex list (rootKey, 22-char base64, lat/lng) in src/sentry-scrub.ts (RN) and backend/before-send.js (Node). RN wires the real beforeSend ahead of the host chain plus a host-only URL beforeBreadcrumb; Node registers the same scrub as an event processor. Walks message, exception, extra, contexts, breadcrumb message+data, span description+data; HTTP breadcrumb URLs reduce to host-only. Tests: backend metrics/before-send suites, extended sentry suites (debug branching + scrubber), native migration/24h-auto-off/device boundary tests. npm lint, tsc, jest (18), backend node --test (47) all green locally. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scrubber (src/sentry-scrub.ts + backend/before-send.js, kept mirrored): - Redact numeric/string lat/lng held as object fields (key-aware). - Match base64url runs of 22+ chars so longer project keys/ids scrub. - RN scrubEvent now reduces breadcrumb URLs to host-only via scrubBreadcrumb (symmetric with Node). - Walk event.request (url→host, query_string/headers/cookies/data). Metrics: - Emit comapeo.rpc.client.send_ms on the always-on and debug paths. Native (review-only, not compiled here): - Android/iOS: read debug toggle before native Sentry init so the §11.5 auto_disabled breadcrumb is queued before the drain. - Document the main-process auto-off breadcrumb drop on Android. Tests: - New RN suites: sentry-scrub, sentry-metrics, rpc-client-hook. - Backend: real forbidden-NAME integration test through the wrappers; lat/lng-object, base64 43/52, and event.request regressions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 11 review fixes —
|
…hase11 * origin/main: (92 commits) docs(readme): document uploading sourcemaps, dSYMs, and native symbols to Sentry feat(e2e): forward RPC traces to Sentry, "e2e" environment for both test apps fix(e2e): drop tsconfig paths alias that double-loaded the module Release v1.0.0-pre.6 fix(android): wake blocked read in disconnect() and tidy close() teardown fix(android): shutdown IPC socket on JS reload so backend cleans up subscriptions docs: vendor relevant Expo skills into .claude/skills/ docs: drop stale issue list from AGENTS.md, fix web platform status docs: slim AGENTS.md to orientation, defer detail to docs/ docs: run local e2e against a Release build, drop the dev-client flow chore: add build:ios to match build:android in both test apps chore(deps): bump the minor-and-patch group across 1 directory with 14 updates docs: rename agents.md to AGENTS.md, add CLAUDE.md, and ship agent skills ci: align dependabot and action pinning with policy chore: add dev setup/test scripts and a Maestro e2e skill fix(android): stop dangling argv pointers from corrupting node args test(e2e): Maestro rootkey persistence across app restarts chore: keep package version in sync with origin/main fix(ios): ad-hoc-sign simulator tests with a keychain entitlement Release v1.0.0-pre.5 ... # Conflicts: # android/src/main/java/com/comapeo/core/ComapeoCoreService.kt # src/ComapeoCoreModule.ts
…rt cycle The post-merge ComapeoCoreModule imports the Phase 11 metrics layer, which eagerly pulls ./sentry and forms an import cycle back into this module. Stub ./sentry-metrics like the other transitive deps so the notification-wrapper test loads the real module without tripping the cycle's eager readSentryConfig() call.
sampleStorageSize() recursively stats the entire private storage tree at boot, but its only output is a bucket metric that no-ops when Sentry is disabled. Gate the walk on metrics.isEnabled() so opted-out users don't pay the disk I/O for a discarded sample.
Code review (xhigh, workflow-backed)I merged I've already pushed two fixes that needed no decision:
The rest below need your call or touch the PII scrubber, so I'm flagging rather than editing. 1. RPC error capture is now gated behind
|
captureException had moved into the debug-gated span path, so with debug off (the default, and it auto-expires after 24h) a throwing RPC handler produced no Sentry issue — only duration/error counters. Move capture to the always-on path in both the backend rpcHook and the RN client hook, gated only on Sentry being initialised. Add the missing trailing .catch on the backend chain so a throw from a metrics call can't escalate to unhandledRejection -> handleFatal -> process.exit.
The 22+-char URL-safe-base64 token rule over-matched Sentry's own identifiers: it redacted contexts.trace.trace_id (breaking crash-to-trace correlation and risking event rejection), collapsed long PascalCase exception type names to [redacted] (breaking issue grouping), and dropped metrics whose error_class tag was a long error name. Remove it from both the RN scrubber and the backend mirror, leaving the targeted root_key= and lat/lng rules. Bare tokens are unscrubbed until the team agrees a narrower rule; documented in-code and in the tests.
|
Update — findings addressed (pushed):
Still open for the team: #4 (duplicated scrubber/metrics modules), #10 (the metric-wrapper dedup), #9 (Android cross-process breadcrumb, already documented as deferred), and the eventual narrower token-scrub rule. |
The RPC hooks now observe errors for metrics + tracing only; neither the backend rpcHook nor the RN client hook calls captureException. An RPC rejection is often expected control flow (e.g. NotFound) that should not auto-create a Sentry issue — whether an error is worth reporting is the calling application's decision, made at the call site. The response and its rejection reach the JS caller through rpc-reflector's own channel independently of the hook, so not capturing here changes nothing about propagation. Genuine backend fatals are still caught by the global uncaughtException/unhandledRejection handlers (captureFatal).
|
Follow-up on the earlier review thread: RPC error capture — removed from the hooks entirely. Tracing of the actual flow confirmed the RN client hook sees every RPC failure (handler errors come back deserialized with the backend stack preserved, plus client-only Storage-size metric — the recursive-walk concern is filed as #178 (replace the JS walk with Android Still open for the team: the duplicated scrubber/metrics modules, the metric-wrapper dedup, and the eventual narrower token-scrub rule (broad base64-22 rule remains disabled). |
Implements the Sentry Phase 11 workblock (tracking #74) in one branch / one PR. Spec:
docs/sentry-integration-plan.md§11.1–§11.8, §9b.1, §9b.5, Phase 5.Part of #74 (tracking umbrella — remaining children #78–#83 stay open)
Closes #75
Closes #76
Closes #77
#75 — Toggle rework (§11.1, §11.5, §11.6, §11.7, §11.2.b)
captureApplicationData→applicationUsageDataeverywhere: JS get/set, native bridge methods, the on-device stored key, the Expo plugin field, and the Node CLI flags.get/setCaptureApplicationData, nativesetCaptureApplicationData,--captureApplicationDataargv, plugincaptureApplicationDataDefaultwith a warn-don't-error path).debugtoggle:get/setDebugEnabled(JS),setDebugEnabled(native),sentry.debug+sentry.debugEnabledAtMsslots,--debugargv,debugDefaultplugin field.debug=falseif switched on >24h ago and queues acomapeo.debug.auto_disabledbreadcrumb; re-enable refreshes the window. Both platforms.DeviceTags.{kt,swift}bucket the device low/mid/high by RAM + cores and compute<platform>.<major>. Passed to JS viasentryConfig.deviceTagsand to Node via--deviceClass/--osMajor/--platformTag.tracesSampleRatenow derived fromdebug(1.0/0).#76 — Metrics layer (§11.2, §11.3, §11.6, §11.8)
backend/lib/metrics.js+src/sentry-metrics.ts: wrappers aroundSentry.metrics.*injectingplatformon every metric + units, attachingdevice_class/os_majoronly on the.by_devicemirrors, no-op when Sentry is off.debug, recording the metric while the span is active so it links to the trace (§11.3).src/ComapeoCoreModule.ts(onRequestHook) +backend/lib/sentry.js(rpcHook).tracesSampleRatedriven bydebug; backendconsoleIntegrationmoved behinddebug.beforeSendMetricforbidden-tag filter on the JS + Node metrics paths (shared regex list). Native SDKs: see deferral note.#77 — PII scrubbers (§9b.1, §9b.5, Phase 5)
src/sentry-scrub.ts) and Node (backend/before-send.js), each pointing at the other.beforeSendbefore the host's chain; URL-scrubbingbeforeBreadcrumb.backend/before-send.jsregistered as aSentry.addEventProcessorinlib/sentry.js's init.Scrubber false-positive trade-off
The substring scrubber over-redacts by design — leaking a real project secret costs far more than a stray
[redacted]:aGVsbG8td29ybGQtMTIzNA→[redacted](real rootkey shape)bm90LWEtcmVhbC1rZXktMQ→[redacted](harmless, false positive)latitude: -12.34→latitude: [redacted]; evenlongitude: unknown-style prose loses its value.https://cloud.comapeo.app/projects/abc?token=x→https://cloud.comapeo.app— path/query detail lost, "all requests to host X failing" diagnosability kept.The same patterns gate the metrics
beforeSendMetricfilter (forbidden tag names + base64-22 / lat-lng tag values).Checks run locally
npm run lint— pass (0 errors, 0 warnings)npx tsc --noEmit(src) — passnpm test(jest, src) — pass (18/18)npm --prefix backend ci --ignore-scriptsthennode --test 'lib/**/*.test.mjs'— pass (47/47)rolldownbuild — pass; verifiedloader.mjskeeps@sentry/node-corebehind the gated dynamic import.Could NOT run locally (rely on CI)
ComapeoPrefsTest,SentryConfigTest, newDeviceTagsTest) — no Android SDK / Gradle here. Cover the stored-key migration, 24h auto-off boundaries, and device-classification boundaries (exactly 3 GB RAM, exactly 4 cores). Need the CI emulator/Gradle to execute.ComapeoPrefsTests,SentryConfigTests, newDeviceTagsTests) — no Xcode here. Same coverage. Need CI/Xcode to execute.backend npm run typesis red onmainalready (pre-existingpolywasmdeclaration gap, unrelated); the new production files type-clean.Deferred (called out explicitly)
beforeSendMetricon the native (Android/iOS) SDKs is not yet wired — only the JS + Node metric paths run the forbidden-tag filter today. The native exit metric is the only native metric currently emitted and is already low-cardinality by construction; the native hook is a small follow-up.metrics.jshelpers exist; sync-session lifecycle plumbing is a Phase 5 item). The always-on RPC, boot, memory, state, and storage metrics are wired end-to-end.🤖 Generated with Claude Code