Add rootkey persistence and lifecycle state management#36
Merged
Conversation
Drops the hardcoded rootkey from createComapeo and reshapes the boot
sequence so native (Kotlin/Swift) supplies it over the existing control
socket as the first frame, before the manager is constructed:
- Backend binds control.sock first, broadcasts {type:"started"}, waits
for {type:"init",rootKey:<base64>}, builds MapeoManager, binds
comapeo.sock, broadcasts {type:"ready"}.
- The 1s settle window before `ready` is removed; `ready` now means RPC
is safe to call (manager exists, comapeo socket bound).
- Native folds the whole handshake into its `starting` state. `started`
fires only on receipt of `ready`.
Adds RootKeyStore on both platforms (Keychain on iOS,
AndroidKeyStore-wrapped SharedPreferences on Android), greenfield only —
fresh-install generate-and-persist with read-on-error rather than
silent regeneration. SharedPreferences-backed file is excluded from
backup via data_extraction_rules.xml + backup_rules.xml.
Closes #29: Android NodeJSService gains a State enum mirroring iOS,
ComapeoCoreModule observes the control socket from the main process to
derive a JS-visible state, and the TS layer exports a `state` singleton
with getState() and "stateChange" events alongside the existing
`comapeo` MessagePort.
Tests: a new MockBackend helper drives the started → init → ready
handshake against a mock control socket so existing service-lifecycle
tests still observe `.started` without spinning a real Keychain.
https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
…ement-plan-o1vKT # Conflicts: # android/src/main/AndroidManifest.xml # backend/index.js # backend/lib/simple-rpc.js
Three independent compile errors surfaced once compileDebugKotlin ran
in CI:
- RootKeyStore.kt: SharedPreferences.edit(commit = true) { ... }
returns Unit, not the Boolean from commit(). Switched to the raw
editor (edit().putString(...).commit()) so the persistence failure
can actually be observed and surfaced as a RootKeyException.
- NodeJSService.kt / ComapeoCoreModule.kt: the trailing-lambda call
NodeJSIPC(file) { ... } bound to the optional last param
(onConnectionStateChange: ((State) -> Unit)?) instead of
onMessage: (String) -> Unit. Use named-argument calls so the
(String) -> Unit lambda binds to onMessage.
- Drop the now-unused androidx.core.content.edit import from
RootKeyStore.kt.
https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
Two distinct CI failures from the same trailing-lambda mistake plus a
Swift 5 trailing-comma:
- android/src/androidTest/java/com/comapeo/core/NodeJSIPCTest.kt has
~9 callsites of the form `NodeJSIPC(file) { msg -> ... }`. The new
optional `onConnectionStateChange` constructor parameter was last,
so Kotlin's trailing-lambda rule bound those callsites to the state
callback (wrong type), not `onMessage`. Reorder NodeJSIPC's
constructor so `onConnectionStateChange` (defaulted) comes before
`onMessage` — trailing-lambda then binds to `onMessage` again and
every existing call compiles unchanged.
- ios/RootKeyStore.swift had a trailing comma after the last argument
of SecRandomCopyBytes. Trailing commas in argument lists are Swift
6.0+ (SE-0257); the macOS swift-test runner uses Xcode 15.4 / Swift
5.10, which rejects them. Drop the comma. Also exclude
RootKeyStore.swift from the swift package target explicitly so SPM
doesn't surprise us by compiling it (production CocoaPods build
picks it up via `s.source_files = "*.swift"`).
- Refresh CoreManagerSmokeTest.swift docstring to reflect the new
bind-control → init → bind-comapeo → ready sequence (previously
described as "both listen() promises resolve").
https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
Swift Package Tests and Integration Tests are failing with no visible error in the GitHub Actions web UI annotations panel — the "Process completed with exit code 1/65" tells us nothing about which test failed or what the assertion was. Add `tee` + a failure-only print step so the next CI run surfaces the actual error in the step output, plus an artifact upload so we can pull the full log. To revert once root-caused, drop the diagnostic steps and the `--verbose | tee` redirection. https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
ComapeoCoreModuleTests.testStateStringDerivesFromServiceArgumentNotIPC
was constructing NodeJSService without the new rootKeyProvider param,
which fails the test target compile under xcodebuild test (exit 65).
Two changes:
- Pass `rootKeyProvider: { Data(repeating: 0xAB, count: 16) }` to the
init so the example-app test target compiles.
- Switch the assertion from `.started` to `.starting`. The `.started`
transition now requires a control-socket handshake (backend sends
`started` → service replies with init frame → backend sends `ready`)
that the example-app test target has no scaffold for. The contract
under test is "stateString reflects service.state.rawValue" — the
choice of state value is incidental.
Also simplify mockTestRootKey from sequence-init to
`Data(repeating:count:)` for clarity.
https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
CI is green now (Swift Package Tests + Integration Tests + iOS Device Build all pass on commit 14a3590). Drop the verbose-tee + dump-on- failure scaffolding added in f729f1c — it served its purpose. https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
Integration Tests passed on 14a3590 but failed on 34aba84 (revert) in 15 minutes — typical xcodebuild timeout shape. Re-add the failure dump until we have a reproducible signal. https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
639a82a (with diagnostic) passed; the 34aba84 failure was almost certainly a flake — same Swift code, only the workflow differed. Drop the failure dump now that the run is reproducible. If the flake recurs we have the diagnostic snippet in git history and can re-add it for one-shot debugging. https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
Re-add the `tee xcodebuild-test.log` redirection plus the failure-only summary dump. We tested both with and without it on identical Swift code; the runs without `tee` intermittently timed out at the workflow's 15-min mark while runs with `tee` completed in ~4 min. The redirection seems to affect xcodebuild's pipe buffering or simulator boot-sequence timing in a way that suppresses the race; cause unknown, but the workaround is stable and the failure-dump pays for itself the next time something else breaks. https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE
Substring matching on the raw frame bytes (`message.contains("\"started\"")`)
is brittle: any control payload that legitimately carries those characters
in another field would falsely trigger a state transition. The "stay
synchronous" rationale in the prior comment doesn't hold either — the
handler already runs on the IPC's receive queue/dispatcher and the
init-frame send is async via `serviceScope.launch` / sendQueue, so a real
parser costs nothing in latency or ordering.
Switches both NodeJSService.kt (FGS-side) and ComapeoCoreModule.kt
(main-app-side) on Android, plus NodeJSService.swift on iOS, to
JSONObject.optString("type") / JSONSerialization.jsonObject. Non-JSON
frames are logged-and-skipped rather than mis-routed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The envelope JSON carries an `alias` field for forward-compat with future wrapper-key rotation, but parseEnvelope previously read it without checking it against the constant the code actually uses. A tampered or mis-versioned envelope whose `v=1` happens to be correct but whose alias points at a non-existent key would have hit a confusing "wrapper key alias missing" error several call frames later. Pin the check at parse time so the failure mode names the right field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hrow
Adds SimpleRpcServer.broadcastError({phase, message, stack?}) so the host
can publish a structured error frame to every connected control-socket
client before the process exits. Native-side consumers (FGS on Android,
NodeJSService on iOS) will use this to derive their ERROR state without
needing a separate error channel.
backend/index.js gains:
- handleFatal(phase, error): broadcasts the error frame, waits ~100ms
for the AF_UNIX kernel buffer to flush, then exits 1.
- uncaughtException + unhandledRejection handlers route through
handleFatal so any throw — sync or async, in any phase — surfaces with
the same shape.
- Boot IIFE tags errors with the phase that produced them
(`listen-control`, `init`, `construct`) before re-throwing into
handleFatal, replacing the bare `process.exit(1)`.
The 100ms flush wait is bounded but generous: AF_UNIX writes on a local
socket are kernel-buffered immediately; the wait exists for the case
where the peer hasn't drained its receive buffer yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idge
Pairs with the previous commit's backend broadcastError. Native (FGS-side
NodeJSService on Android, NodeJSService on iOS) now consumes
{type:"error",phase,message} on the control socket and transitions to
ERROR while capturing the detail. Local error paths (rootkey load,
node-runtime startup, shutdown timeout) route through the same
transitionToError helper so getLastError() always reflects whichever
failure landed first.
The main-process Android ComapeoCoreModule (which observes the same
broadcast independently) attaches errorPhase/errorMessage to its
stateChange event payload. iOS ComapeoCoreModule reads the captured
ErrorInfo off NodeJSService and includes the same fields.
JS surface:
- StateChangeEventPayload gains optional errorPhase/errorMessage.
- ComapeoErrorInfo type for the structured detail.
- state.getLastError() returns the most recent ErrorInfo or null.
- stateChange listener signature is now (state, error|null).
The IPC-level NodeJSIPC.State.Error is still mapped to ERROR; its
exception message is forwarded with phase="ipc" so callers can
distinguish a connection-layer failure from a backend-reported one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous behaviour was to nodeJob?.cancel() on Android when rootkey load failed in sendInitFrame. That races with the application's own ERROR observation: if the JS layer is mid-handler reading getLastError(), the node process has already gone away and any subsequent introspection (or graceful shutdown attempt by the FGS) sees a stale state. ERROR is now treated as observable: the service transitions to ERROR and captures the cause, but leaves the node thread alive. Recovery is the application's call — restart the FGS, recreate the service, prompt the user, etc. — which matches the pattern callers already need for non- rootkey failures (backend boot errors, shutdown timeouts). iOS sendInitFrame already had no abort path, so the change there is docstring-only. State enum docs and the JS ComapeoState union are updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A hang during the rootkey handshake or MapeoManager construction
previously left the service in STARTING indefinitely — there was no
upper bound on how long the JS layer (or any UI driven off
state.getState()) might wait for a transition that would never come.
Adds a configurable watchdog (default 30 s) on both platforms:
- iOS: DispatchWorkItem scheduled via DispatchQueue.global().asyncAfter.
transitionState cancels it when leaving .starting; if it fires while
state is still .starting, the service transitions to ERROR with phase
"starting-timeout".
- Android: serviceScope.launch { delay(); … } that respects coroutine
cancellation, cancelled by transitionState when leaving STARTING.
Both use a TOCTOU-safe state recheck inside the watchdog body so a
racing transition (ready broadcast landing simultaneously) doesn't
trigger a spurious ERROR.
Default 30 s covers cold simulator boot plus the addon dlopen surge
(sodium-native + better-sqlite3 dominate) with margin matching the
existing CoreManagerSmokeTest budget. Production callers can widen
for slow devices; tests can tighten via the new init parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both modules call NodeJSIPC.connect() in OnCreate AND on every OnActivityEntersForeground / OnAppEntersForeground. Without context that reads as redundant or buggy. NodeJSIPC.connect() is idempotent and also the recovery path out of the IPC's .error state, so re-calling it on every foreground is the cheapest way to handle transient connection failures (FGS respawn after kill on Android, suspension-closed fd on iOS) without us mirroring IPC state at this layer. Comment-only — no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Android: instrumented test under androidTest/ exercises AndroidKeyStore + SharedPreferences directly. Two cases: first-call generates and second-call returns identical bytes (proves persistence), and a corrupt JSON envelope throws RootKeyException (proves we don't silently regenerate on read failure — the contract that prevents identity loss). @Before/@after purge both halves of the persisted state so the suite is order-independent and won't leak alias/blob across runs. iOS: example-app XCTest suite exercises the simulator's iOS Keychain. Two cases: round-trip (proves persistence under a UUID-namespaced service) and a manually-planted wrong-length entry that must throw .wrongLength (proves the contract). Each test uses a unique service string so concurrent or post-crash reruns don't collide. The iOS test sources live under example/tests/ios/ (the tracked location); example/ios/ is gitignored and regenerated by `expo prebuild`. To make new test files land in the Xcode test target on incremental prebuilds (not just --clean rebuilds), add-test-target.rb is now idempotent — it registers any *.swift file in the test source directory that isn't already wired into the build phase. Cross-restart deviceId stability is intentionally not covered here per review discussion; the upcoming Maestro end-to-end suite will exercise that path with a real CoMapeoCore boot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 29, 2026
gmaclennan
commented
Apr 29, 2026
gmaclennan
commented
Apr 29, 2026
gmaclennan
commented
Apr 29, 2026
gmaclennan
commented
Apr 29, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
Implements persistent device identity (“rootkey”) storage on iOS/Android and introduces a cross-platform lifecycle/state model for the embedded Node.js backend, including a native↔backend control-socket handshake that gates backend initialization on receiving the rootkey.
Changes:
- Add secure, non-rotating 16-byte rootkey persistence on iOS (Keychain) and Android (AndroidKeyStore-wrapped AES/GCM + SharedPreferences), plus backup/transfer exclusions on Android.
- Refactor backend startup to a two-stage boot (
started→init(rootKey)→ready) and add error broadcasting over the control socket. - Expose lifecycle state + last error to JS via
getState(),getLastError(), andstateChangeevents; update iOS tests and tooling to use a newMockBackend.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/index.ts | Re-export state observer alongside comapeo. |
| src/ComapeoCoreModule.ts | Add JS state observer wrapper + getState()/getLastError() accessors. |
| src/ComapeoCore.types.ts | Define ComapeoState, error payload/event types. |
| ios/Tests/NodeJSServiceTests.swift | Update service tests to drive new handshake via MockBackend. |
| ios/Tests/IPCLifecycleTests.swift | Update IPC lifecycle tests to use MockBackend for control socket. |
| ios/Tests/Helpers/MockNodeService.swift | Inject rootKeyProvider + add helper to start service with MockBackend. |
| ios/Tests/Helpers/MockBackend.swift | New mock control-socket backend implementing started → init → ready. |
| ios/RootKeyStore.swift | New Keychain-backed persistent rootkey store. |
| ios/Package.swift | Exclude RootKeyStore.swift from SPM target used by macOS swift tests. |
| ios/NodeJSService.swift | Add lifecycle state machine, startup watchdog, init-frame send, error capture. |
| ios/ComapeoCoreModule.swift | Emit stateChange with error details; expose getLastError(). |
| ios/AppLifecycleDelegate.swift | Wire RootKeyStore.loadOrInitialize() into NodeJSService via rootKeyProvider. |
| example/tests/ios/ServiceLifecycleTest.swift | Adjust timing assumptions for new “ready-gated” startup. |
| example/tests/ios/RootKeyStoreTests.swift | New simulator Keychain tests for RootKeyStore persistence/corruption. |
| example/tests/ios/CoreManagerSmokeTest.swift | Update smoke test commentary to reflect init-gated manager construction. |
| example/tests/ios/ComapeoCoreModuleTests.swift | Update module test expectations to .starting due to handshake requirement. |
| example/plugins/with-ios-tests/add-test-target.rb | Make test target file registration idempotent across prebuild runs. |
| backend/lib/simple-rpc.js | Pass full inbound message to handlers; add broadcastError(...). |
| backend/lib/create-comapeo.js | Require a validated 16-byte rootKey Buffer (no default). |
| backend/index.js | Implement init handshake, gate manager creation on rootkey, broadcast boot errors. |
| android/src/main/res/xml/comapeo_data_extraction_rules.xml | Android 12+ backup/device-transfer exclusion for rootkey prefs. |
| android/src/main/res/xml/comapeo_backup_rules.xml | Pre-Android-12 backup exclusion for rootkey prefs. |
| android/src/main/java/com/comapeo/core/RootKeyStore.kt | New AndroidKeyStore-backed rootkey persistence with verification. |
| android/src/main/java/com/comapeo/core/NodeJSService.kt | Add lifecycle state machine, startup watchdog, init-frame send, error capture. |
| android/src/main/java/com/comapeo/core/NodeJSIPC.kt | Add optional connection-state observer callback. |
| android/src/main/java/com/comapeo/core/ComapeoCoreService.kt | Remove obsolete internal ServiceState flow (service now relies on NodeJSService state). |
| android/src/main/java/com/comapeo/core/ComapeoCoreModule.kt | Add stateChange events + getState()/getLastError() derived from control socket + IPC state. |
| android/src/main/AndroidManifest.xml | Wire backup/data-extraction exclusion rules; add tools namespace/targetApi. |
| android/src/androidTest/java/com/comapeo/core/RootKeyStoreTest.kt | New instrumented tests for Android rootkey persistence/corruption. |
| .github/workflows/ios-tests.yml | Pipe xcodebuild output to file and surface failure summary on CI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gmaclennan
commented
Apr 29, 2026
Pulls together the design rationale that was previously spread across PR review comments and ad-hoc explanations. Covers: - iOS in-process vs Android two-process model and why. - The two Unix-domain sockets (comapeo.sock, control.sock), framing, and why we have two channels rather than one. - The boot handshake (started → init → ready) and why the rootkey crosses the socket boundary. - Lifecycle state machine: native state, JS-visible state, error surfacing across the bridge. - Known limitations: the merged state machine actually represents three independent components (FGS, node runtime, backend code), with two proposed fixes — a small "broadcast stopping" change vs a larger per-component state refactor. Both deferred to a follow-up. - Alternatives considered for FGS↔main-process state notification (Intent broadcasts, Messenger, ContentProvider, etc.) and for rootkey storage (expo-secure-store), with the rejection rationale for each. Companions the existing ForegroundService.md, root-key-storage-and- migration-plan.md, runtime-alternatives.md, and bare-architecture.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, Android's start() guarded only on `nodeJob != null` and
stop() had no state guard at all. The state machine therefore allowed
weird transitions like ERROR -> STOPPING (when stop() was called on
an errored service) and silently re-permitted ERROR -> STARTING after
a stop() had cleared nodeJob.
Tightens both guards to match iOS:
- start() requires state == STOPPED. Refuses STARTING/STARTED
(already-running, existing nodeJob check), STOPPING (mid-shutdown),
and ERROR (terminal — caller must destroy() and create a new
instance to recover).
- stop() requires state in {STARTING, STARTED}. STOPPED/STOPPING is
a no-op; ERROR is refused — callers must call destroy() to release
resources rather than going through a STOPPING transition that
doesn't map to a clear semantic.
This codifies the model the FGS lifecycle already follows in practice:
ComapeoCoreService.onCreate creates a fresh NodeJSService for every
service start, so a recovery flow is naturally "stop the FGS, start
it again" and a brand-new instance gets STOPPED -> STARTING. iOS
already enforces these guards in code; the State enum docstring on
both platforms now spells out the per-instance-terminal contract.
No behaviour change in existing happy-path or shutdown tests because
the FGS never reused a NodeJSService instance across start/stop
cycles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverses the lifecycle treatment of malformed control-socket frames
(non-JSON, missing `type`, unknown `type`). They were previously
raised to ERROR with phase="protocol"; that conflated "the service is
in trouble" with "we got one frame we couldn't parse". A single bad
frame shouldn't tear down a working session.
Adds a dedicated `messageerror` event mirroring the DOM MessagePort
counterpart. The event fires on the JS-facing `state` observer
alongside (but independent of) `stateChange`. Listeners receive a
single `Error` argument with a developer-facing description of the
offending frame:
state.addListener("messageerror", (error) => {
console.warn("control protocol error:", error.message)
})
A `messageerror` event does not change the lifecycle state — the
control socket continues to drive `stateChange` independently. Real
backend errors (broadcast as `{type:"error",phase,message}`) still
transition to ERROR as before; only PARSE failures use the new
channel.
Native plumbing:
- Android ComapeoCoreModule: emits `messageerror` directly from its
control-socket onMessage handler.
- iOS NodeJSService: exposes a new `onMessageError` callback;
ComapeoCoreModule.swift wires it to sendEvent.
- Android FGS-side NodeJSService: logs malformed frames (no JS
bridge to forward to); the main-app-process Module surfaces the
event independently from its own controlIpc subscription.
Architecture doc gets a new §5.5 covering the rationale and wiring.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three near-identical blocks parsed control-socket frames and switched
on `type`:
- NodeJSService.kt::handleControlMessage (FGS-side)
- ComapeoCoreModule.kt's controlIpc onMessage (main-app-side)
- NodeJSService.swift::handleControlMessage (iOS)
Each independently parsed JSON, looked up "type", and translated to
lifecycle / event effects. The well-known type names ("started",
"ready", "error") were string literals scattered across all three
files. Adding a new frame type meant updating all three sites with no
compiler help to catch a missed branch.
Adds a typed router on each platform:
- ControlFrame.kt — sealed class with Started, Ready, Error(phase,
message), Malformed(detail) cases, plus a static parse(raw): ControlFrame.
- ControlFrame.swift — Swift enum with the same cases and parse function.
Each consumer now switches on the typed result. Kotlin `when` and Swift
`switch` are exhaustive over sealed classes / enums, so a forgotten
case fails to build rather than silently dropping the frame.
`Malformed` collapses "non-JSON", "missing type", and "unknown type"
into one case with a developer-facing detail string. The consumers
don't currently distinguish these; the detail is what gets surfaced
to JS via `messageerror` regardless. If a future consumer needs to
distinguish, the case can be split without touching the call sites
that don't care.
Pure refactor — no behaviour change. ControlFrame.swift is added to
the SPM target sources alongside the existing Swift files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…shot> Replaces the explicit `synchronized(stateLock)` blocks around `state` + `lastError` reads/writes with a single `MutableStateFlow<StateSnapshot>` holding both fields together. `getAndUpdate` is a CAS-loop, so: - Concurrent transitions resolve in a defined order without a lock. - The (state, lastError) pair is always observed as a matching unit by readers — no separate two-step lock dance to make atomic. - The block returns the pre-update snapshot, so the watchdog-cancel + log + onStateChange-fire side effects can run cleanly outside the CAS without any "captured-from-inside-the-lock" trickery. The `startupWatchdogJob` slot moves from `@Volatile var Job?` to `AtomicReference<Job?>`. `cancelStartupWatchdog()` does `getAndSet(null)?.cancel()` — a single atomic operation that takes ownership of the current ref, so two concurrent transitions out of STARTING can never both try to cancel the same job. Aligns this file's pattern with `NodeJSIPC.kt` which already uses `MutableStateFlow` for its connection state. No new dependencies (the imports are already on the kotlinx-coroutines version we're pinned to). Behaviour-preserving — observable lifecycle is identical to the previous synchronized version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure deterministic tests for the control-socket frame parser extracted in 08bc266. Coverage: - Each well-known type (started, ready, error) parses to the matching case, with the right phase/message extraction for error frames. - Error frames default phase=unknown and message=(no message) when those fields are missing — the parser is permissive on the wire so downstream consumers can rely on never seeing null. - Forward-compat: extra fields (e.g. `stack`) on an error frame are ignored, not rejected. - Every malformed case (non-JSON, empty input, missing `type` field, unknown `type` value) resolves to .Malformed/.malformed with a developer-readable detail string that names what's wrong. - A 200-char non-JSON input produces a truncated detail (the parser takes the first 100 chars), so a hostile or accidentally-huge frame doesn't blow up logs. Without these, parser regressions would only surface via a lifecycle test that happens to drive the right frame and notice the wrong outcome — silent until something else broke. JVM-only on Android (android/src/test, not androidTest), pure Foundation on iOS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Initial ControlFrameTests.swift used `@testable internal import` (the Swift 5.9 explicit-access-import form), copied from the example app's test sources. The SPM `ios/Tests/` target uses the older implicit form across all sibling tests; mixing the two caused "ambiguous implicit access level for import of 'ComapeoCore'" build errors against the other test files. Switching ControlFrameTests to the implicit form aligns it with the rest of the suite and the test target now compiles + passes (`swift test --filter ControlFrameTests`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every ControlFrameTest case threw `RuntimeException` in CI:
> ControlFrameTest > parsesStarted FAILED
> java.lang.RuntimeException at ControlFrameTest.kt:21
> ... (10 failures)
`org.json` is part of the Android framework, but the JVM unit-test
classpath gets a stub where every method throws "Method ... not
mocked" at runtime. The Android Gradle plugin's default
`testOptions.unitTests.returnDefaultValues = false` makes that the
expected failure mode for any framework class with no real JVM
implementation.
Pulling the standalone `org.json:json` (Maven Central) onto the test
classpath provides a real implementation under the same package and
API. Production code that uses `JSONObject` (ControlFrame,
RootKeyStore) is now unit-testable on the JVM without instrumentation
or framework mocking.
Verified locally:
./gradlew :comapeo-core-react-native:testDebugUnitTest \
--tests com.comapeo.core.ControlFrameTest
BUILD SUCCESSFUL
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Buffer.from(s, "base64") in Node is permissive — invalid characters are silently dropped, so a tampered string could still decode to 16 bytes that aren't the bytes the producer originally encoded. The existing length check would pass, the wrong rootkey would resolve into MapeoManager, and the device would silently use a corrupted identity. Adds a strict regex check before decode: a 16-byte rootkey, base64-encoded with padding (which is what both Android's `Base64.NO_WRAP` and iOS's `base64EncodedString()` emit), is exactly 22 base64 chars followed by `==`. Anything else is rejected with a clear error before `Buffer.from` is given a chance to silently re-interpret it. The post-decode length check stays as belt-and-braces — strict regex should make it unreachable, but the cost of the extra check is one boolean compare and the cost of a bad rootKey is identity loss. Addresses Copilot review on backend/index.js:73. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The library declares android:dataExtractionRules and android:fullBackupContent on its <application> element to exclude the rootkey-bearing SharedPreferences from cloud backup and device-to- device transfer. Host apps that already declare these attributes hit a manifest-merger conflict at build time. Trade-off: the build error is intentional. Without the library declaration, forgetful consumers leak a (cryptographically useless, but private) blob to backups and produce a confusing restore UX (envelope restored, wrapper key missing on the new device, RootKeyException at first launch). The build-time error is the lesser evil; it surfaces the integration step at the right moment. Cleanest merge-friendly answer is an Expo config plugin that programmatically merges our exclusions into the host's rules XML at prebuild time — filed separately. For now: - Manifest comment expanded with the explicit two-step resolution (merge exclusion entry into host's rules XML, add `tools:replace`). - README "Configure for Android" section explains the conflict and resolution with concrete XML, plus rationale for why exclusion matters even though the wrapper key is device-bound. Addresses Copilot review on AndroidManifest.xml:32. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
The wrapper key in `createOrLoadWrapperKey()` is built with
`setRandomizedEncryptionRequired(true)`, which on hardware-backed
AndroidKeyStore implementations means the keystore — not the caller
— controls IV generation. Passing a `GCMParameterSpec` with our own
IV to `Cipher.init(ENCRYPT_MODE, ...)` throws
java.security.InvalidAlgorithmParameterException:
Caller-provided IV not permitted
at runtime. The instrumented test suite hit this on the API 30
emulator; both `RootKeyStoreTest.firstCallGenerates...` and
`corruptedEnvelopeThrows` failed because they exercise the encrypt
path. Production code on a real device with a hardware keystore
would have hit the same.
Fix: drop the caller IV on encryption. Read `cipher.iv` after
`init()` and store the keystore-generated IV in the envelope
exactly as before — load semantics unchanged. Adds a defensive
length check so an unexpectedly-shaped IV (the spec doesn't pin
the length, though 12 bytes is the GCM default and what we still
require on read) fails at write time, not silently.
Decryption keeps using `GCMParameterSpec(tagLen, envelope.iv)` —
that's required by GCM and explicitly allowed for keystore-managed
decrypt keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's swift-test run died with `error: Exited with unexpected signal
code 13` (SIGPIPE) during
`testStopTimeoutTransitionsToErrorNotStopped`. The test deliberately
tears down the MockBackend before calling `service.stop(timeout:0.1)`,
forcing the shutdown frame to write to an already-closed peer:
1. backend.stop() → mock server closes its accepted fd
2. service.stop() → controlIPC.sendMessageSync writes shutdown
frame on its own fd whose peer (the mock-side accept fd) is now
closed
3. Darwin.write() returns -1 with errno=EPIPE AND the kernel
sends SIGPIPE to the process
4. No handler installed → process dies, taking the rest of the
test suite with it
The bug is latent in production too: any shutdown race between the
service and the backend (the backend's `handleFatal` exit is the
clearest example) can hit the same path, and SIGPIPE there would
crash the FGS / app in the field. The existing `writeFully` already
handles `EPIPE` as `IPCError.writeError(EPIPE)`; the kernel signal
just got there first.
Fix: set `SO_NOSIGPIPE = 1` via `setsockopt` on:
- The connect()'d fd in `connectSocket` (production controlIPC and
comapeoIPC writers).
- The accept()'d fd in `MockNodeServer.acceptClient` (test side that
also writes via `sendFramedMessage`).
With SIGPIPE suppressed at the socket level, a write to a closed
peer returns `errno=EPIPE` cleanly and our existing error path
handles it.
Verified locally:
cd ios && swift test
Executed 53 tests, with 0 failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gmaclennan
added a commit
that referenced
this pull request
Apr 29, 2026
…der-804127 * origin/main: Add rootkey persistence and lifecycle state management (#36) android: audit 16 KB page alignment on every shipped .so (#43) chore: fix eslint configuration (#41) # Conflicts: # android/src/main/java/com/comapeo/core/ComapeoCoreModule.kt # backend/index.js # ios/ComapeoCoreModule.swift # src/index.ts
This was referenced Jun 22, 2026
gmaclennan
added a commit
that referenced
this pull request
Jun 22, 2026
## Optic Release Automation This **draft** PR is opened by Github action [optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action). A new **draft** GitHub release [v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-c499977757c9745e56b2) has been created. Release author: @gmaclennan #### If you want to go ahead with the release, please merge this PR. When you merge: - The GitHub release will be published - The npm package with tag pre will be published according to the publishing rules you have configured - No major or minor tags will be updated as configured #### If you close the PR - The new draft release will be deleted and nothing will change ## What's Changed * Android Testing Infrastructure & Bug Fixes by @gmaclennan in #3 * chore: prebuild example/android; harden instrumented tests by @gmaclennan in #10 * Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in #5 * chore: adjust repo setup by @achou11 in #12 * chore: minor fixes based on expo-doctor by @achou11 in #13 * Add iOS support & test infrastructure by @gmaclennan in #6 * chore: add architecture docs & plans by @gmaclennan in #11 * update some native deps used in backend by @achou11 in #14 * iOS Phase 1: unified JS bundle + smoke test (simulator-only) by @gmaclennan in #15 * iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan in #16 * Phase 2 Android: jniLibs packaging + unified rollup loader plugin by @gmaclennan in #17 * chore: post-Phase-2 cleanup — comments, plan docs, agents.md by @gmaclennan in #33 * android: read abiFilters from reactNativeArchitectures (#30) by @gmaclennan in #35 * refactor: simplify build-backend.ts; rollup writes directly to native asset trees by @gmaclennan in #34 * chore: fix eslint configuration by @achou11 in #41 * android: audit 16 KB page alignment on every shipped .so by @gmaclennan in #43 * Add rootkey persistence and lifecycle state management by @gmaclennan in #36 * chore: move example app into apps directory by @achou11 in #18 * refactor: per-component lifecycle state with derived ComapeoState by @gmaclennan in #47 * android: fold waitForFile into connect retry loop by @gmaclennan in #52 * chore: add e2e testing app by @achou11 in #49 * fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key by @gmaclennan in #57 * fix(backend): cache stopping/error frames for late joiners by @gmaclennan in #58 * fix(ios-tests): wait for STOPPING before signalling node exit by @gmaclennan in #59 * fix(android): drain JNI stdio pumps before returning from node::Start by @gmaclennan in #60 * Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in #54 * feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by @gmaclennan in #62 * ci: drop unreliable Android emulator snapshot caching by @gmaclennan in #64 * feat(sentry): land Phase 3 — backend loader + RPC tracing by @gmaclennan in #63 * fix(ios-tests): serialise STOPPING/STOPPED observers in testFullLifecycleStateTransitions by @gmaclennan in #71 * use npm list instead of custom traversal to get native module versions by @achou11 in #70 * feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry by @gmaclennan in #72 * fix(sentry): make exit telemetry lossless and stop cross-process clobbering by @gmaclennan in #84 * chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in #56 * feat(sentry): migrate to @sentry/react-native v8; exit telemetry as Application Metrics by @gmaclennan in #73 * Map server integration by @gmaclennan in #86 * chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan in #87 * chore(ci): add release workflow by @gmaclennan in #90 * chore: fix npm script and release build script by @gmaclennan in #91 * chore(pack): don't try to package build files by @gmaclennan in #92 * fix: start fastify listening by @gmaclennan in #93 * perf(backend): switch bundler from rollup to rolldown by @gmaclennan in #94 * fix(ci): ignore-scripts in ios npm installs by @gmaclennan in #96 * fix(ci): replace --ignore-scripts with npm strict-allow-scripts allowlist by @gmaclennan in #106 * feat(config): let the consuming app supply the default project config by @gmaclennan in #95 * chore(release): merge prerelease branch. by @gmaclennan in #110 ## New Contributors * @achou11 made their first contribution in #12 **Full Changelog**: https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2 <!-- <release-meta>{"id":342868678,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta> -->
gmaclennan
added a commit
that referenced
this pull request
Jun 22, 2026
## Optic Release Automation This **draft** PR is opened by Github action [optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action). A new **draft** GitHub release [v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-352a6c41c12fd02dec37) has been created. Release author: @gmaclennan #### If you want to go ahead with the release, please merge this PR. When you merge: - The GitHub release will be published - The npm package with tag pre will be published according to the publishing rules you have configured - No major or minor tags will be updated as configured #### If you close the PR - The new draft release will be deleted and nothing will change <!-- Release notes generated using configuration in .github/release.yml at 7fe80b4 --> ## What's Changed ### 🚀 Features * Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in #5 * Add iOS support & test infrastructure by @gmaclennan in #6 * iOS Phase 1: unified JS bundle + smoke test (simulator-only) by @gmaclennan in #15 * iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan in #16 * Phase 2 Android: jniLibs packaging + unified rollup loader plugin by @gmaclennan in #17 * android: read abiFilters from reactNativeArchitectures (#30) by @gmaclennan in #35 * Add rootkey persistence and lifecycle state management by @gmaclennan in #36 * Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in #54 * feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by @gmaclennan in #62 * feat(sentry): land Phase 3 — backend loader + RPC tracing by @gmaclennan in #63 * feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry by @gmaclennan in #72 * feat(sentry): migrate to @sentry/react-native v8; exit telemetry as Application Metrics by @gmaclennan in #73 * Map server integration by @gmaclennan in #86 * feat(config): let the consuming app supply the default project config by @gmaclennan in #95 ### 🐛 Bug Fixes * fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key by @gmaclennan in #57 * fix(backend): cache stopping/error frames for late joiners by @gmaclennan in #58 * fix(ios-tests): wait for STOPPING before signalling node exit by @gmaclennan in #59 * fix(android): drain JNI stdio pumps before returning from node::Start by @gmaclennan in #60 * fix(ios-tests): serialise STOPPING/STOPPED observers in testFullLifecycleStateTransitions by @gmaclennan in #71 * fix(sentry): make exit telemetry lossless and stop cross-process clobbering by @gmaclennan in #84 * fix: start fastify listening by @gmaclennan in #93 * fix(ci): ignore-scripts in ios npm installs by @gmaclennan in #96 * fix(ci): replace --ignore-scripts with npm strict-allow-scripts allowlist by @gmaclennan in #106 * fix(release): stop `npm pack --dry-run` leaking dry-run into backend install by @gmaclennan in #129 ### ⚡ Performance * perf(backend): switch bundler from rollup to rolldown by @gmaclennan in #94 ### ⬆️ Dependencies * update some native deps used in backend by @achou11 in #14 * chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan in #87 ### 🏗️ Maintenance * Android Testing Infrastructure & Bug Fixes by @gmaclennan in #3 * chore: prebuild example/android; harden instrumented tests by @gmaclennan in #10 * chore: adjust repo setup by @achou11 in #12 * chore: minor fixes based on expo-doctor by @achou11 in #13 * chore: add architecture docs & plans by @gmaclennan in #11 * chore: post-Phase-2 cleanup — comments, plan docs, agents.md by @gmaclennan in #33 * refactor: simplify build-backend.ts; rollup writes directly to native asset trees by @gmaclennan in #34 * chore: fix eslint configuration by @achou11 in #41 * android: audit 16 KB page alignment on every shipped .so by @gmaclennan in #43 * chore: move example app into apps directory by @achou11 in #18 * refactor: per-component lifecycle state with derived ComapeoState by @gmaclennan in #47 * android: fold waitForFile into connect retry loop by @gmaclennan in #52 * chore: add e2e testing app by @achou11 in #49 * ci: drop unreliable Android emulator snapshot caching by @gmaclennan in #64 * use npm list instead of custom traversal to get native module versions by @achou11 in #70 * chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in #56 * chore(ci): add release workflow by @gmaclennan in #90 * chore: fix npm script and release build script by @gmaclennan in #91 * chore(pack): don't try to package build files by @gmaclennan in #92 * chore(release): merge prerelease branch. by @gmaclennan in #110 * ci(e2e): retry BrowserStack builds on infra-class flakes by @gmaclennan in #113 ### Other Changes * ci: derive changelog labels from PR titles + add Dependabot by @gmaclennan in #114 ## New Contributors * @achou11 made their first contribution in #12 * @optic-release-automation[bot] made their first contribution in #112 **Full Changelog**: https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2 <!-- <release-meta>{"id":342970724,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta> -->
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements persistent device identity (rootkey) management and adds comprehensive lifecycle state tracking to the embedded Node.js service across iOS and Android platforms. The rootkey is now generated once on first launch, securely persisted using platform-native keystores, and passed to the backend during service initialization. A new state machine tracks service lifecycle transitions and exposes them to the JavaScript layer.
Key Changes
Rootkey Persistence
RootKeyStore.kt): Implements 16-byte rootkey generation and persistence using AndroidKeyStore-backed AES/GCM encryption with SharedPreferences storage. Includes read-back verification and explicit failure handling to prevent silent identity loss.RootKeyStore.swift): Implements equivalent functionality using iOS Keychain withkSecClassGenericPasswordandAfterFirstUnlockThisDeviceOnlyaccessibility.Service Lifecycle State Machine
NodeJSService.kt): AddedStateenum (STOPPED, STARTING, STARTED, STOPPING, ERROR) with state transition tracking andonStateChangecallback for FGS integration.NodeJSService.swift): Mirrors Android's state machine with identical state names for cross-platform consistency.backend/index.js): Refactored initialization to be gated on receiving the rootkey from native via a newinitRPC frame. The control socket now drives a two-stage boot:started(control socket ready, awaiting rootkey), thenready(MapeoManager constructed).Rootkey Handshake Protocol
startedonce the control socket is listening.initframe containing base64-encoded rootkey.MapeoManagerwith the rootkey and broadcastsready.stateChangeevents.JavaScript Layer Integration
ComapeoCoreModule.ts: AddedgetState()method andstateChangeevent emission for lifecycle observation.ComapeoCore.types.ts: DefinedComapeoStateunion type matching platform implementations.src/index.ts: Exportedstateobserver alongside existingcomapeoRPC client.Test Infrastructure
MockBackend.swift: New helper for macOS swift-test runs that simulates the Node.js backend's control socket handshake, enabling tests to verify rootkey round-tripping without touching the real keychain.MockBackendand verify state transitions and rootkey capture.Data Protection
comapeo_backup_rules.xmlandcomapeo_data_extraction_rules.xmlto exclude the rootkey-bearing SharedPreferences from cloud backup and device-to-device transfer.AfterFirstUnlockThisDeviceOnlyaccessibility (no iCloud sync, no device-to-device restore).Notable Implementation Details
startedandreadymessages to late-connecting clients, ensuring consistent state even if the JavaScript layer initializes after the handshake completes.https://claude.ai/code/session_01WEzwFdqnPVbcdWgMbX6bjE