ci: drop unreliable Android emulator snapshot caching #64
Merged
`sys.boot_completed` flips as soon as zygote is alive, which on a snapshot restore happens before `system_server` finishes binding the `settings`, `package`, and `activity` services. The previous wait-for-device loop returned almost instantly because the property was already set, and the next `adb shell settings put` then crashed with `cmd: Failure calling service settings: Broken pipe (32)`. Replace it with a probe that polls each service we're about to call until it actually responds, with a 120s ceiling. One retry on each `settings put` covers a residual race where `list` succeeds but a write transaction loses to first-time service initialisation. `disable-animations: false` is unchanged — emulator-runner's own `input keyevent 82` path crashes the emulator on this image, which is why the workflow drives the settings directly.
reactivecircus/android-emulator-runner v2 invokes the script with `/usr/bin/sh` (dash on Ubuntu), not bash. The previous version used `[[ ]]`, `local`, and `$SECONDS`, which dash refuses to parse, failing with: `/usr/bin/sh: 1: Syntax error: end of file unexpected (expecting "}")`. Replace them with POSIX-only constructs: `[ ]` tests, `$(date +%s)` for elapsed-time tracking, and an inline loop with a `ready` flag instead of a function. Verified that the script parses and runs under dash locally.
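The readiness loop described in the two commits above can be sketched in POSIX sh roughly as follows. This is a minimal sketch, not the workflow's actual script: `wait_until` and its `wu_*` variables are hypothetical names, and it is written as a function because, per the later commits, the loop ended up in a standalone script file rather than in the action's `script:` lines. POSIX does not specify `local`, so the variables carry a prefix instead.

```shell
#!/bin/sh
# Poll a probe command until it succeeds or a deadline passes, using only
# POSIX sh: [ ] tests, $(date +%s) for elapsed time, and a ready flag.
# wait_until and its wu_* variables are hypothetical names; POSIX sh has no
# `local`, so the prefix keeps them from clobbering the caller's variables.
wait_until() {
  wu_limit=$1   # ceiling in seconds, e.g. 120
  shift         # the remaining arguments are the probe command
  wu_start=$(date +%s)
  wu_ready=0
  while [ $(( $(date +%s) - wu_start )) -lt "$wu_limit" ]; do
    if "$@" >/dev/null 2>&1; then
      wu_ready=1
      break
    fi
    sleep 1
  done
  [ "$wu_ready" -eq 1 ]   # exit status 0 only if the probe succeeded in time
}
```

For example, `wait_until 120 adb shell pm path android` would return 0 once the package service answers and 1 after the 120s ceiling; using `pm path android` as the probe is an assumption. Each call gets its own ceiling, so probing several services one after another can take longer than 120s in total.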
reactivecircus/android-emulator-runner v2 invokes the workflow's `script:` line by line through `sh -c`, not as a single multi-line script — so any function definition, while loop, or other multi-line construct is split across separate shell invocations and fails to parse (the previous attempts hit "expecting }" / "expecting done"). Move the entire body to scripts/run-android-instrumented-tests.sh and have the workflow invoke it as a single line. The script can use bash freely (proper shebang, set -euo pipefail).
Scope the script down to just the multi-line wait-for-emulator-services loop (the only part that can't survive the action's per-line `sh -c`). Animation settings and the gradle invocation move back inline as single-line statements; the `||` retry uses a one-line brace group, which dash parses fine.
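A quick way to check the constraint in the commits above: `sh -n` parses a command string without executing it, so each candidate line can be tested the way a fresh `sh -c` would see it. The `parses` helper is hypothetical, and the `sleep 2` before the retry is an assumption (the commits only say one retry). No adb command is actually run.

```shell
#!/bin/sh
# parses: hypothetical helper. `sh -n` reads and parses the string without
# executing it, so the adb commands below are never run.
parses() {
  sh -n -c "$1" 2>/dev/null
}

# One-line `||` retry with a brace group: a complete command on its own line.
# The 2-second pause before the retry is an assumption, not from the workflow.
retry_line='adb shell settings put global window_animation_scale 0 || { sleep 2; adb shell settings put global window_animation_scale 0; }'

# The opening lines of multi-line constructs are incomplete when each line
# gets its own `sh -c`, which is what produced the "expecting" errors.
loop_head='while ! adb shell settings list global >/dev/null 2>&1; do'
func_head='wait_for_services() {'

parses "$retry_line" && echo "retry line: parses"
parses "$loop_head" || echo "loop head: syntax error"
parses "$func_head" || echo "function head: syntax error"
```

The brace group keeps the retry on one line without needing a function or loop, which is why it survives being split per line while the `while … do` and `name() {` headers do not.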
Run each readiness check independently each tick so we capture pm and settings exit codes separately, then log t=Xs bc='…' pm_rc=… cmd_rc=… every ~10s and dump init.svc.* every ~30s. Drop set -e so an intermittent adb hiccup can't silently kill the probe. Diagnostic step — next CI run should show whether the timeout is a genuinely-broken emulator or a probe bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
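A minimal sketch of the diagnostic loop described above, with the two probes passed in as command strings. `diagnose_until_ready` and its arguments are hypothetical, and the `bc='…'` field and the ~30s `init.svc.*` dump are left out to keep it short. Because there is no `set -e`, a failing probe only shows up as a non-zero exit code in the log instead of ending the script.

```shell
#!/bin/sh
# Hypothetical diagnostic loop. No `set -e`: every probe's exit code is
# recorded and logged instead of aborting the script on the first failure.
# $1 = ceiling in seconds, $2 and $3 = probe command strings run via sh -c.
diagnose_until_ready() {
  dr_limit=$1
  dr_pm=$2
  dr_cmd=$3
  dr_start=$(date +%s)
  dr_last_log=-10          # forces a log line on the very first tick
  while :; do
    dr_t=$(( $(date +%s) - dr_start ))
    # Run each check independently so each keeps its own exit code.
    sh -c "$dr_pm" >/dev/null 2>&1; dr_pm_rc=$?
    sh -c "$dr_cmd" >/dev/null 2>&1; dr_cmd_rc=$?
    # Emit a status line roughly every 10 seconds.
    if [ $(( dr_t - dr_last_log )) -ge 10 ]; then
      echo "t=${dr_t}s pm_rc=$dr_pm_rc cmd_rc=$dr_cmd_rc"
      dr_last_log=$dr_t
    fi
    if [ "$dr_pm_rc" -eq 0 ] && [ "$dr_cmd_rc" -eq 0 ]; then
      return 0
    fi
    if [ "$dr_t" -ge "$dr_limit" ]; then
      return 1
    fi
    sleep 1
  done
}
```

In CI the probe strings would be adb calls, for example `'adb shell pm path android'` and `'adb shell cmd settings list global'`; these exact commands are assumptions rather than what the original script ran.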
Previous CI runs showed a clear pattern: the cached snapshot booted but zygote immediately fell into a restart loop (`init.svc.zygote` = `restarting` for the full 120s, pm/cmd returning DEAD_OBJECT). The warning "Please update the emulator to one that supports the feature(s): Vulkan" plus a forced "Increasing RAM size to 2048MB" pointed at incompatibility between the cached snapshot and the emulator GitHub now installs.

Two changes:

- Cache key suffix `-v2` so the bad snapshot is abandoned.
- Run wait-for-emulator.sh in the snapshot-creation step so we only save a snapshot from a fully-healthy userland (the action saves on emulator shutdown, after the script returns).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-runner snapshot restore produces a corpse: snapshot loads, sys.boot_completed=1, but system_server is dead on first adb call (DeadSystemException). Same snapshot restores fine within the job that saved it, so the rot is portability — likely the captured Vulkan/RAM guest state is coupled to the host emulator's ICD. Cold-booting the emulator every run trades ~60-90s for reliability. The wait-for-emulator probe still gates the test command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The probe was added to work around a snapshot-restore race where sys.boot_completed=1 came from the saved property store before live services were bound. With snapshot caching dropped, the action's own wait on sys.boot_completed=1 is sufficient — on cold boot that property is only set after system_server posts BOOT_COMPLETED, so core services are already bound. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gmaclennan added a commit that referenced this pull request Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by Github action [optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release [v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-c499977757c9745e56b2) has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR. When you merge:

- The GitHub release will be published
- The npm package with tag pre will be published according to the publishing rules you have configured
- No major or minor tags will be updated as configured

#### If you close the PR

- The new draft release will be deleted and nothing will change

## What's Changed

* Android Testing Infrastructure & Bug Fixes by @gmaclennan in #3
* chore: prebuild example/android; harden instrumented tests by @gmaclennan in #10
* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in #5
* chore: adjust repo setup by @achou11 in #12
* chore: minor fixes based on expo-doctor by @achou11 in #13
* Add iOS support & test infrastructure by @gmaclennan in #6
* chore: add architecture docs & plans by @gmaclennan in #11
* update some native deps used in backend by @achou11 in #14
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by @gmaclennan in #15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by @gmaclennan in #17
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by @gmaclennan in #33
* android: read abiFilters from reactNativeArchitectures (#30) by @gmaclennan in #35
* refactor: simplify build-backend.ts; rollup writes directly to native asset trees by @gmaclennan in #34
* chore: fix eslint configuration by @achou11 in #41
* android: audit 16 KB page alignment on every shipped .so by @gmaclennan in #43
* Add rootkey persistence and lifecycle state management by @gmaclennan in #36
* chore: move example app into apps directory by @achou11 in #18
* refactor: per-component lifecycle state with derived ComapeoState by @gmaclennan in #47
* android: fold waitForFile into connect retry loop by @gmaclennan in #52
* chore: add e2e testing app by @achou11 in #49
* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key by @gmaclennan in #57
* fix(backend): cache stopping/error frames for late joiners by @gmaclennan in #58
* fix(ios-tests): wait for STOPPING before signalling node exit by @gmaclennan in #59
* fix(android): drain JNI stdio pumps before returning from node::Start by @gmaclennan in #60
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in #54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by @gmaclennan in #62
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan in #64
* feat(sentry): land Phase 3 — backend loader + RPC tracing by @gmaclennan in #63
* fix(ios-tests): serialise STOPPING/STOPPED observers in testFullLifecycleStateTransitions by @gmaclennan in #71
* use npm list instead of custom traversal to get native module versions by @achou11 in #70
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry by @gmaclennan in #72
* fix(sentry): make exit telemetry lossless and stop cross-process clobbering by @gmaclennan in #84
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in #56
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as Application Metrics by @gmaclennan in #73
* Map server integration by @gmaclennan in #86
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan in #87
* chore(ci): add release workflow by @gmaclennan in #90
* chore: fix npm script and release build script by @gmaclennan in #91
* chore(pack): don't try to package build files by @gmaclennan in #92
* fix: start fastify listening by @gmaclennan in #93
* perf(backend): switch bundler from rollup to rolldown by @gmaclennan in #94
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in #96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts allowlist by @gmaclennan in #106
* feat(config): let the consuming app supply the default project config by @gmaclennan in #95
* chore(release): merge prerelease branch. by @gmaclennan in #110

## New Contributors

* @achou11 made their first contribution in #12

**Full Changelog**: https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!-- <release-meta>{"id":342868678,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta> -->
gmaclennan added a commit that referenced this pull request Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by Github action [optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release [v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-352a6c41c12fd02dec37) has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR. When you merge:

- The GitHub release will be published
- The npm package with tag pre will be published according to the publishing rules you have configured
- No major or minor tags will be updated as configured

#### If you close the PR

- The new draft release will be deleted and nothing will change

<!-- Release notes generated using configuration in .github/release.yml at 7fe80b4 -->

## What's Changed

### 🚀 Features

* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in #5
* Add iOS support & test infrastructure by @gmaclennan in #6
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by @gmaclennan in #15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by @gmaclennan in #17
* android: read abiFilters from reactNativeArchitectures (#30) by @gmaclennan in #35
* Add rootkey persistence and lifecycle state management by @gmaclennan in #36
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in #54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by @gmaclennan in #62
* feat(sentry): land Phase 3 — backend loader + RPC tracing by @gmaclennan in #63
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS MetricKit app-exit telemetry by @gmaclennan in #72
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as Application Metrics by @gmaclennan in #73
* Map server integration by @gmaclennan in #86
* feat(config): let the consuming app supply the default project config by @gmaclennan in #95

### 🐛 Bug Fixes

* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key by @gmaclennan in #57
* fix(backend): cache stopping/error frames for late joiners by @gmaclennan in #58
* fix(ios-tests): wait for STOPPING before signalling node exit by @gmaclennan in #59
* fix(android): drain JNI stdio pumps before returning from node::Start by @gmaclennan in #60
* fix(ios-tests): serialise STOPPING/STOPPED observers in testFullLifecycleStateTransitions by @gmaclennan in #71
* fix(sentry): make exit telemetry lossless and stop cross-process clobbering by @gmaclennan in #84
* fix: start fastify listening by @gmaclennan in #93
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in #96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts allowlist by @gmaclennan in #106
* fix(release): stop `npm pack --dry-run` leaking dry-run into backend install by @gmaclennan in #129

### ⚡ Performance

* perf(backend): switch bundler from rollup to rolldown by @gmaclennan in #94

### ⬆️ Dependencies

* update some native deps used in backend by @achou11 in #14
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan in #87

### 🏗️ Maintenance

* Android Testing Infrastructure & Bug Fixes by @gmaclennan in #3
* chore: prebuild example/android; harden instrumented tests by @gmaclennan in #10
* chore: adjust repo setup by @achou11 in #12
* chore: minor fixes based on expo-doctor by @achou11 in #13
* chore: add architecture docs & plans by @gmaclennan in #11
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by @gmaclennan in #33
* refactor: simplify build-backend.ts; rollup writes directly to native asset trees by @gmaclennan in #34
* chore: fix eslint configuration by @achou11 in #41
* android: audit 16 KB page alignment on every shipped .so by @gmaclennan in #43
* chore: move example app into apps directory by @achou11 in #18
* refactor: per-component lifecycle state with derived ComapeoState by @gmaclennan in #47
* android: fold waitForFile into connect retry loop by @gmaclennan in #52
* chore: add e2e testing app by @achou11 in #49
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan in #64
* use npm list instead of custom traversal to get native module versions by @achou11 in #70
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in #56
* chore(ci): add release workflow by @gmaclennan in #90
* chore: fix npm script and release build script by @gmaclennan in #91
* chore(pack): don't try to package build files by @gmaclennan in #92
* chore(release): merge prerelease branch. by @gmaclennan in #110
* ci(e2e): retry BrowserStack builds on infra-class flakes by @gmaclennan in #113

### Other Changes

* ci: derive changelog labels from PR titles + add Dependabot by @gmaclennan in #114

## New Contributors

* @achou11 made their first contribution in #12
* @optic-release-automation[bot] made their first contribution in #112

**Full Changelog**: https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!-- <release-meta>{"id":342970724,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta> -->
## Summary
Drop AVD snapshot caching from the Android instrumented-test workflow, and the readiness loop it was put in place to support.
The original PR added a `wait-for-emulator` probe to handle a race where, after a snapshot restore, `sys.boot_completed=1` (restored from the saved property store) arrived before `system_server` had bound `settings`/`package`/`activity`, so the next `adb shell settings put` failed with `Broken pipe`.

Instrumenting the probe and iterating on it showed that the actual failure mode is worse: on a different runner machine, the restored snapshot's `system_server` is dead on arrival. `init.svc.zygote` stayed in `restarting` for the full 120s probe window, and the emulator-runner action's own `adb shell input keyevent 82` (which runs before the user `script:`) was already throwing `android.os.DeadSystemException`. The cached snapshot was healthy when saved but is not portable across runners, likely because the captured Vulkan/RAM guest state is coupled to the host emulator's ICD. This is consistent with the "Please update the emulator to one that supports the feature(s): Vulkan" and "Increasing RAM size to 2048MB" warnings that appear on every restore.

With snapshot caching gone, the emulator cold-boots on every run (~60–90s slower). On a cold boot, `sys.boot_completed=1` is set by `system_server` only after the BOOT_COMPLETED broadcast, so the core services are bound by the time the action's built-in wait returns; no custom probe is needed.

## Net change
- Drop the `Cache AVD snapshot` and `Create AVD and generate snapshot` steps.
- Drop the `adb wait-for-device … getprop sys.boot_completed` loop from the test step (the action already does this).

## Test plan
Generated by Claude Code