
ci: drop unreliable Android emulator snapshot caching #64

Merged
gmaclennan merged 10 commits into main from claude/ci-emulator-readiness on May 11, 2026

gmaclennan (Member) commented on May 7, 2026

Summary

Drop AVD snapshot caching from the Android instrumented-test workflow, along with the readiness probe that was added to work around it.

The original PR added a wait-for-emulator probe to handle a race where, after a snapshot restore, `sys.boot_completed=1` (restored from the saved property store) was already visible before `system_server` had bound the `settings`, `package`, and `activity` services, so the next `adb shell settings put` failed with `Broken pipe`.

Instrumenting the probe and iterating on it showed that the actual failure mode is worse: when the snapshot is restored on a different runner machine, its `system_server` is dead on arrival. `init.svc.zygote` stayed in `restarting` for the full 120s probe window, and the emulator-runner action's own `adb shell input keyevent 82` (which runs before the user `script:`) was already throwing `android.os.DeadSystemException`. The cached snapshot was healthy when saved but is not portable across runners. The likely cause is Vulkan/RAM guest state coupled to the host emulator's ICD, as suggested by the "Please update the emulator to one that supports the feature(s): Vulkan" and "Increasing RAM size to 2048MB" warnings printed on every restore.

With snapshot caching gone, the emulator cold-boots on every run (roughly 60–90s slower). On a cold boot, `sys.boot_completed=1` is set by `system_server` only after the `BOOT_COMPLETED` broadcast, so the core services are already bound by the time the action's built-in wait returns, and no custom probe is needed.
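For reference, a wait on `sys.boot_completed` can be approximated in POSIX sh roughly as below. This is a hedged sketch, not the emulator-runner action's actual implementation: the helper name `wait_for_boot_completed` and its arguments are hypothetical, and the property reader is passed in as a command (in CI it would be something like `adb shell getprop sys.boot_completed`). On a cold boot the loop only returns once the property flips; after a snapshot restore it can return immediately, which is exactly the race described above.

```shell
#!/bin/sh
# Hypothetical helper approximating a sys.boot_completed wait.
# Usage: wait_for_boot_completed TIMEOUT_S INTERVAL_S CMD [ARGS...]
# CMD must print the property value, e.g.:
#   wait_for_boot_completed 300 2 adb shell getprop sys.boot_completed
wait_for_boot_completed() {
  timeout=$1
  interval=$2
  shift 2
  start=$(date +%s)
  while :; do
    # adb shell output may end in \r\n; strip \r before comparing.
    value=$("$@" 2>/dev/null | tr -d '\r')
    if [ "$value" = "1" ]; then
      return 0
    fi
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "sys.boot_completed not set after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

The check only proves the property is set; it says nothing about whether `system_server` is healthy, which is why it was insufficient for restored snapshots.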

Net change

  • Removed the `Cache AVD snapshot` and `Create AVD and generate snapshot` steps.
  • Removed the inline `adb wait-for-device … getprop sys.boot_completed` loop from the test step (the action already does this).

Test plan

  • Confirm the Android instrumented-test job is green on this PR.
  • Re-run a couple of times to confirm no cold-boot regressions.
  • Confirm the job stays within its 60-minute timeout.

Generated by Claude Code

claude and others added 10 commits May 7, 2026 22:57
`sys.boot_completed` flips as soon as zygote is alive, which on a
snapshot restore happens before `system_server` finishes binding the
`settings`, `package`, and `activity` services. The previous wait-for-device
loop returned almost instantly because the property was already set, and the
next `adb shell settings put` then crashed with `cmd: Failure calling service
settings: Broken pipe (32)`.

Replace it with a probe that polls each service we're about to call until
it actually responds, with a 120s ceiling. One retry on each `settings put`
covers a residual race where `list` succeeds but a write transaction loses
to first-time service initialisation.

`disable-animations: false` is unchanged — emulator-runner's own
`input keyevent 82` path crashes the emulator on this image, which is why
the workflow drives the settings directly.

reactivecircus/android-emulator-runner v2 invokes the script with
`/usr/bin/sh` (dash on Ubuntu), not bash. The previous version used
`[[ ]]`, `local`, and `$SECONDS`, which dash refuses to parse, failing with:

  /usr/bin/sh: 1: Syntax error: end of file unexpected (expecting "}")

Replace them with POSIX-only constructs: `[ ]` tests, `$(date +%s)` for
elapsed-time tracking, and an inline loop with a `ready` flag instead of a
function. Verified that it parses and runs under dash locally.
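The dash-safe loop shape described in this commit can be sketched as follows. It is illustrative only: `wait_for_services`, `check_pm`, and `check_settings` are hypothetical names, the suggested probe commands (`adb shell pm path android`, `adb shell settings list global`) are assumptions rather than what the workflow actually ran, and the loop is wrapped in a function here only so it can be exercised in isolation (the commit itself used an inline loop).

```shell
#!/bin/sh
# Dash-safe readiness loop: [ ] tests, $(date +%s) for elapsed time,
# and a `ready` flag. No [[ ]], no `local`, no $SECONDS.
# check_pm / check_settings are hypothetical hooks; in CI they might be
#   check_pm()       { adb shell pm path android; }
#   check_settings() { adb shell settings list global; }
wait_for_services() {
  ceiling=${1:-120}
  interval=${2:-2}
  start=$(date +%s)
  ready=0
  while [ "$ready" -eq 0 ]; do
    if check_pm >/dev/null 2>&1 && check_settings >/dev/null 2>&1; then
      ready=1
    elif [ $(( $(date +%s) - start )) -ge "$ceiling" ]; then
      break
    else
      sleep "$interval"
    fi
  done
  if [ "$ready" -eq 1 ]; then
    return 0
  fi
  echo "services not ready after ${ceiling}s" >&2
  return 1
}
```

As the next commit found, even POSIX-clean multi-line code like this cannot sit directly in the action's `script:` block.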
reactivecircus/android-emulator-runner v2 invokes the workflow's
`script:` line by line through `sh -c`, not as a single multi-line
script — so any function definition, while loop, or other multi-line
construct is split across separate shell invocations and fails to
parse (the previous attempts hit "expecting }" / "expecting done").

Move the entire body to scripts/run-android-instrumented-tests.sh and
have the workflow invoke it as a single line. The script can use
bash freely (proper shebang, set -euo pipefail).
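The per-line failure is easy to reproduce locally. The sketch below simulates line-by-line execution with one `sh -c` per line; it is a hypothetical reproduction, not the action's code, and `run_whole` / `run_per_line` are illustrative names. The same `while` loop parses when handed to a single shell, but fails as soon as its header line is run on its own.

```shell
#!/bin/sh
# A multi-line snippet like the ones that broke in the workflow's script:.
snippet='i=0
while [ "$i" -lt 3 ]; do
  i=$((i + 1))
done
echo "i=$i"'

# One shell for the whole snippet: parses and prints i=3.
run_whole() {
  sh -c "$snippet"
}

# One `sh -c` per line: line 2 is a `while ... do` with no matching
# `done`, so sh reports a syntax error and we stop there.
run_per_line() {
  while IFS= read -r line; do
    sh -c "$line" || return 1
  done <<EOF
$snippet
EOF
}
```

Moving the loop into a script file sidesteps this entirely, since the workflow then only runs a single line.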
Scope the script down to just the multi-line wait-for-emulator-services
loop (the only part that can't survive the action's per-line `sh -c`).
Animation settings and the gradle invocation move back inline as
single-line statements; the `||` retry uses a one-line brace group,
which dash parses fine.
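A minimal sketch of the one-line retry, assuming the pattern is "run once; on failure wait briefly and run exactly once more". `retry_once` is a hypothetical helper, the 1s delay is an assumption, and `window_animation_scale` is used only as an example setting. Because the `{ ...; }` group opens and closes on the same line, it survives the per-line `sh -c` split.

```shell
#!/bin/sh
# Inline form, as a single workflow line (example setting; assumed):
#   adb shell settings put global window_animation_scale 0 || { sleep 1; adb shell settings put global window_animation_scale 0; }
#
# The same pattern as a reusable helper: run the command; if it fails,
# wait 1s (assumed delay) and try once more. The exit status is that of
# the last attempt.
retry_once() {
  "$@" || { sleep 1; "$@"; }
}

# Sanity check that a one-line brace group parses on its own in sh -c:
sh -c 'false || { sleep 0; true; }' && echo "brace group parses"
```

The trailing `;` before `}` is required by POSIX grammar, so omitting it is a common way to reintroduce a parse error.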
Run each readiness check independently each tick so we capture pm
and settings exit codes separately, then log t=Xs bc='…' pm_rc=…
cmd_rc=… every ~10s and dump init.svc.* every ~30s. Drop set -e so
an intermittent adb hiccup can't silently kill the probe.

Diagnostic step — next CI run should show whether the timeout is a
genuinely-broken emulator or a probe bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
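One way to structure the diagnostic tick described in this commit is sketched below. Everything named here is hypothetical (`probe_tick`, the hooks `get_boot_completed`, `check_pm`, `check_settings`, `dump_init_svc`, and the `last_log` / `last_dump` globals); in CI the hooks would wrap `adb shell` calls. Each check runs unconditionally so both exit codes are captured on every tick, and logging is rate-limited by elapsed time rather than tick count, so it still fires when a slow `adb` call makes ticks irregular.

```shell
#!/bin/sh
# One tick of a diagnostic readiness probe. Deliberately no `set -e`:
# a failing check must be recorded, not abort the probe.
# Hooks (hypothetical; in CI they would wrap adb shell calls):
#   get_boot_completed         prints sys.boot_completed
#   check_pm, check_settings   exit 0 when the service responds
#   dump_init_svc              prints the init.svc.* properties
probe_tick() {
  t=$1  # seconds elapsed since the probe started
  : "${last_log:=-1000}" "${last_dump:=-1000}"
  bc=$(get_boot_completed 2>/dev/null)
  check_pm >/dev/null 2>&1
  pm_rc=$?
  check_settings >/dev/null 2>&1
  cmd_rc=$?
  # Log roughly every 10s, dump init.svc.* roughly every 30s.
  if [ $((t - last_log)) -ge 10 ]; then
    echo "t=${t}s bc='${bc}' pm_rc=${pm_rc} cmd_rc=${cmd_rc}"
    last_log=$t
  fi
  if [ $((t - last_dump)) -ge 30 ]; then
    dump_init_svc
    last_dump=$t
  fi
  # Ready only when both services answered on this tick.
  [ "$pm_rc" -eq 0 ] && [ "$cmd_rc" -eq 0 ]
}
```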
Previous CI runs showed a clear pattern: the cached snapshot booted
but zygote immediately fell into a restart loop (init.svc.zygote =
restarting for the full 120s, pm/cmd returning DEAD_OBJECT). The
warning "Please update the emulator to one that supports the
feature(s): Vulkan" plus a forced "Increasing RAM size to 2048MB"
pointed at incompatibility between the cached snapshot and the
emulator GitHub now installs.

Two changes:
- Cache key suffix `-v2` so the bad snapshot is abandoned.
- Run wait-for-emulator.sh in the snapshot-creation step so we only
  save a snapshot from a fully-healthy userland (the action saves
  on emulator shutdown, after the script returns).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cross-runner snapshot restore produces a corpse: snapshot loads,
sys.boot_completed=1, but system_server is dead on first adb call
(DeadSystemException). Same snapshot restores fine within the job
that saved it, so the rot is portability — likely the captured
Vulkan/RAM guest state is coupled to the host emulator's ICD.

Cold-booting the emulator every run trades ~60-90s for reliability.
The wait-for-emulator probe still gates the test command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The probe was added to work around a snapshot-restore race where
sys.boot_completed=1 came from the saved property store before live
services were bound. With snapshot caching dropped, the action's
own wait on sys.boot_completed=1 is sufficient — on cold boot that
property is only set after system_server posts BOOT_COMPLETED, so
core services are already bound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gmaclennan changed the title from "ci: wait for emulator services to be ready post-snapshot" to "ci: drop unreliable Android emulator snapshot caching" on May 11, 2026
gmaclennan merged commit 09be9ec into main on May 11, 2026
7 checks passed
gmaclennan deleted the claude/ci-emulator-readiness branch on May 11, 2026 at 21:16
gmaclennan added a commit that referenced this pull request on Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by the GitHub action
[optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release
[v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-c499977757c9745e56b2)
has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR.
When you merge:

- The GitHub release will be published

- The npm package with tag pre will be published according to the
publishing rules you have configured



- No major or minor tags will be updated as configured


#### If you close the PR

- The new draft release will be deleted and nothing will change

## What's Changed
* Android Testing Infrastructure & Bug Fixes by @gmaclennan in
#3
* chore: prebuild example/android; harden instrumented tests by
@gmaclennan in
#10
* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in
#5
* chore: adjust repo setup by @achou11 in
#12
* chore: minor fixes based on expo-doctor by @achou11 in
#13
* Add iOS support & test infrastructure by @gmaclennan in
#6
* chore: add architecture docs & plans by @gmaclennan in
#11
* update some native deps used in backend by @achou11 in
#14
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by
@gmaclennan in
#15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan
in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by
@gmaclennan in
#17
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by
@gmaclennan in
#33
* android: read abiFilters from reactNativeArchitectures (#30) by
@gmaclennan in
#35
* refactor: simplify build-backend.ts; rollup writes directly to native
asset trees by @gmaclennan in
#34
* chore: fix eslint configuration by @achou11 in
#41
* android: audit 16 KB page alignment on every shipped .so by
@gmaclennan in
#43
* Add rootkey persistence and lifecycle state management by @gmaclennan
in #36
* chore: move example app into apps directory by @achou11 in
#18
* refactor: per-component lifecycle state with derived ComapeoState by
@gmaclennan in
#47
* android: fold waitForFile into connect retry loop by @gmaclennan in
#52
* chore: add e2e testing app by @achou11 in
#49
* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key
by @gmaclennan in
#57
* fix(backend): cache stopping/error frames for late joiners by
@gmaclennan in
#58
* fix(ios-tests): wait for STOPPING before signalling node exit by
@gmaclennan in
#59
* fix(android): drain JNI stdio pumps before returning from node::Start
by @gmaclennan in
#60
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in
#54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by
@gmaclennan in
#62
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan
in #64
* feat(sentry): land Phase 3 — backend loader + RPC tracing by
@gmaclennan in
#63
* fix(ios-tests): serialise STOPPING/STOPPED observers in
testFullLifecycleStateTransitions by @gmaclennan in
#71
* use npm list instead of custom traversal to get native module versions
by @achou11 in
#70
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS
MetricKit app-exit telemetry by @gmaclennan in
#72
* fix(sentry): make exit telemetry lossless and stop cross-process
clobbering by @gmaclennan in
#84
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in
#56
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as
Application Metrics by @gmaclennan in
#73
* Map server integration by @gmaclennan in
#86
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan
in #87
* chore(ci): add release workflow by @gmaclennan in
#90
* chore: fix npm script and release build script by @gmaclennan in
#91
* chore(pack): don't try to package build files by @gmaclennan in
#92
* fix: start fastify listening by @gmaclennan in
#93
* perf(backend): switch bundler from rollup to rolldown by @gmaclennan
in #94
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in
#96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts
allowlist by @gmaclennan in
#106
* feat(config): let the consuming app supply the default project config
by @gmaclennan in
#95
* chore(release): merge prerelease branch. by @gmaclennan in
#110

## New Contributors
* @achou11 made their first contribution in
#12

**Full Changelog**:
https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!--

<release-meta>{"id":342868678,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta>
-->
gmaclennan added the maintenance label (Refactor / test / chore / ci / build (changelog)) on Jun 22, 2026
gmaclennan added a commit that referenced this pull request on Jun 22, 2026
## Optic Release Automation

This **draft** PR is opened by the GitHub action
[optic-release-automation-action](https://github.com/nearform-actions/optic-release-automation-action).

A new **draft** GitHub release
[v1.0.0-pre.2](https://github.com/digidem/comapeo-core-react-native/releases/tag/untagged-352a6c41c12fd02dec37)
has been created.

Release author: @gmaclennan

#### If you want to go ahead with the release, please merge this PR.
When you merge:

- The GitHub release will be published

- The npm package with tag pre will be published according to the
publishing rules you have configured



- No major or minor tags will be updated as configured


#### If you close the PR

- The new draft release will be deleted and nothing will change

<!-- Release notes generated using configuration in .github/release.yml
at 7fe80b4 -->

## What's Changed
### 🚀 Features
* Integrate @comapeo/core via IPC over Unix sockets by @gmaclennan in
#5
* Add iOS support & test infrastructure by @gmaclennan in
#6
* iOS Phase 1: unified JS bundle + smoke test (simulator-only) by
@gmaclennan in
#15
* iOS Phase 2: xcframework Embed & Sign for native addons by @gmaclennan
in #16
* Phase 2 Android: jniLibs packaging + unified rollup loader plugin by
@gmaclennan in
#17
* android: read abiFilters from reactNativeArchitectures (#30) by
@gmaclennan in
#35
* Add rootkey persistence and lifecycle state management by @gmaclennan
in #36
* Sentry integration: Phase 1 + Phase 2a + Phase 2b by @gmaclennan in
#54
* feat(backend): polywasm-backed undici on iOS, re-enable maps plugin by
@gmaclennan in
#62
* feat(sentry): land Phase 3 — backend loader + RPC tracing by
@gmaclennan in
#63
* feat(sentry): land Phases 6 + 7a — Android exit reasons & iOS
MetricKit app-exit telemetry by @gmaclennan in
#72
* feat(sentry): migrate to @sentry/react-native v8; exit telemetry as
Application Metrics by @gmaclennan in
#73
* Map server integration by @gmaclennan in
#86
* feat(config): let the consuming app supply the default project config
by @gmaclennan in
#95
### 🐛 Bug Fixes
* fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key
by @gmaclennan in
#57
* fix(backend): cache stopping/error frames for late joiners by
@gmaclennan in
#58
* fix(ios-tests): wait for STOPPING before signalling node exit by
@gmaclennan in
#59
* fix(android): drain JNI stdio pumps before returning from node::Start
by @gmaclennan in
#60
* fix(ios-tests): serialise STOPPING/STOPPED observers in
testFullLifecycleStateTransitions by @gmaclennan in
#71
* fix(sentry): make exit telemetry lossless and stop cross-process
clobbering by @gmaclennan in
#84
* fix: start fastify listening by @gmaclennan in
#93
* fix(ci): ignore-scripts in ios npm installs by @gmaclennan in
#96
* fix(ci): replace --ignore-scripts with npm strict-allow-scripts
allowlist by @gmaclennan in
#106
* fix(release): stop `npm pack --dry-run` leaking dry-run into backend
install by @gmaclennan in
#129
### ⚡ Performance
* perf(backend): switch bundler from rollup to rolldown by @gmaclennan
in #94
### ⬆️ Dependencies
* update some native deps used in backend by @achou11 in
#14
* chore(deps): upgrade to Expo SDK 56 (React Native 0.85) by @gmaclennan
in #87
### 🏗️ Maintenance
* Android Testing Infrastructure & Bug Fixes by @gmaclennan in
#3
* chore: prebuild example/android; harden instrumented tests by
@gmaclennan in
#10
* chore: adjust repo setup by @achou11 in
#12
* chore: minor fixes based on expo-doctor by @achou11 in
#13
* chore: add architecture docs & plans by @gmaclennan in
#11
* chore: post-Phase-2 cleanup — comments, plan docs, agents.md by
@gmaclennan in
#33
* refactor: simplify build-backend.ts; rollup writes directly to native
asset trees by @gmaclennan in
#34
* chore: fix eslint configuration by @achou11 in
#41
* android: audit 16 KB page alignment on every shipped .so by
@gmaclennan in
#43
* chore: move example app into apps directory by @achou11 in
#18
* refactor: per-component lifecycle state with derived ComapeoState by
@gmaclennan in
#47
* android: fold waitForFile into connect retry loop by @gmaclennan in
#52
* chore: add e2e testing app by @achou11 in
#49
* ci: drop unreliable Android emulator snapshot caching by @gmaclennan
in #64
* use npm list instead of custom traversal to get native module versions
by @achou11 in
#70
* chore(e2e): add e2e tests on browserstack via Maestro by @achou11 in
#56
* chore(ci): add release workflow by @gmaclennan in
#90
* chore: fix npm script and release build script by @gmaclennan in
#91
* chore(pack): don't try to package build files by @gmaclennan in
#92
* chore(release): merge prerelease branch. by @gmaclennan in
#110
* ci(e2e): retry BrowserStack builds on infra-class flakes by
@gmaclennan in
#113
### Other Changes
* ci: derive changelog labels from PR titles + add Dependabot by
@gmaclennan in
#114

## New Contributors
* @achou11 made their first contribution in
#12
* @optic-release-automation[bot] made their first contribution in
#112

**Full Changelog**:
https://github.com/digidem/comapeo-core-react-native/commits/v1.0.0-pre.2

<!--

<release-meta>{"id":342970724,"version":"v1.0.0-pre.2","npmTag":"pre","opticUrl":"https://optic-zf3votdk5a-ew.a.run.app/api/generate/"}</release-meta>
-->

Labels

maintenance Refactor / test / chore / ci / build (changelog)


2 participants