fix(ci3): scope build-instance name by repo to stop cross-repo reaping #23987

Merged: ludamad merged 1 commit into next from ci3-instance-name-repo-scope on Jun 10, 2026

@charlielye

Contributor

Problem

ci3/bootstrap_ec2 terminates any existing instance that shares the target Name tag before launching — this intentionally reaps orphans left when a GA run is cancelled (e.g. by a new push) on the same ref. But the name was <ref>_<arch>[_<postfix>] with no repo component, so aztec-packages and aztec-packages-private — which build the same tags/refs concurrently under the same OIDC role — computed identical names and reaped each other's live instances.

Observed incident

Nightly tag v5.0.0-nightly.20260610 was built in both repos. Instance i-02e5d6a6c148ec726 (v5_0_0-nightly_20260610_arm64_a-release) was launched by the private repo's run at 03:06:01 UTC and terminated at 03:13:12 UTC by the public repo's run for the same tag (its pre-launch reap step), ~7 min in — failing the private build. CloudTrail confirms a TerminateInstances from a different ci3-<run_id> session, not a spot interruption.
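
To make the collision concrete, here is a minimal sketch of the pre-fix naming scheme. The function name `old_instance_name` and the sanitisation rule (every character other than letters, digits, and hyphens becomes an underscore) are assumptions inferred from the instance name quoted above, not the actual code in ci3/bootstrap_ec2.

```sh
# Hypothetical reconstruction of the old scheme: <ref>_<arch>[_<postfix>].
# Note that nothing here depends on which repository is running the build.
old_instance_name() {
  local ref="$1" arch="$2" postfix="${3:-}"
  local name
  # Assumed sanitisation: anything outside [a-zA-Z0-9-] is replaced by "_".
  name="$(printf '%s' "$ref" | tr -c 'a-zA-Z0-9-' '_')_${arch}"
  if [ -n "$postfix" ]; then
    name="${name}_${postfix}"
  fi
  printf '%s\n' "$name"
}
```

Under these assumptions, `old_instance_name v5.0.0-nightly.20260610 arm64 a-release` yields `v5_0_0-nightly_20260610_arm64_a-release` regardless of `GITHUB_REPOSITORY`, so a run in either repo would treat the other's instance as its own orphan.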

Fix

Prefix the instance name with the repo basename (${GITHUB_REPOSITORY##*/}, defaulting to aztec-packages for local runs):

  • Within a repo, the key is unchanged in spirit (<repo>_<ref>_<arch>) and stays stable across re-runs/new-pushes of the same ref — so the intended orphan-on-cancel cleanup still works.
  • Across repos, public → aztec-packages_… and private → aztec-packages-private_…, so they no longer match and can't reap each other.

ci.sh's helper instance_name (used by the shell/kill/get-ip dev commands) is kept in sync so it still resolves instances launched by a CI run for the same repo.
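
The repo-scoped key could be sketched roughly as follows. The function name `scoped_instance_name` and the sanitisation rule are illustrative assumptions; only the `<repo>_<ref>_<arch>[_<postfix>]` shape and the `aztec-packages` default come from the description above. The sketch applies the default before stripping the owner prefix so that it also behaves when `GITHUB_REPOSITORY` is unset.

```sh
# Hypothetical sketch of the repo-scoped scheme: <repo>_<ref>_<arch>[_<postfix>].
scoped_instance_name() {
  local ref="$1" arch="$2" postfix="${3:-}"
  local repo name
  # Fall back to aztec-packages for local runs, then keep only the
  # basename of "owner/repo".
  repo="${GITHUB_REPOSITORY:-aztec-packages}"
  repo="${repo##*/}"
  # Assumed sanitisation: anything outside [a-zA-Z0-9-] is replaced by "_".
  name="${repo}_$(printf '%s' "$ref" | tr -c 'a-zA-Z0-9-' '_')_${arch}"
  if [ -n "$postfix" ]; then
    name="${name}_${postfix}"
  fi
  printf '%s\n' "$name"
}
```

Because the repo basename is the only new input, two runs of the same ref in the same repo still compute the same name (so orphan cleanup keeps working), while the public and private repos now compute different names.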

Notes

  • The EC2 Name tag limit is 256 chars; the longest prefixed name is ~61 chars. The reap match uses the full Name tag, so the cosmetic 63-char docker_hostname truncation doesn't affect correctness.
  • One-time transition: instances launched by the old (un-prefixed) code won't be reaped by name-match from new runs; they fall back to the shutdown timer / 1.5h reaper. Self-heals within a couple hours.
  • This stops the collision. Whether public and private should both build the same nightly tag (duplicated work) is a separate question — happy to follow up if you want one gated off.

bootstrap_ec2 terminates any existing instance sharing the target Name tag, to
reap orphans left by a cancelled GA run on the same ref. But the name was just
<ref>_<arch>[_postfix], with no repo component — so aztec-packages and
aztec-packages-private, which build the same tags/refs concurrently under the
same OIDC role, computed identical names and reaped each other's live instances.

Observed: nightly tag v5.0.0-nightly.20260610 built in both repos; the public
run's pre-launch reap terminated the private run's in-progress arm64 release
instance ~7 min in, failing that build.

Prefix the instance name with the repo basename (${GITHUB_REPOSITORY##*/}, default
aztec-packages). The key stays stable across re-runs within a repo, so the
intended orphan cleanup still works; it only stops the two repos from colliding.
ci.sh's helper instance_name (shell/kill/get-ip) is kept in sync.
@ludamad added this pull request to the merge queue on Jun 10, 2026
Merged via the queue into next with commit b8b0d36 on Jun 10, 2026
20 checks passed
@ludamad deleted the ci3-instance-name-repo-scope branch on June 10, 2026 13:39
charlielye added a commit that referenced this pull request on Jun 12, 2026

## Regression

The cross-repo instance-name fix (#23987) added to `ci.sh` and
`ci3/bootstrap_ec2`:

```sh
repo=${GITHUB_REPOSITORY##*/}
repo=${repo:-aztec-packages}
```

Under `set -u` (set by `ci3/source`), the `##*/` expansion on an
**unset** `GITHUB_REPOSITORY` aborts *before* the `:-aztec-packages`
default on the next line can apply. So any local invocation fails
immediately:

```
$ ./ci.sh bench
./ci.sh: line 58: GITHUB_REPOSITORY: unbound variable
```

It only bit locally — CI always has `GITHUB_REPOSITORY` set, so it
passed there.

## Fix

Default first, then strip:

```sh
repo=${GITHUB_REPOSITORY:-aztec-packages}
repo=${repo##*/}
```

Applied in both `ci.sh` and `ci3/bootstrap_ec2`. Verified under `set -u`
with the var unset: resolves to `aztec-packages` (and
`AztecProtocol/aztec-packages` → `aztec-packages` when set), so the
instance-name scheme is unchanged.
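
The ordering issue can be reproduced in isolation. In the sketch below, `broken_repo` and `fixed_repo` are made-up names that wrap the two variants; each body runs in a subshell so that `set -u` and the resulting abort do not leak into the calling shell.

```sh
# Strip first, default second: under `set -u` the first expansion aborts
# when GITHUB_REPOSITORY is unset, so the default is never reached.
broken_repo() (
  set -u
  repo=${GITHUB_REPOSITORY##*/}
  repo=${repo:-aztec-packages}
  printf '%s\n' "$repo"
)

# Default first, strip second: the :- expansion is permitted on an unset
# variable under `set -u`, and the strip then operates on a set value.
fixed_repo() (
  set -u
  repo=${GITHUB_REPOSITORY:-aztec-packages}
  repo=${repo##*/}
  printf '%s\n' "$repo"
)
```

With `GITHUB_REPOSITORY` unset, `broken_repo` exits non-zero with an unbound-variable error and prints nothing, while `fixed_repo` prints `aztec-packages`. With the variable set to `AztecProtocol/aztec-packages`, both print `aztec-packages`.
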