chore: merge next into merge-train/spartan (resolve conflicts) by AztecBot · Pull Request #24052 · AztecProtocol/aztec-packages

AztecBot · 2026-06-12T11:31:21Z

Why

merge-train/spartan (PR #23971) has been in the dirty merge state (it conflicts with next) and has not merged into next for about 2 days. This PR merges the current next into the train branch and resolves those conflicts so that #23971 becomes mergeable again.

Conflicts resolved

Both conflicts were cron-schedule differences. The train branch's commit chore(ci): align nightly scheduled workflow times (#24045) deliberately set these times, and that alignment is exactly what the train is bringing into next — so the train (HEAD) side was kept in both:

.github/workflows/deploy-staging-internal.yml — kept cron: "0 6 * * *" (train) over "0 7 * * *" (next)
.github/workflows/nightly-release-tag.yml — kept cron: "0 4 * * *" (train) over "0 2 * * *" (next)

After resolution, origin/next is an ancestor of this branch; the only net file change versus the current train tip is spartan/terraform/gke-cluster/iam.tf (+24, brought in from next).

⚠️ Merge with a merge commit, not squash

This PR carries a real merge commit so that next stays an ancestor of merge-train/spartan. It must be landed with "Create a merge commit" (hence the ci-no-squash label); squashing would drop the merge and leave #23971 dirty.
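The ancestry requirement above can be checked locally. This is a minimal sketch: `is_ancestor` is a hypothetical helper (not in the repo) that wraps `git merge-base --is-ancestor`, which exits 0 when the first commit is reachable from the second. The remote name `origin` is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeeds iff commit $1 is an
# ancestor of commit $2, using git's built-in reachability check.
is_ancestor() {
  git merge-base --is-ancestor "$1" "$2"
}

# Usage, assuming a clone whose remote is named origin:
#   git fetch origin next merge-train/spartan
#   is_ancestor origin/next origin/merge-train/spartan && echo "next is contained in the train"
```

With a merge-commit landing, `origin/next` stays reachable from the train tip and the check succeeds. A squash landing instead adds one new commit whose only parent is the previous train tip, so the commits brought in from `next` are not reachable from it and the check fails.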

Created by claudebox · group: slackbot

Merge-queue runs route through `multi_job_run`, which pipes the runner-side orchestration into a parent dashboard log (`cache_log "CI run" $RUN_ID`), so the spot/instance request is visible on ci.aztec-labs.com. Single-instance PR modes called `bootstrap_ec2` directly, so that output only reached the GitHub Actions console; you had to leave the dashboard to see which instance was created. Route the PR-facing single-instance modes (fast/docs/barretenberg/barretenberg-full, full/full-no-test-cache, chonk-input-update) through `multi_job_run` with a single job, matching merge-queue. The job id is kept as `x-$cmd` so the `ci/<job>` GitHub status check name is unchanged. socket-fix keeps its raw (un-denoised) output but now pipes through `cache_log` so it too gets a parent log.

Fix A-1163

bootstrap_ec2 terminates any existing instance sharing the target Name tag, to reap orphans left by a cancelled GA run on the same ref. But the name was just <ref>_<arch>[_postfix], with no repo component — so aztec-packages and aztec-packages-private, which build the same tags/refs concurrently under the same OIDC role, computed identical names and reaped each other's live instances. Observed: nightly tag v5.0.0-nightly.20260610 built in both repos; the public run's pre-launch reap terminated the private run's in-progress arm64 release instance ~7 min in, failing that build. Prefix the instance name with the repo basename (GITHUB_REPOSITORY##*/, default aztec-packages). The key stays stable across re-runs within a repo, so the intended orphan cleanup still works; it only stops the two repos from colliding. ci.sh's helper instance_name (shell/kill/get-ip) is kept in sync.

#23987)

## Problem

`ci3/bootstrap_ec2` terminates any existing instance that shares the target `Name` tag before launching. This intentionally reaps orphans left when a GA run is cancelled (e.g. by a new push) on the same ref. But the name was `<ref>_<arch>[_<postfix>]` with **no repo component**, so `aztec-packages` and `aztec-packages-private`, which build the same tags/refs concurrently under the **same OIDC role**, computed identical names and reaped each other's live instances.

### Observed incident

Nightly tag `v5.0.0-nightly.20260610` was built in **both** repos. Instance `i-02e5d6a6c148ec726` (`v5_0_0-nightly_20260610_arm64_a-release`) was launched by the private repo's run at 03:06:01 UTC and **terminated at 03:13:12 UTC by the public repo's run** for the same tag (its pre-launch reap step), ~7 min in, failing the private build. CloudTrail confirms a `TerminateInstances` from a different `ci3-<run_id>` session, not a spot interruption.

## Fix

Prefix the instance name with the repo basename (`${GITHUB_REPOSITORY##*/}`, defaulting to `aztec-packages` for local runs):

- **Within a repo**, the key is unchanged in spirit (`<repo>_<ref>_<arch>`) and stays stable across re-runs/new-pushes of the same ref, so the intended orphan-on-cancel cleanup still works.
- **Across repos**, public → `aztec-packages_…` and private → `aztec-packages-private_…`, so they no longer match and can't reap each other.

`ci.sh`'s helper `instance_name` (used by the `shell`/`kill`/`get-ip` dev commands) is kept in sync so it still resolves instances launched by a CI run for the same repo.

### Notes

- The EC2 `Name` tag limit is 256 chars; the longest prefixed name is ~61 chars. The reap match uses the full `Name` tag, so the cosmetic 63-char `docker_hostname` truncation doesn't affect correctness.
- One-time transition: instances launched by the old (un-prefixed) code won't be reaped by name-match from new runs; they fall back to the shutdown timer / 1.5h reaper. Self-heals within a couple hours.
- This stops the *collision*. Whether public **and** private *should* both build the same nightly tag (duplicated work) is a separate question; happy to follow up if you want one gated off.
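A minimal sketch of the prefixed naming rule, for illustration only. `instance_name_sketch` is a hypothetical function, not the code in `ci3/bootstrap_ec2` or `ci.sh`, and the ref sanitization (dots replaced by underscores) is an assumption inferred from the observed name `v5_0_0-nightly_20260610_arm64_a-release`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the repo-prefixed Name tag: <repo>_<ref>_<arch>[_<postfix>].
# Assumption: dots in the ref become underscores, as in the observed
# name v5_0_0-nightly_20260610_arm64_a-release.
instance_name_sketch() {
  local ref="$1" arch="$2" postfix="${3:-}"
  # GITHUB_REPOSITORY is owner/repo in GitHub Actions; local runs default to aztec-packages.
  local repo="${GITHUB_REPOSITORY:-aztec-packages}"
  repo="${repo##*/}"  # keep only the basename after the last slash
  local name="${repo}_${ref//./_}_${arch}"
  if [ -n "$postfix" ]; then
    name+="_${postfix}"
  fi
  echo "$name"
}
```

Because the repo basename differs, a reap that matches the full `Name` tag in one repo cannot select the other repo's instance, while re-runs of the same ref within one repo still compute the same name.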

## Problem

Merge-queue runs show a top-level "parent log" on the CI dashboard that includes the **spot/instance request** (which instance type was created, spot vs on-demand). Standard PR runs don't: to see what instance a PR run got, you have to leave the dashboard and dig into the GitHub Actions console.

## Cause

The runner-side orchestration output (the `Requesting spot fleet…` line from `aws_request_instance_type`) is printed on the GA runner, *before* the remote build streams to its per-instance `CI_LOG_ID` log. Where that runner-side output lands depends on the path in `ci.sh`:

- **Merge-queue** goes through `multi_job_run`, which pipes everything into a parent dashboard log: `parallel … 'run …' | DUP=1 cache_log "CI run" $RUN_ID`. Each `run()` wraps `bootstrap_ec2` with `PARENT_LOG_ID=$RUN_ID`, so the instance request lands in the parent log and the build log links underneath it.
- **PR modes** called `bootstrap_ec2` directly (no `cache_log`, no parent log), so the instance request only reached the GA console.

## Change

Route the PR-facing single-instance modes through the same `multi_job_run` path (with a single job), so they get an identical `"CI run" $RUN_ID` parent log with the instance request visible and the build log linked beneath it:

- `fast` / `docs` / `barretenberg` / `barretenberg-full`
- `full` / `full-no-test-cache`
- `chonk-input-update`

The job id is kept as `x-$cmd`, so the `ci/<job>` GitHub commit-status name is **unchanged** (no impact on required checks). `socket-fix` (which takes extra args and is an interactive debug mode) keeps its raw, un-denoised output but now pipes through `cache_log` so it also gets a parent log.

## Behavior notes

- PR-run GA console output is now denoised (condensed progress) for the converted modes, matching merge-queue; the full log lives in the dashboard parent log.
- Instances for these modes now carry an `INSTANCE_POSTFIX` equal to the job id (e.g. `x-fast`), so the EC2 `Name` tag becomes `<ref>_amd64_x-fast`. Same-mode re-runs still dedupe correctly.

## Validation

This is a structural reuse of the already-proven merge-queue path (`multi_job_run`), and `bash -n ci.sh` passes. It can't be exercised locally (needs the GA + AWS orchestration), but **this PR's own CI run is the test**: the `fast` job should now produce a `CI run` parent log on the dashboard showing the instance request, reachable without opening GitHub Actions.
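The routing rule above can be summarized as a small sketch. The function names are hypothetical and this is not the actual `ci.sh` dispatch; it only encodes the mode list, the `x-$cmd` job id, and the resulting `ci/<job>` status name and `<ref>_amd64_x-<cmd>` Name tag as stated in the PR text.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: modes that now go through multi_job_run as a single job.
routes_through_multi_job_run() {
  case "$1" in
    fast | docs | barretenberg | barretenberg-full | full | full-no-test-cache | chonk-input-update)
      return 0 ;;
    *)
      return 1 ;;  # e.g. socket-fix keeps its raw-output path (piped through cache_log)
  esac
}

# The job id stays x-<cmd>, so the GitHub commit-status context stays ci/x-<cmd>.
job_id() { echo "x-$1"; }
status_context() { echo "ci/$(job_id "$1")"; }

# INSTANCE_POSTFIX equals the job id, giving a Name tag of <ref>_amd64_x-<cmd>.
name_tag() { echo "${1}_amd64_$(job_id "$2")"; }
```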

## What

Two changes scoped to the **public** repo (`AztecProtocol/aztec-packages`) nightly flow, plus a follow-up tightening of the scenario-test trigger. Private tagging is unchanged.

### 1. Network scenario tests run only on the private v5-next nightly

`ci3.yml`'s `ci-network-scenario` job fired on any current nightly tag in both repos. Private produces both a `next` (v6) and a `v5-next` (v5) nightly tag, so simply gating to the private repo still ran scenarios against the v6 nightly. The nightly-triggered path is now gated to **private repo + a `v5.` nightly tag**:

```yaml
(
  needs.validate-nightly-tag.outputs.is_current == 'true' &&
  github.repository == 'AztecProtocol/aztec-packages-private' &&
  startsWith(github.ref_name, 'v5.')
) || contains(github.event.pull_request.labels.*.name, 'ci-network-scenario')
```

`v5-next` is at `5.x.x` (tag `v5.x.x-nightly.*`) and `next` is at `6.x.x` (tag `v6.x.x-nightly.*`), so `startsWith(github.ref_name, 'v5.')` selects the v5-next nightly only. The manual PR-label path (`ci-network-scenario`) is preserved for ad-hoc dev runs.

### 2. Stop tagging `next` with a nightly tag in public

`nightly-release-tag.yml`'s matrix tagged `[next, v5-next]` in both repos. The branch list is now repo-dependent: private keeps `[next, v5-next]`, public tags only `v5-next` (and `v4-next` via its existing dedicated job). Net result: **public tags `v4-next` + `v5-next` only**; private is untouched.

## Why

Nightly network scenario tests should run only against the private v5-next nightly, and public should not produce a `next` nightly tag.
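A hypothetical shell rendering of the two rules, useful for checking which repo/tag combinations trigger what. The function names and argument order are assumptions; the real logic lives in the workflow `if:` expression and the tagger's matrix.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the ci-network-scenario condition:
# (current nightly AND private repo AND tag starts with v5.) OR the PR label is set.
should_run_scenarios() {
  local repo="$1" ref_name="$2" is_current="$3" has_label="$4"
  if [ "$is_current" = true ] &&
     [ "$repo" = AztecProtocol/aztec-packages-private ] &&
     [[ "$ref_name" == v5.* ]]; then
    return 0
  fi
  [ "$has_label" = true ]
}

# Hypothetical mirror of the repo-dependent matrix in nightly-release-tag.yml.
# Public v4-next is tagged by its own dedicated job, so it is not listed here.
nightly_branches() {
  if [ "$1" = AztecProtocol/aztec-packages-private ]; then
    echo "next v5-next"
  else
    echo "v5-next"
  fi
}
```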



…letes (#24012)

## Problem

A devnet deploy failed waiting on CI3 for two distinct reasons:

1. **Lookup window bug.** The script used `gh run list --workflow ci3.yml` (which returns only ~20 newest runs) and filtered by `headSha` client-side. By the time the deploy polled, the 03:04 nightly run had aged off that first page, so the match never fired and the script timed out, even though the run existed.
2. **Conclusion gated the deploy.** Even once found, `gh run watch --exit-status` would fail the deploy if the CI3 nightly itself was red (e.g. #2208). The nightly bundles many jobs, so an unrelated red job blocked release even though the release build was fine.

## Fix

1. Query `repos/<repo>/actions/workflows/ci3.yml/runs?head_sha=<sha>` via `gh api`, which filters server-side by SHA and finds the run instantly no matter how old it is.
2. Drop `--exit-status` from `gh run watch` (so the whole-run conclusion no longer gates), and instead gate specifically on the two release jobs: the `./bootstrap.sh ci-release` builds on amd64 (`ci/x-release`) and arm64 (`ci/a-release`). These are posted as **GitHub commit statuses** on the tag's commit by `ci3/bootstrap_ec2` (`post_github_status ci/<job-id>`). The script now waits for both statuses to reach a terminal state (polling, since the runner posts them asynchronously) and fails only if either is not `success`.

It still fails if no CI3 run ever appears for the tag. The deploy now proceeds iff CI3 ran **and** both release-build jobs succeeded, independent of unrelated nightly failures.
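A minimal sketch of the new gate, assuming `gh` is installed and authenticated. The function names and the 30-second polling interval are hypothetical; the two `gh api` calls use GitHub's REST endpoints for listing a workflow's runs filtered by `head_sha` and for a commit's combined status. Only `is_terminal` and `release_gate` are pure logic; the rest needs network access.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the deploy gate; names are illustrative.

# Newest CI3 run id for a commit (server-side head_sha filter); empty if none yet.
ci3_run_id() {
  local repo="$1" sha="$2"
  gh api "repos/$repo/actions/workflows/ci3.yml/runs?head_sha=$sha" \
    --jq '.workflow_runs[0].id // empty'
}

# State of one commit-status context (e.g. ci/x-release); empty if not posted yet.
status_state() {
  local repo="$1" sha="$2" context="$3"
  gh api "repos/$repo/commits/$sha/status" \
    --jq ".statuses[] | select(.context == \"$context\") | .state"
}

# Commit-status states are error, failure, pending or success.
# pending, or a status that has not been posted, is not terminal.
is_terminal() {
  case "$1" in
    success | failure | error) return 0 ;;
    *) return 1 ;;
  esac
}

# Combine the amd64 and arm64 release states into wait, pass or fail.
release_gate() {
  local x_release="$1" a_release="$2"
  if ! is_terminal "$x_release" || ! is_terminal "$a_release"; then
    echo wait
  elif [ "$x_release" = success ] && [ "$a_release" = success ]; then
    echo pass
  else
    echo fail
  fi
}

# Poll until both statuses are terminal; return non-zero unless both succeeded.
wait_for_release() {
  local repo="$1" sha="$2" verdict
  while true; do
    verdict="$(release_gate "$(status_state "$repo" "$sha" ci/x-release)" \
                            "$(status_state "$repo" "$sha" ci/a-release)")"
    case "$verdict" in
      pass) return 0 ;;
      fail) return 1 ;;
    esac
    sleep 30  # assumed polling interval
  done
}
```

Waiting for both statuses before deciding matches the described behavior: an early failure on one architecture is only reported once the other build has also finished.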

## What

Make all three nightly deployments run the deploy from the tip of `next` (latest scripts + helm) while keeping the correct image for each target network.

### Deploy ref → `next`

- `deploy-staging-internal.yml`, `deploy-staging-public.yml`: pass `ref: next` to `deploy-network.yml` so the `spartan/` deploy scripts and helm charts come from `next`.
- `deploy-next-net.yml` already passed `ref: next` (unchanged).

### `determine-tag` job (staging)

- Check out a single commit at the tip of `next` (`ref: next`, `fetch-depth: 1`) instead of `v5-next` with full history.
- Tag resolution: if an explicit `tag` input is given, use it as-is. Otherwise construct `v5.0.0-nightly.<date>` and verify it actually exists with `git ls-remote --exit-code --tags origin`, failing the deploy early if the nightly tag is missing rather than proceeding to deploy a non-existent image.

## Why

`deploy-network.yml` checks out `inputs.ref` to run the deploy scripts/helm; when unset it falls back to `github.ref` (default branch on `schedule`, dispatch branch on `workflow_dispatch`), making the scripts/helm implicit and branch-dependent. Pinning to `next` keeps staging on the latest infra while `semver`/`source_tag` continue to select the v5-line image (`v5.0.0-nightly.<date>`), which is the correct image for the staging networks.

The `v5.0.0-nightly.<date>` tag is created on both the public and private repos (the nightly tagger tags `v5-next` on both), so the `git ls-remote origin` check resolves against whichever repo the workflow runs in.
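The tag-resolution rule can be sketched as follows. This is an illustration, not the workflow's actual step: the function names are hypothetical, and the date format (`YYYYMMDD`, matching `v5.0.0-nightly.20260610`) and the use of the runner's UTC clock are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of determine-tag's resolution logic.

# Assumption: <date> is YYYYMMDD, as in v5.0.0-nightly.20260610.
nightly_tag() { echo "v5.0.0-nightly.$1"; }

resolve_tag() {
  local explicit="${1:-}"
  # An explicit tag input is used as-is, without any remote lookup.
  if [ -n "$explicit" ]; then
    echo "$explicit"
    return 0
  fi
  # Assumption: the date comes from the runner clock in UTC.
  local tag
  tag="$(nightly_tag "$(date -u +%Y%m%d)")"
  # --exit-code makes ls-remote exit non-zero when the tag is absent on origin.
  if ! git ls-remote --exit-code --tags origin "refs/tags/$tag" >/dev/null; then
    echo "error: nightly tag $tag not found on origin" >&2
    return 1
  fi
  echo "$tag"
}
```

With no argument, `resolve_tag` queries `origin`, so it has to run inside a clone; `git ls-remote --exit-code` exits with status 2 when no ref matches, which is what lets the deploy fail early.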

## Summary

- Update the testnet SponsoredFPC address in the networks table and getting-started guides.
- Adjust release docs guidance to reflect that SponsoredFPC is deployed on testnet and devnet, but not mainnet.

## Validation

- `yarn spellcheck` from `docs/`

#24026)

Adds an `<agent_and_workflow_restraint>` block to the root `CLAUDE.md` telling Claude to do work inline in the current session and not spawn parallel subagents or launch dynamic workflows unless the user explicitly asks.

## Why

Operators have reported burning through their token budget from a single prompt that quietly fanned out: in one case a "summarize recent ZK advancements" query started ~30 agents, and another exhausted a 5h budget spinning up subagents. Parallel agents and dynamic workflows multiply spend (≈2x for one helper, far more for a swarm) and the user can neither see the fan-out coming nor stop it. This appears to be a current tendency of Fable.

The guidance reasserts: handle search/summarize/research/multi-file edits inline, reserve subagents for explicit user requests or a single read-heavy isolation case, and never start a dynamic workflow by default. Passes the repo's `<editorial_test>`: the line would have prevented the ~30-agent fan-out on an ordinary research prompt described above.

Same change is being opened against `v5-next`, and an equivalent shared rule is being added in the claudebox repo so it applies to every managed session.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/7d5ecfdd5f37c5cd) · group: `slackbot`*

…4033)

Replicates the change in #24024 directly on a repo branch, and also applies it to the versioned docs snapshot. Updates the **Aztec & Noir Developer Office Hours** Google Meet link from `https://meet.google.com/sdd-rdsr-shu` → `https://meet.google.com/vev-waao-mab` in:

- `docs/docs-developers/docs/resources/community_calls.md` (current docs, identical to #24024)
- `docs/developer_versioned_docs/version-v4.3.1/docs/resources/community_calls.md` (versioned snapshot, the only versioned copy that still carried the old link)

No occurrences of the old `sdd-rdsr-shu` link remain anywhere under `docs/`.

…ing (#24039)

## Problem

The dashboard `grind` option always fails to SSH into the build instance:

```
Waiting for SSH at 3.144.255.68...
Timeout: SSH could not login to 3.144.255.68 within 60 seconds.
```

The instance launches fine (spot/on-demand fulfilled, IP assigned) but SSH never connects, so grind cycles through every instance type and gives up.

## Root cause

CI build boxes were migrated from SSH to **SSM**. In `ci3/bootstrap_ec2` the default is now `CI_USE_SSH=0` (SSM); only `shell-new` forces SSH, and `grind-test` does not. So on current `next`, grind runs over SSM like the rest of CI.

But the dashboard launches grind from a long-lived checkout at `REPO_PATH` (the `/grind` handler in `rk.py` shells out to `cd $REPO_PATH && ./ci.sh grind-test ...`). That checkout had drifted to a pre-SSM commit, so grind alone still took the legacy SSH branch, launching into the retired SSH security group + `build-instance` key pair, whose port-22 / key-injection preconditions were torn down during the SSM lockdown. The stale checkout also explains the old AMI (`ami-09d27244b23be8891`) in the logs vs. current `next`'s `ami-067627aa971a1dcbb`.

Nothing kept `REPO_PATH` current: the `ci3-dashboard-deploy.yml` workflow only rebuilds the `rkapp` Flask container (and is path-filtered to `ci3/dashboard/**`), so changes to the `ci3/` launcher scripts never refreshed it.

## Fix

Refresh the launcher checkout to `origin/next` at grind launch time, before shelling out. This is self-healing and independent of deploys. It matches the existing design where the launcher always runs current-`next` orchestration scripts while the grind *target commit* is checked out on the remote box, so this does **not** restrict which branch/commit you can grind. If the refresh fails (e.g. transient network), the error is surfaced in the run log instead of silently grinding on a stale tree.

## Testing

`python3 -m py_compile ci3/dashboard/rk.py` passes. The behavior change is host-side (requires the dashboard's `REPO_PATH` checkout) and can't be exercised in unit CI; it will take effect on the next dashboard deploy. The immediate one-time unblock is still to refresh `REPO_PATH` on `ci.aztec-labs.com` and restart `rkapp`.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/1c05a513cb601b21) · group: `slackbot`*
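The refresh-then-launch order can be sketched in shell. The real change is in Python (`ci3/dashboard/rk.py`); `launch_grind` is a hypothetical equivalent, and using a detached checkout of `origin/next` (rather than, say, a hard reset) is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical shell equivalent of the launcher-side refresh in rk.py.
launch_grind() {
  local repo_path="$1"
  shift
  # Bring the launcher checkout to the tip of next; stop with an error
  # instead of grinding with stale orchestration scripts.
  if ! git -C "$repo_path" fetch --quiet origin next ||
     ! git -C "$repo_path" checkout --quiet --detach origin/next; then
    echo "error: could not refresh $repo_path to origin/next" >&2
    return 1
  fi
  # Only the launcher scripts come from next; the grind target is passed through in "$@".
  (cd "$repo_path" && ./ci.sh grind-test "$@")
}
```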

…n-resolve

# Conflicts:
# .github/workflows/deploy-staging-internal.yml
# .github/workflows/nightly-release-tag.yml

charlielye and others added 23 commits June 9, 2026 16:50

chore: deployments

01f1bc5

Fix A-1163

chore: deployments (#23959)

7e94c2c

Fix A-1163

docs: update sponsored fpc address

8bfd77d

add reminder about funding testnet fpc

2ed6d2b

docs(CLAUDE.md): discourage unprompted subagents and dynamic workflows

a62c059

docs: update Aztec & Noir Developer Office Hours Google Meet link

8aed7c5

update PR #24033

c11b084

fix(ci): refresh grind launcher checkout to origin/next before launching

3b462ff

Merge remote-tracking branch 'origin/next' into cb/merge-train-sparta…

f6521f0


AztecBot added the ci-draft (Run CI on draft PRs), ci-no-fail-fast (sets NO_FAIL_FAST in CI so the run is not aborted on the first failure), ci-no-squash, and claudebox (owned by claudebox; it can push to this PR) labels Jun 12, 2026

PhilWindle marked this pull request as ready for review June 12, 2026 11:55

PhilWindle enabled auto-merge June 12, 2026 14:30

PhilWindle approved these changes Jun 12, 2026


PhilWindle merged commit 0fcd113 into merge-train/spartan Jun 12, 2026
51 of 55 checks passed

PhilWindle deleted the cb/merge-train-spartan-resolve branch June 12, 2026 14:33

AztecBot mentioned this pull request Jun 12, 2026

feat: merge-train/spartan #23971

Open


