fix(spartan): wait_for_ci3 finds aged runs and proceeds once CI3 completes #24012

Merged: AztecBot merged 1 commit into next from cb/wait-for-ci3-fix on Jun 11, 2026

AztecBot (Collaborator) commented on Jun 11, 2026
## Problem

A devnet deploy failed waiting on CI3 for two distinct reasons:

  1. **Lookup window bug.** The script used `gh run list --workflow ci3.yml`, which by default returns only the 20 newest runs, and filtered by `headSha` client-side. By the time the deploy polled, the 03:04 nightly run had aged off that first page, so the match never fired and the script timed out even though the run existed.

  2. **Conclusion gated the deploy.** Even once the run was found, `gh run watch --exit-status` would fail the deploy if the CI3 nightly itself was red (e.g. fix: try workaround sample dapp ci timeout #2208). The nightly bundles many jobs, so an unrelated red job blocked the release even though the release build was fine.
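
The lookup window bug in item 1 can be reproduced without touching GitHub. The sketch below is a simulation only: `fake_runs`, `find_on_first_page`, and the `shaNNN` values are made up for illustration, and the default page size of 20 is taken from the description above.

```shell
#!/bin/sh
# Simulation only: the run ids and SHAs are invented for illustration.

# Print N fake workflow runs, newest first, as "run_id sha" pairs.
fake_runs() {
  local i="$1"
  while [ "$i" -ge 1 ]; do
    printf '%d sha%03d\n' "$i" "$i"
    i=$((i - 1))
  done
}

# Client-side filter that only looks at the first page of results.
# Args: sha, total number of runs, page size (defaults to 20).
# Prints the matching run id; returns non-zero if the SHA is not on the page.
find_on_first_page() {
  local sha="$1" total="$2" page_size="${3:-20}"
  fake_runs "$total" |
    awk -v s="$sha" -v p="$page_size" \
      'NR <= p && $2 == s { print $1; found = 1 } END { exit !found }'
}
```

With 25 runs in the list, `find_on_first_page sha025 25` finds the newest run, while `find_on_first_page sha003 25` returns non-zero even though that run exists: it sits at position 23, past the first page.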

## Fix

  1. Query `repos/<repo>/actions/workflows/ci3.yml/runs?head_sha=<sha>` via `gh api`, which filters server-side by SHA and finds the run no matter how old it is.

  2. Drop `--exit-status` from `gh run watch` so the whole-run conclusion no longer gates the deploy, and instead gate specifically on the two release jobs: the `./bootstrap.sh ci-release` builds on amd64 (`ci/x-release`) and arm64 (`ci/a-release`). These are posted as **GitHub commit statuses** on the tag's commit by `ci3/bootstrap_ec2` (`post_github_status ci/<job-id>`). The script now polls until both statuses reach a terminal state, since the runner posts them asynchronously, and fails if either is not `success`. It still fails if no CI3 run ever appears for the tag.
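
The two steps above could look roughly like the following. This is a minimal sketch, not the script from this PR: the function names, the timeout and interval defaults, and the use of the combined-status endpoint (`repos/<repo>/commits/<sha>/status`) are assumptions. Only the workflow-runs path and the `ci/x-release` and `ci/a-release` contexts come from the description.

```shell
#!/bin/sh
# Hypothetical sketch: names and defaults below are assumptions, not the PR's code.

# Build the REST path that filters CI3 workflow runs by commit SHA server-side.
ci3_runs_path() {
  printf 'repos/%s/actions/workflows/ci3.yml/runs?head_sha=%s' "$1" "$2"
}

# Print the id of a CI3 run for the SHA, or nothing if no run exists yet.
find_ci3_run() {
  gh api "$(ci3_runs_path "$1" "$2")" --jq '.workflow_runs[0].id // empty'
}

# Print the latest state of one commit status context (empty if not posted yet).
status_state() {
  local repo="$1" sha="$2" context="$3"
  gh api "repos/$repo/commits/$sha/status" \
    --jq ".statuses[] | select(.context == \"$context\") | .state"
}

# A commit status is terminal once it is success, failure, or error.
is_terminal_state() {
  case "$1" in
    success | failure | error) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll one context until it is terminal; print the final state.
# Returns non-zero if the timeout (in seconds) elapses first.
wait_for_status() {
  local repo="$1" sha="$2" context="$3" timeout="${4:-1800}" interval="${5:-30}"
  local elapsed=0 state
  while [ "$elapsed" -lt "$timeout" ]; do
    state=$(status_state "$repo" "$sha" "$context")
    if is_terminal_state "$state"; then
      printf '%s\n' "$state"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

A caller would run `find_ci3_run "$REPO" "$SHA"` first, then `wait_for_status "$REPO" "$SHA" ci/x-release` and the same for `ci/a-release`. A missing or `pending` status keeps the loop polling, which covers the case where the runner has not posted the status yet.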

The deploy now proceeds if and only if CI3 ran **and** both release-build jobs succeeded, independent of unrelated nightly failures.
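
The resulting gate condition can be written as a small predicate. The sketch below is illustrative: `release_gate` is a hypothetical name, and it takes the run id and the two status states as plain arguments. The whole-run conclusion is deliberately not an input, so an unrelated red nightly job cannot influence the result.

```shell
#!/bin/sh
# Hypothetical helper: release_gate is not a function from the PR.

# Args: CI3 run id (empty if none was found), ci/x-release state, ci/a-release state.
# Returns 0 only if a run was found and both release statuses are "success".
release_gate() {
  local run_id="$1" x_state="$2" a_state="$3"
  if [ -z "$run_id" ]; then
    echo "no CI3 run found for the tag" >&2
    return 1
  fi
  if [ "$x_state" != "success" ] || [ "$a_state" != "success" ]; then
    echo "release build not green: ci/x-release=$x_state ci/a-release=$a_state" >&2
    return 1
  fi
  return 0
}
```

For example, `release_gate 123 success success` allows the deploy, while `release_gate 123 success failure` and `release_gate "" success success` both block it.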

AztecBot added the ci-draft, ci-no-fail-fast, and claudebox labels on Jun 11, 2026
AztecBot added a commit that referenced this pull request on Jun 11, 2026
PhilWindle marked this pull request as ready for review on June 11, 2026 10:35
AztecBot force-pushed the cb/wait-for-ci3-fix branch from 1629172 to 3d6efdf on June 11, 2026 10:38
AztecBot enabled auto-merge on June 11, 2026 10:38
AztecBot added this pull request to the merge queue on Jun 11, 2026

AztecBot (Collaborator, Author) commented:

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/3fc8c480eb91fff0): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "proposer invalidates multiple checkpoints" (458s) (code: 0) group:e2e-p2p-epoch-flakes

Merged via the queue into next with commit 8b3fdb4 Jun 11, 2026
20 checks passed
AztecBot deleted the cb/wait-for-ci3-fix branch on June 11, 2026 11:22
Labels

- ci-draft: Run CI on draft PRs.
- ci-no-fail-fast: Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure.
- ci-skip
- claudebox: Owned by claudebox. It can push to this PR.

2 participants