fix(spartan): wait_for_ci3 finds aged runs and proceeds once CI3 completes #24012
Merged
alexghr
approved these changes
Jun 11, 2026
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.
## Problem

A devnet deploy failed waiting on CI3 for two distinct reasons:

1. **Lookup window bug.** The script used `gh run list --workflow ci3.yml` (which returns only ~20 newest runs) and filtered by `headSha` client-side. By the time the deploy polled, the 03:04 nightly run had aged off that first page, so the match never fired and the script timed out, even though the run existed.
2. **Conclusion gated the deploy.** Even once found, `gh run watch --exit-status` would fail the deploy if the CI3 nightly itself was red (e.g. #2208, "fix: try workaround sample dapp ci timeout"). The nightly bundles many jobs, so an unrelated red job blocked release even though the release build was fine.

## Fix

1. Query `repos/<repo>/actions/workflows/ci3.yml/runs?head_sha=<sha>` via `gh api`, which filters server-side by SHA and finds the run instantly no matter how old it is.
2. Drop `--exit-status` from `gh run watch` (so the whole-run conclusion no longer gates), and instead gate specifically on the two release jobs: the `./bootstrap.sh ci-release` builds on amd64 (`ci/x-release`) and arm64 (`ci/a-release`). These are posted as **GitHub commit statuses** on the tag's commit by `ci3/bootstrap_ec2` (`post_github_status ci/<job-id>`). The script now waits for both statuses to reach a terminal state (polling, since the runner posts them asynchronously) and fails only if either is not `success`. It still fails if no CI3 run ever appears for the tag.

The deploy now proceeds iff CI3 ran **and** both release-build jobs succeeded, independent of unrelated nightly failures.
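For reviewers who want to see the shape of the new flow, here is a minimal sketch of the lookup and the status gate. It is not the script from this PR: the function names (`find_ci3_run_id`, `latest_status_state`, `release_gate_verdict`, `wait_for_release_statuses`), the argument order, and the attempt/sleep parameters are all hypothetical. It assumes an authenticated `gh` CLI and uses GitHub's commit statuses list endpoint, which returns the newest status first.

```shell
#!/bin/sh
# Illustrative sketch only; all function names and parameters are assumptions.

# Print the id of the ci3.yml run for a commit, or nothing if none exists.
# $1 = owner/repo, $2 = commit SHA. head_sha makes GitHub filter server-side.
find_ci3_run_id() {
  gh api "repos/$1/actions/workflows/ci3.yml/runs?head_sha=$2" \
    --jq '.workflow_runs[0].id // empty'
}

# Print the latest state of one commit-status context, or "missing".
# $1 = owner/repo, $2 = commit SHA, $3 = context (e.g. ci/x-release).
latest_status_state() {
  gh api "repos/$1/commits/$2/statuses?per_page=100" \
    --jq "[.[] | select(.context == \"$3\")][0].state // \"missing\""
}

# Pure decision step: combine the two release-job states into one verdict.
# Prints "pending" until both are terminal, then "success" or "failure".
release_gate_verdict() {
  for state in "$1" "$2"; do
    case "$state" in
      success|failure|error) ;;        # terminal commit-status states
      *) echo pending; return 0 ;;     # pending, missing, or unknown
    esac
  done
  if [ "$1" = success ] && [ "$2" = success ]; then
    echo success
  else
    echo failure
  fi
}

# Poll both release statuses. $3 = max attempts, $4 = seconds between polls.
# Returns 0 only when both ci/x-release and ci/a-release are "success".
wait_for_release_statuses() {
  attempt=0
  while [ "$attempt" -lt "$3" ]; do
    x_state=$(latest_status_state "$1" "$2" ci/x-release)
    a_state=$(latest_status_state "$1" "$2" ci/a-release)
    case "$(release_gate_verdict "$x_state" "$a_state")" in
      success) return 0 ;;
      failure)
        echo "release build failed: x=$x_state a=$a_state" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$4"
  done
  echo "timed out waiting for release statuses" >&2
  return 1
}

# Example wiring (not executed here; REPO and SHA are placeholders):
#   run_id=$(find_ci3_run_id "$REPO" "$SHA")
#   [ -n "$run_id" ] || { echo "no CI3 run for $SHA" >&2; exit 1; }
#   gh run watch "$run_id" --repo "$REPO"    # no --exit-status on purpose
#   wait_for_release_statuses "$REPO" "$SHA" 60 30
```

Keeping the verdict in a pure function separates the gating rule (both release statuses terminal, both `success`) from the network calls, so the rule can be checked without hitting the API. A real script would likely also retry the run lookup until the run appears or a deadline passes.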