
fix(ci): retry aztec-nr nargo dependency clone on transient network flake #23653

Merged
spalladino merged 3 commits into merge-train/spartan from cb/spartan-aztec-nr-dep-clone-retry
May 29, 2026


@AztecBot AztecBot commented May 28, 2026


Why

The merge-train/spartan train PR (#23580) was dequeued from the merge queue. The merge-queue CI3 run (run 26608568295) failed in the aztec-nr warnings check, ~11s in:

Checking aztec-nr for warnings...
fatal: unable to access 'https://github.com/noir-lang/sha256/': Could not resolve host: github.com
make: *** [Makefile:303: aztec-nr] Error 1

This was a transient DNS flake: on a cold cache, nargo check clones aztec-nr's pinned external git deps (noir-lang/sha256, poseidon) from github.com, and the runner momentarily couldn't resolve the host.

What (minimal change, 2 files)

A blanket retry would also re-run on genuine check failures (type errors, denied warnings). To avoid that:

  • ci3/retry: add an optional -p <regex> flag. When set, only failures whose output matches the regex are retried; any other failure exits immediately with the original code. Without -p, behavior is unchanged.
  • aztec-nr/bootstrap.sh: wrap the two network-touching nargo calls with retry -p "<git transport errors>" (Could not resolve host, unable to access, Connection timed out/refused, Failed to connect, TLS connect error, early EOF, RPC failed). None overlap with nargo's error:/warning: output, so a real check failure fails on the first attempt.
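As a rough illustration of the gating pattern described in the second bullet, the sketch below builds a regex from the error strings listed there and classifies a few sample lines. The variable and function names are hypothetical, the exact regex in aztec-nr/bootstrap.sh may differ, and the two nargo-style lines are made-up examples rather than real nargo output.

```bash
#!/usr/bin/env bash
# Hypothetical pattern assembled from the git transport errors named above;
# the regex actually used in aztec-nr/bootstrap.sh may differ.
git_transport_errors='Could not resolve host|unable to access|Connection timed out|Connection refused|Failed to connect|TLS connect error|early EOF|RPC failed'

# Returns 0 when the given output looks like a git transport failure.
is_transient() {
  grep -Eq "$git_transport_errors" <<<"$1"
}

# Prints "retry" for transport failures and "fail-fast" for anything else.
classify() {
  if is_transient "$1"; then echo retry; else echo fail-fast; fi
}

# The failure from the CI log above matches the pattern.
classify "fatal: unable to access 'https://github.com/noir-lang/sha256/': Could not resolve host: github.com"
# Illustrative (made-up) nargo-style diagnostics do not match.
classify "error: Expected type Field, found type bool"
classify "warning: unused variable x"
```

Run as written, this should print retry, fail-fast, fail-fast: only the transport failure is treated as retryable, while error:/warning: diagnostics fall through to an immediate failure.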

(nargo has no standalone dep-install command; resolution only happens inside check/compile/test. Retrying just the resolve step therefore isn't possible, and the regex-gated retry is the workable option.)

Verification

Both files pass bash -n. The new -p path was checked against stubbed commands: a matching (network) failure retries and then succeeds; a non-matching genuine error fails fast after a single attempt; a persistent matching failure stops after 3 attempts; and with RETRY_DISABLED set, the command runs once.
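A behavioral check of this kind could look roughly like the sketch below. The retry_if function is a hypothetical stand-in for ci3/retry, not its actual code: the function name, the fixed limit of 3 attempts, the lack of any delay between attempts, and the stub commands are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for ci3/retry. Usage: retry_if [-p <regex>] <command> [args...]
retry_if() {
  local pattern="" max_attempts=3 attempt code out
  if [ "${1:-}" = "-p" ]; then
    pattern=$2
    shift 2
  fi
  out=$(mktemp)
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Pass output through as usual while keeping a copy to match against.
    "$@" 2>&1 | tee "$out"
    code=${PIPESTATUS[0]}
    [ "$code" -eq 0 ] && break
    # RETRY_DISABLED set: a single attempt only.
    [ -n "${RETRY_DISABLED:-}" ] && break
    # With -p, a failure whose output does not match the regex is not retried.
    if [ -n "$pattern" ] && ! grep -Eq "$pattern" "$out"; then break; fi
  done
  rm -f "$out"
  return "$code"
}

# Attempts are counted in a file because the command runs in a pipeline subshell.
count_file=$(mktemp)
attempts() { wc -l <"$count_file" | tr -d ' '; }

# Stub: network-style failure on the first call, success afterwards.
flaky_clone() {
  echo x >>"$count_file"
  if [ "$(attempts)" -lt 2 ]; then
    echo "fatal: Could not resolve host: github.com" >&2
    return 1
  fi
  echo "ok"
}

# Stub: always fails with a genuine (non-network) error.
type_error() {
  echo x >>"$count_file"
  echo "error: type mismatch" >&2
  return 2
}

# Stub: network failure that never recovers.
always_down() {
  echo x >>"$count_file"
  echo "fatal: Could not resolve host: github.com" >&2
  return 1
}

# Resets the counter, runs one case, and reports exit code and attempt count.
run_case() {
  local label=$1 code
  shift
  : >"$count_file"
  "$@" >/dev/null && code=0 || code=$?
  echo "$label: exit=$code attempts=$(attempts)"
}

pattern='Could not resolve host|unable to access'
run_case flaky retry_if -p "$pattern" flaky_clone       # flaky: exit=0 attempts=2
run_case genuine retry_if -p "$pattern" type_error      # genuine: exit=2 attempts=1
run_case persistent retry_if -p "$pattern" always_down  # persistent: exit=1 attempts=3
RETRY_DISABLED=1 run_case disabled retry_if -p "$pattern" always_down  # disabled: exit=1 attempts=1
```

The four cases mirror the ones listed above: the genuine error keeps its original exit code and is attempted once, while only the matching failures consume additional attempts.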

The full ./bootstrap.sh ci is the same orchestrated remote-EC2 CI that failed here and isn't reproducible on a dev host; the DNS flake also can't be reproduced where DNS works.

@AztecBot AztecBot added the claudebox label (Owned by claudebox. it can push to this PR.) May 28, 2026
@spalladino spalladino marked this pull request as ready for review May 29, 2026 01:46
@spalladino spalladino enabled auto-merge (squash) May 29, 2026 01:46
@spalladino spalladino merged commit ffa2c6c into merge-train/spartan May 29, 2026
17 checks passed
@spalladino spalladino deleted the cb/spartan-aztec-nr-dep-clone-retry branch May 29, 2026 14:34
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
test(e2e): unskip pipelining related e2e tests (AztecProtocol#23642)
fix(archiver): prune blocks without proposed checkpoint by end of build
slot (AztecProtocol#23606)
test: migrate benchmarks to pipelining setup (AztecProtocol#23647)
fix(p2p): fall back to archiver in BLOCK_TXS response validation
(AztecProtocol#23624)
docs(slashing): align operator and slasher docs with AZIP-7 (AztecProtocol#23494)
fix(p2p): do not penalize peers that signal a missing block with Fr.ZERO
(AztecProtocol#23672)
chore: adjust metrics deployment (AztecProtocol#23676)
fix(cheat-codes): warpL2TimeAtLeastBy advances relative to leading clock
(AztecProtocol#23675)
chore: tighten node pool sizes (AztecProtocol#23678)
chore: remove archival nodes (AztecProtocol#23630)
chore: merge blob sink duties into RPC node (AztecProtocol#23631)
fix: sync avm-transpiler Cargo.lock with noir submodule (AztecProtocol#23683)
fix(spartan): set validator lag env vars in tps-scenario (AztecProtocol#23684)
fix: make world-state hash queries reorg-aware to close getWorldState
race (AztecProtocol#23677)
fix: pin noir submodule to next's version on merge-train/spartan
(AztecProtocol#23690)
fix: ensure image ref is used by bench runner (AztecProtocol#23682)
fix(ci): retry aztec-nr nargo dependency clone on transient network
flake (AztecProtocol#23653)
chore: run one-off jobs on network nodes (AztecProtocol#23701)
fix: simulate proposals inside target slot (AztecProtocol#23692)
chore: smaller eth-devnet (AztecProtocol#23704)
chore: enable testnet autoscaling (AztecProtocol#23705)
feat(api)!: redesign node log retrieval API around tag-based queries
(AztecProtocol#23625)
fix(sequencer): set own proposed checkpoint locally instead of via p2p
loopback (AztecProtocol#23659)
END_COMMIT_OVERRIDE