fix(ci): raise AVM check-circuit per-tx timeout to 120s #23749
Closed
AztecBot wants to merge 1 commit into
The avm-check-circuit job runs bb-avm avm_check_circuit on every dumped e2e AVM input under a fixed 30s per-tx timeout. The e2e_multiple_blobs tx produces a ~700k-row trace whose simulation and trace generation alone take ~23s on the 2-CPU isolation container; the subsequent circuit check pushed the run past 30s (observed 35s, killed with code 124), failing the whole job while every other input passed in 4-6s. This is the scenario the existing in-code warning anticipated. Raise the timeout to 120s to give ample headroom for the heaviest txs. Resources are left unchanged: with up to 64 jobs in parallel on a 128-CPU host, bumping --cpus would oversubscribe the runner, and a longer timeout is resource-neutral since small txs still finish in seconds.
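The failure mode described above, in miniature: each input runs under a wall-clock limit, GNU coreutils timeout exits with 124 when it has to kill the command, and because the job is fail-fast (a later update in this PR cites GNU parallel's --halt now,fail=1), one killed input becomes the whole job's exit code. This is a minimal plain-bash sketch under stated assumptions, not the actual bootstrap.sh logic: check_one and run_fail_fast are hypothetical names, sleep stands in for bb-avm avm_check_circuit, limits are scaled down to fractions of a second, and the fail-fast runner is a polling approximation rather than GNU parallel itself.

```shell
#!/usr/bin/env bash
# Hypothetical per-input wrapper: run one command under a wall-clock limit.
# GNU coreutils `timeout` exits with 124 if it has to kill the command;
# otherwise it passes the command's own exit status through.
check_one() {
  local limit="$1"; shift
  timeout "$limit" "$@"
}
export -f check_one   # make it callable from the `bash -c` children below

# Polling approximation of fail-fast parallel execution (not GNU parallel itself):
# start every command in the background; as soon as one exits non-zero,
# kill the others and return that exit code. Each argument is one command string.
run_fail_fast() {
  local -a pids=()
  local cmd pid code i
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done
  while [ "${#pids[@]}" -gt 0 ]; do
    for i in "${!pids[@]}"; do
      pid="${pids[$i]}"
      kill -0 "$pid" 2>/dev/null && continue   # still running, check again later
      code=0
      wait "$pid" || code=$?                    # collect its exit status
      unset 'pids[i]'
      if [ "$code" -ne 0 ]; then
        if [ "${#pids[@]}" -gt 0 ]; then
          kill "${pids[@]}" 2>/dev/null          # stop the remaining inputs
          wait "${pids[@]}" 2>/dev/null          # and reap them
        fi
        return "$code"
      fi
    done
    sleep 0.05
  done
  return 0
}

# Usage: two quick "inputs" and one that outlives its (scaled-down) limit.
run_fail_fast "sleep 0.1" "check_one 0.3s sleep 5" "sleep 0.2"
echo "whole job exit code: $?"   # prints: whole job exit code: 124
```

Polling with kill -0 plus wait PID was chosen over wait -n so the sketch does not depend on bash-version differences in how wait -n reports jobs that have already exited.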
Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed.
Problem
The avm-check-circuit job in run 26703197886 failed on the next branch with exit code 124 (timeout).
The job runs bb-avm avm_check_circuit on every dumped e2e AVM input in parallel, each under a fixed 30s per-tx timeout (yarn-project/end-to-end/bootstrap.sh). Every input passed in 4–6s except e2e_multiple_blobs tx 0x241c8baa…, which was killed at the 30s wall (ran 35s, code: 124), failing the whole job.
Root cause
That tx produces a much larger circuit (~700,560 rows vs. tiny traces for the others). From the run log of the killed job:
Simulation and trace generation alone consumed ~23s on the 2-CPU isolation container, leaving only about 7s for the circuit check before the 30s deadline. This is exactly the situation the existing in-code WARNING comment anticipated ("transactions could need more CPU and MEM than we allocate by default … they might start timing out"). The 30s value has been unchanged since the feature was introduced (#18747), so this is a heavy tx finally crossing the threshold, not a regression.
Fix
Raise the per-tx timeout from 30s to 120s. That leaves ample headroom over the ~35s observed for the heaviest tx, while small txs still finish in seconds.
Resources are deliberately left at the default. With up to 64 jobs running in parallel on a 128-CPU host, the containers already use --cpus=2 (up to 64 × 2 = 128 CPUs total); raising --cpus would oversubscribe the runner. A longer timeout is resource-neutral: it only changes the kill deadline, not how much CPU/MEM each run consumes.
The outdated warning comment is updated to describe the actual behavior.
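The arithmetic behind the fix can be checked with a few lines of shell: the old and new limits against the ~35s the PR uses as its figure for the heaviest tx, and the total CPU request of 64 parallel containers against the 128-CPU host. This is an illustrative sketch using only figures quoted in this PR; time_headroom and cpu_demand are hypothetical names, and the --cpus=3 call is a what-if for comparison.

```shell
#!/usr/bin/env bash
# Figures quoted in the PR; the helper names below are hypothetical.
HEAVIEST_TX_S=35   # wall time reported for the e2e_multiple_blobs tx
HOST_CPUS=128      # CPUs on the CI host
MAX_JOBS=64        # parallel avm_check_circuit containers

# Does a per-tx limit (in seconds) leave room for the heaviest observed tx?
time_headroom() {
  local limit_s="$1"
  if [ "$HEAVIEST_TX_S" -lt "$limit_s" ]; then
    printf '%ss limit: %ss to spare\n' "$limit_s" "$(( limit_s - HEAVIEST_TX_S ))"
  else
    printf '%ss limit: too short for the %ss heaviest tx\n' "$limit_s" "$HEAVIEST_TX_S"
  fi
}

# Total CPUs requested when every job runs with the given --cpus value.
cpu_demand() {
  local cpus_per_job="$1"
  local total=$(( MAX_JOBS * cpus_per_job ))
  if [ "$total" -le "$HOST_CPUS" ]; then
    printf -- '--cpus=%s: %s of %s CPUs, fits\n' "$cpus_per_job" "$total" "$HOST_CPUS"
  else
    printf -- '--cpus=%s: %s of %s CPUs, oversubscribed\n' "$cpus_per_job" "$total" "$HOST_CPUS"
  fi
}

time_headroom 30    # old limit: too short for the 35s heaviest tx
time_headroom 120   # new limit: 85s to spare
cpu_demand 2        # current default: 128 of 128 CPUs, fits
cpu_demand 3        # what-if: 192 of 128 CPUs, oversubscribed
```

The per-tx limit does not enter cpu_demand at all, which is the sense in which a longer timeout is resource-neutral.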
Update (2026-06-03): still recurring, please land
The same failure hit next again in run 26863710723 (commit 64f5310). The CI dashboard log confirms the identical root cause: every input passed in ~4–6s except e2e_multiple_blobs tx 0x0b21460a…, which was killed at the 30s wall (33s, code: 124); the fail-fast setting (--halt now,fail=1) then propagated exit 124 to the whole job.
This PR's change is exactly the right fix and still applies cleanly, but it has been sitting in draft since 2026-05-31, which is why the nightly keeps failing and auto-dispatching duplicate fix attempts (multiple cb/avm-check-circuit-* branches). Recommend marking it ready for review and merging; the stale sibling branches/PRs can then be closed.
Update (2026-06-05): recurred again, still the right fix
Hit next a third time in run 26995416365 (commit 91df1ab, merge-queue). Same fingerprint from the CI dashboard log (http://ci.aztec-labs.com/1780635325767557): every input PASSED in 3–5s except
A separate session independently reached the identical 30s→120s fix. This PR is the canonical version. Please mark it ready for review and merge; the stale cb/avm-check-circuit-* sibling branches/PRs can then be closed. The nightly/merge-queue will keep failing and re-dispatching ClaudeBox sessions until it lands.
Created by claudebox · group: slackbot