
fix(ci): bump avm_check_circuit per-test timeout to 120s for heavy txs #24030

Draft
AztecBot wants to merge 1 commit into next from cb/fix-avm-cc-timeout-30s

Conversation

@AztecBot

Collaborator

Problem

The avm-check-circuit job failed on next (run 27367574268) with exit code 124, which indicates a timeout, not a circuit correctness failure.

The check runs bb-avm avm_check_circuit on every dumped e2e AVM circuit input in parallel, each isolated with TIMEOUT=30s and the default 2 CPUs. Every other input passed in 4–8s; one timed out:

FAILED: .../e2e_multiple_blobs/avm-circuit-inputs-tx-0x11c58d2c....bin (33s) (code: 124)
run_test_cmd '...:ISOLATE=1:TIMEOUT=30s:NAME=avm_cc_e2e_multiple_blobs_0x11c58d2c ...'
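Exit code 124 is the status GNU coreutils `timeout` returns when it has to stop a command that ran past its limit. A minimal sketch of that behavior, assuming GNU `timeout` is installed; the `sleep` is only a stand-in for a slow check, not the real command:

```shell
# Stand-in for a check that needs longer than its budget (5s of work, 1s cap).
# GNU timeout sends TERM at the deadline and then exits with status 124.
status=0
timeout 1s sleep 5 || status=$?
echo "exit status: $status"
```

A command that finishes within the limit returns its own exit status instead, which is why 124 points at the time budget rather than at the check itself.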

Root cause

From the per-test log (--cpus=2):

Generating trace...                          # 18:36:24
Checking circuit...  (mem: 3869 MiB)         # 18:36:42  (~18s for sim + trace gen)
Running check (with skippable) circuit over 700560 rows.
timeout: sending signal TERM to command 'bash'   # 18:36:53  killed mid-check at the 30s cap

This one e2e_multiple_blobs tx produces a ~700k-row circuit, far larger than the other e2e txs (which finish check-circuit in a few seconds), so its trace generation plus check exceeds the uniform 30s budget and the process is killed mid-check. The triggering commit (#24026) is docs-only, so nothing in the circuit changed; the input is simply heavy enough to cross the boundary. TIMEOUT=30s has been unchanged since the feature landed in #18747, and the original WARNING comment in avm_check_circuit_cmds explicitly anticipated this failure mode for larger txs.
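The timing in the log can be checked with a little arithmetic. This is a sketch that uses only the three timestamps quoted above; the helper name `to_secs` is made up for illustration:

```shell
# Convert HH:MM:SS to seconds since midnight (awk reads "08" as decimal 8).
to_secs() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

trace_start=$(to_secs 18:36:24)   # "Generating trace..."
check_start=$(to_secs 18:36:42)   # "Checking circuit..."
killed_at=$(to_secs 18:36:53)     # "timeout: sending signal TERM"

trace_secs=$(( check_start - trace_start ))
check_secs=$(( killed_at - check_start ))
echo "sim + trace gen:   ${trace_secs}s"
echo "check before kill: ${check_secs}s"
```

Roughly 18s went to simulation and trace generation, leaving only about 11s of the 30s budget for checking 700560 rows before the TERM signal arrived.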

This is a recurrence: PR #23771 made the same fix on June 1, but it was a draft that was closed unmerged on June 7, so next kept the 30s ceiling and the heavy tx times out again.

Fix

Raise the per-test timeout from 30s to 120s in avm_check_circuit_cmds (yarn-project/end-to-end/bootstrap.sh). This gives ~3–4x headroom over the heaviest observed total and room for future circuit growth.
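The headroom figure can be reproduced from the numbers already quoted in this description. A small sketch, with the caveat that the failed run was killed at the cap, so its 33s is a lower bound on the true total rather than a measured completion time:

```shell
new_timeout=120   # proposed per-test budget, in seconds
observed=33       # "(33s)" from the FAILED line; the run was killed, not finished
old_timeout=30    # previous per-test budget, in seconds

vs_observed=$(LC_ALL=C awk -v n="$new_timeout" -v o="$observed" 'BEGIN { printf "%.1f", n / o }')
vs_old=$(LC_ALL=C awk -v n="$new_timeout" -v o="$old_timeout" 'BEGIN { printf "%.1f", n / o }')
echo "headroom vs observed 33s: ${vs_observed}x"
echo "headroom vs old 30s cap:  ${vs_old}x"
```

If the heavy input turns out to need much more than 33s once it is allowed to finish, the real margin will be smaller than these ratios suggest.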

The change is zero-cost for the fast majority of checks: they still exit in a few seconds, so neither parallelism nor stage wall-clock time is affected. Only the heavy outlier is allowed to run to completion instead of being killed mid-check. The default of 2 CPUs is deliberately kept; bumping CPUS would apply to all ~500 inputs and halve runner concurrency while barely speeding up the already-fast majority.
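The concurrency trade-off can be illustrated with a quick calculation. Both the runner size and the doubled CPU count below are assumptions made only for this example; the PR states neither the real core count nor the CPUS value that was considered:

```shell
# Hypothetical runner size; the real core count is not stated in this PR.
runner_cores=64

# Isolated tests that fit side by side at the default of 2 CPUs each,
# and at a hypothetical 4 CPUs each.
slots_at_2=$(( runner_cores / 2 ))
slots_at_4=$(( runner_cores / 4 ))
echo "CPUS=2: ${slots_at_2} concurrent tests"
echo "CPUS=4: ${slots_at_4} concurrent tests"
```

Whatever the real core count is, doubling the per-test CPU reservation halves the number of tests that can run at once, whereas a longer timeout only costs time for the one input that actually uses it.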

No outer workflow or step timeout is lower than this value, so the larger per-test budget takes effect.


Created by claudebox · group: slackbot

@AztecBot added the labels ci-draft (Run CI on draft PRs), ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), and claudebox (Owned by claudebox; it can push to this PR) on Jun 11, 2026
@AztecBot

Collaborator Author

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/c94d5d4d0d1a9bbe): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/sentinel_status_slash.parallel.test.ts "slashes the proposer with INACTIVITY when checkpoint validation records invalid" (204s) (code: 0) group:e2e-p2p-epoch-flakes
