fix(ci): raise AVM check-circuit per-tx timeout to 120s #23662

Draft
AztecBot wants to merge 1 commit into next from cb/avm-check-circuit-timeout

AztecBot commented May 29, 2026

Collaborator

Summary

Raise the per-input avm_check_circuit timeout from 30s to a 120s default (overridable via AVM_CHECK_CIRCUIT_TIMEOUT), so heavy AVM inputs complete with comfortable margin.

⚠️ Action needed (blocker): this PR is a draft, which is the only thing preventing it from landing. ClaudeBox cannot flip draft→ready with its current tools. Please mark it Ready for review and it can merge to next (no real CI failures — the only red checks are cosmetic Netlify redirect/header/pages neutral results). Until it merges, next keeps the 30s limit and this failure keeps recurring.

The branch was just rebased onto current next (it had fallen ~139 commits behind), so it applies cleanly.

Recurrences (same root cause, all exit code 124)

  • 2026-06-05: CI run 27024687486, head 185f1de (next, merge-queue). Same input e2e_multiple_blobs tx 0x2f417fe1…, 700560 rows; trace generation took ~23s on 2 CPUs (16:14:17→16:14:40), and the check was killed ~6s in at the 30s cap (16:14:46 timeout: sending signal TERM). Every other input passed in 3–6s. The triggering commit was an unrelated docs PR (docs: update developer and operator docs for v4.3.1 #23872).
  • 2026-06-04: CI run 26973131717, head 18a23b8 (merge-train/barretenberg, feat: merge-train/barretenberg #23861). Same input e2e_multiple_blobs tx 0x01f2d613…; check killed at the 30s cap.
  • 2026-06-04: CI run 26928296651, head b82384a (merge-train/spartan). Same input e2e_multiple_blobs tx 0x20581f1c…, 700560 rows; trace gen ~25s on 2 CPUs, check killed ~3s in at the 30s cap.
  • 2026-06-03: CI run 26914272275, head 8500d6a (merge-train/fairies). Same input, ~700,560 rows.
  • 2026-06-02: CI run 26795871178, head 1f6248d.
  • 2026-05-29: this PR was first opened for this failure.

It keeps recurring because this PR has the fix but remains a draft and has never merged; next still runs the 30s limit.

Root cause

yarn-project/end-to-end/bootstrap.sh:avm_check_circuit runs bb-avm avm_check_circuit on every dumped e2e AVM input in parallel, each wrapped in a per-test timeout (exec_test's timeout -v $TIMEOUT). The runner uses --halt now,fail=1, so a single timeout fails the entire job. This is not a circuit-correctness failure, and it is independent of the commit that happens to trigger each nightly.
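
For readers unfamiliar with where exit code 124 comes from, here is a minimal sketch of the mechanism using GNU timeout directly. It is not the code in bootstrap.sh: run_capped is a hypothetical helper, and sleep stands in for a real bb-avm check.

```shell
#!/usr/bin/env bash
# Run one command under a wall-clock cap (seconds).
# GNU timeout exits with status 124 when it has to kill the command.
# run_capped is a hypothetical helper, not a function from bootstrap.sh.
run_capped() {
  local cap=$1; shift
  timeout "$cap" "$@"
}

# Stand-ins for AVM inputs: each "check" is just a sleep.
fast_status=0
run_capped 1 sleep 0.1 || fast_status=$?   # finishes well inside the cap

slow_status=0
run_capped 1 sleep 5 || slow_status=$?     # still running when the cap expires

echo "fast input exit code: $fast_status"
echo "slow input exit code: $slow_status"
```

With a fail-fast runner, the single 124 from the slow input is enough to fail the whole job even though every other input exited 0.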

One input — the e2e_multiple_blobs tx — produces a ~700,560-row AVM trace. On the default 2 CPUs the per-input log shows trace generation alone takes ~23–25s, so the circuit check is killed only a few seconds in at the 30s cap:

Generating trace...
Checking circuit... (~3782 MiB)                       (trace gen ~23s)
Running check (with skippable) circuit over 700560 rows.
timeout: sending signal TERM to command 'bash'        (killed mid-check at 30s)

The check was progressing, not hung — it simply needs more than 30s on 2 CPUs. Every other input passes in 3–6s.
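
The budget arithmetic behind that statement can be sketched as follows. The caps (30s, 120s) and trace-generation times (~23–25s) are the figures quoted above; the helper name is hypothetical, and the calculation ignores any startup time before trace generation begins.

```shell
#!/usr/bin/env bash
# Seconds left for the circuit check once trace generation is done:
# remaining = cap - trace generation time.
# remaining is a hypothetical helper used only for this illustration.
remaining() { echo $(( $1 - $2 )); }

for cap in 30 120; do
  for trace_gen in 23 25; do
    echo "cap=${cap}s trace_gen=${trace_gen}s -> $(remaining "$cap" "$trace_gen")s left for the check"
  done
done
```

Under the 30s cap the check gets roughly 5–7s, which matches the logs showing it killed a few seconds in; under a 120s cap it gets roughly 95–97s.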

Fix

  • Bump the per-check timeout to a 120s default, overridable via AVM_CHECK_CIRCUIT_TIMEOUT.
  • CPU allocation stays at the default 2: the non-strict parallelize path launches a fixed num_cpus/2 concurrent jobs (sized for 2 CPUs/job), so raising per-job CPUS without lowering that count would oversubscribe the box rather than reliably speed up the heavy job. Only the wall-clock budget was the constraint.
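
A rough sketch of both points, assuming the override is read with ordinary bash default expansion (the actual diff is not reproduced here). AVM_CHECK_CIRCUIT_TIMEOUT is the variable named above; resolve_timeout, cpu_demand, and the 16-core runner size are hypothetical and only illustrate the reasoning.

```shell
#!/usr/bin/env bash
# Per-check timeout: use AVM_CHECK_CIRCUIT_TIMEOUT when set and non-empty, else 120.
# resolve_timeout is a hypothetical helper, not a function from bootstrap.sh.
resolve_timeout() {
  echo "${AVM_CHECK_CIRCUIT_TIMEOUT:-120}"
}

# Total CPUs requested when job_count checks run at once with cpus_per_job each.
cpu_demand() { echo $(( $1 * $2 )); }

unset AVM_CHECK_CIRCUIT_TIMEOUT
echo "default timeout:  $(resolve_timeout)s"
echo "override timeout: $(AVM_CHECK_CIRCUIT_TIMEOUT=300 resolve_timeout)s"

cores=16                      # assumed runner size, for illustration only
job_count=$(( cores / 2 ))    # fixed concurrency described above (num_cpus/2)
echo "2 CPUs/job: $(cpu_demand "$job_count" 2) CPUs requested on $cores cores"
echo "4 CPUs/job: $(cpu_demand "$job_count" 4) CPUs requested on $cores cores"
```

At 2 CPUs per job the requested total equals the core count; doubling CPUs per job without halving the job count requests twice the available cores, which is the oversubscription the second bullet describes.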

Testing

  • bash -O extglob -n yarn-project/end-to-end/bootstrap.sh — passes.

AztecBot added the claudebox label (Owned by claudebox. it can push to this PR.) May 29, 2026
AztecBot changed the title from "fix(ci): raise AVM check-circuit input timeout" to "fix(ci): raise AVM check-circuit per-tx timeout to 120s" Jun 1, 2026
AztecBot force-pushed the cb/avm-check-circuit-timeout branch from b6a8894 to e0f791e June 1, 2026 13:29
AztecBot added a commit that referenced this pull request Jun 1, 2026
The avm-check-circuit job runs bb-avm avm_check_circuit on every dumped
e2e AVM input in parallel, each wrapped in a 30s timeout (exec_test's
timeout -v $TIMEOUT). The runner uses --halt now,fail=1, so a single
timeout fails the whole job.

The e2e_multiple_blobs tx produces a ~700k-row AVM trace. On the default
2 CPUs, trace generation (~22s) plus the row check exceeded 30s and the
check was killed with exit 124 (CI run 26755632012); every other input
passed in 3-6s.

Raise the per-check timeout to a 120s default and make it overridable via
AVM_CHECK_CIRCUIT_TIMEOUT, so the heaviest inputs complete with margin
while the common case still finishes quickly. CPU allocation stays at the
default 2 (the runner core count is tuned so the parallel job count
saturates it at 2 CPUs each); only wall-clock budget was the constraint.

Supersedes the stale draft branch for #23662 (rebased onto current next).
AztecBot added the ci-draft label (Run CI on draft PRs.) Jun 3, 2026
The avm-check-circuit job runs bb-avm avm_check_circuit on every dumped
e2e AVM input in parallel, each wrapped in a 30s timeout (exec_test's
timeout -v $TIMEOUT). The runner uses --halt now,fail=1, so a single
timeout fails the whole job.

The e2e_multiple_blobs tx produces a ~700k-row AVM trace. On the default
2 CPUs, trace generation (~23s) plus the row check exceeded 30s and the
check was killed with exit 124; every other input passed in 3-6s.
Recurred again in CI run 27024687486 on next.

Raise the per-check timeout to a 120s default and make it overridable via
AVM_CHECK_CIRCUIT_TIMEOUT, so the heaviest inputs complete with margin
while the common case still finishes quickly. CPU allocation stays at the
default 2 (the runner core count is tuned so the parallel job count
saturates it at 2 CPUs each); only wall-clock budget was the constraint.

Rebased onto current next.
AztecBot force-pushed the cb/avm-check-circuit-timeout branch from e0f791e to 380065d June 5, 2026 16:23
