test(e2e): scale bad-checkpoint wait to selected slots in `proposer invalidates multiple checkpoints` by AztecBot · Pull Request #24016 · AztecProtocol/aztec-packages

AztecBot · 2026-06-11T12:02:47Z

Fixes a flake in proposer invalidates multiple checkpoints (e2e_epochs/epochs_invalidate_block.parallel.test.ts) reported on v5-next: failed run.

Failure

TimeoutError: Operation timed out after 256000ms. 256000ms = L2_SLOT_DURATION_IN_S * 8 * 1000, and the missing custom error message identifies the bare timeoutPromise(...) racing the "wait for two bad checkpoints" promise — the only timeout call in the file without a message.

Root cause

The bad-slot search introduced in #23608 rejects any candidate pair whose proposer also owns an earlier un-snapshotted pipelined slot. With 6 validators this can reject many pairs in a row: in the failed run, the current slot was 21 and the search rejected (24,25) through (29,30) before accepting slots 30 and 31 (proposer 0x14dc… first appears in the schedule at slot 30; every earlier pair had p1/p2 owning a slot in the growing pre-bad window).

The wait for the two bad checkpoints, however, is fixed at 8 slots (256s) regardless of how far out the selected slots are. Timeline from the failed log:

22:44:39 — "First checkpoint mined, current slot is 21"; search selects bad slots 30/31; wait starts
22:45:11…22:48:23 — healthy checkpoints 3–9 mined at slots 22–28, one per 32s slot (the chain was fine; the malicious config was never exercised)
22:48:55 — timeout fires, exactly 256s after the wait began — before slot 30 even started (~22:49:00); its checkpoint could not land on L1 until ~22:49:30

With badSlot1 = currentSlot + 9, the 8-slot wait is mathematically unable to succeed, so the run was doomed at slot-selection time. In passing runs the first or an early candidate pair is accepted (currentSlot + 3/+4), and the checkpoints land ~130–160s into the 256s window.
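The "mathematically unable to succeed" claim can be checked with a few lines of arithmetic. The sketch below is illustrative only: `fixedWaitCanReach` and `secondsIntoCurrentSlot` are hypothetical names, the 32s slot duration comes from the log above, and the ~27s offset into slot 21 is inferred from the approximate slot-30 start time. It tests a necessary condition only (the wait must at least outlast the start of the target slot), not that the checkpoint actually lands on L1.

```typescript
// Slot duration from the failed run (256000ms / 8 slots / 1000).
const L2_SLOT_DURATION_IN_S = 32;
// The fixed wait used by the test before this change.
const FIXED_WAIT_SLOTS = 8;

/** Seconds from the start of `currentSlot` until `targetSlot` begins. */
function slotStartOffsetS(currentSlot: number, targetSlot: number): number {
  return (targetSlot - currentSlot) * L2_SLOT_DURATION_IN_S;
}

/**
 * Hypothetical helper: true if a fixed 8-slot wait that starts
 * `secondsIntoCurrentSlot` seconds into `currentSlot` is still running
 * when `targetSlot` begins. Necessary, not sufficient, for success.
 */
function fixedWaitCanReach(
  currentSlot: number,
  targetSlot: number,
  secondsIntoCurrentSlot: number,
): boolean {
  const waitEndsAtS = secondsIntoCurrentSlot + FIXED_WAIT_SLOTS * L2_SLOT_DURATION_IN_S;
  return waitEndsAtS >= slotStartOffsetS(currentSlot, targetSlot);
}

// Failed run: wait starts ~27s into slot 21, first bad slot is 30.
const failedRunReachable = fixedWaitCanReach(21, 30, 27);
// Passing runs: first bad slot is currentSlot + 3.
const passingRunReachable = fixedWaitCanReach(21, 24, 27);
```

Because the offset into the current slot is always below 32s, the wait ends at most 287.99s after slot 21 began, while slot 30 begins at 288s, so no start offset rescues the failed run.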

Fix

Size the timeout from the distance to badSlot2 (badSlot2 - currentSlot + 4 slots) instead of a fixed 8 slots, and give it a descriptive error message.
Remove the .then(() => [CheckpointNumber(0), CheckpointNumber(0)]) fallback: timeoutPromise rejects on timeout, so the fallback was dead code (the graceful-failure path it implied never ran).
Bump the suite timeout from 10 to 15 minutes: with far-out bad slots the legitimately-needed wall clock in this run's geometry (~580s for the test body) cuts too close to the 600s limit.
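A minimal sketch of the timeout sizing described in the first item, assuming a 32s slot. `badCheckpointWaitMs` and `waitWithTimeout` are hypothetical names and not the helpers used in the test file; `waitWithTimeout` only stands in for the repo's `timeoutPromise`, whose exact signature is not shown here.

```typescript
const L2_SLOT_DURATION_IN_S = 32;
// Extra slots of slack beyond badSlot2, per the fix description.
const SLACK_SLOTS = 4;

/** Timeout in ms, scaled to the distance from the current slot to badSlot2. */
function badCheckpointWaitMs(currentSlot: number, badSlot2: number): number {
  return (badSlot2 - currentSlot + SLACK_SLOTS) * L2_SLOT_DURATION_IN_S * 1000;
}

/**
 * Hypothetical stand-in for a rejecting timeout race: resolves with `work`,
 * or rejects with a descriptive message once `ms` elapse. Because the
 * timeout rejects, a `.then(...)` fallback chained on it would never run.
 */
async function waitWithTimeout<T>(work: Promise<T>, ms: number, message: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after `work` wins.
    clearTimeout(timer);
  }
}

// Failed-run geometry: current slot 21, bad slots 30/31.
const failedRunWaitMs = badCheckpointWaitMs(21, 31);
// Nearest candidate pair (currentSlot + 3 / + 4).
const nearestPairWaitMs = badCheckpointWaitMs(21, 25);
```

With the failed run's geometry this gives 14 slots (448s) instead of 8, while the nearest candidate pair still gets the previous 256s, so passing runs are unaffected.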

Note for porting: the test file is identical on next and v5-next (modulo an unrelated l1PublishingTime line), and the flake fired on v5-next — the same change applies there (merge-train/spartan-v5).

Created by claudebox · group: slackbot


…idates multiple checkpoints` (#24017)

Fixes a flake in `proposer invalidates multiple checkpoints` (`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) reported on `v5-next`: [failed run](http://ci.aztec-labs.com/e4076dd86c434c6f).

Replaces #24016 (was based on `merge-train/spartan`; this one targets the v5 line where the flake fired and restructures the test instead of just resizing the timeout).

## Root cause of the flake

`TimeoutError: Operation timed out after 256000ms`: the bare 8-slot `timeoutPromise` waiting for the two bad checkpoints. The bad-slot search from #23608 rejects any candidate pair whose proposer also owns an earlier un-snapshotted pipelined slot, and the rejection window grows with each attempt. In the failed run the current slot was 21 and the search rejected (24,25)…(29,30) before accepting slots **30/31**, 9–10 slots out. The fixed 256s wait expired at 22:48:55, before slot 30 even began (~22:49:00), while the chain healthily mined checkpoints at slots 22–28 underneath; the run was unwinnable at selection time. The race's `.then(() => [CheckpointNumber(0), …])` fallback was also dead code, since `timeoutPromise` rejects.

## Fix: search first, then warp

Instead of starting the sequencers and waiting in real time for whatever slots the search lands on:

- With sequencers stopped, search for a `warpSlot` such that the proposers of the three lead-in slots `warpSlot+1..warpSlot+3` are not the proposers of the bad slots `warpSlot+4`/`warpSlot+5`. A far-away candidate now costs a warp instead of a real-time wait, and `EpochNotStable` during the search is handled by warping forward one epoch (same pattern as the `archiver skips a descendant` test in this file).
- Warp to one L1 block before `warpSlot`, so sequencers get a full L2 slot to boot before the first pipelined build window we rely on (end of `warpSlot`, targeting `warpSlot+1`).
- Start the sequencers and wait for the first good checkpoint (lands at `warpSlot`, or up to `warpSlot+2` on a slow start).
- Apply the malicious config to the bad-slot proposers. The three good lead-in slots guarantee no pipelined job before `badSlot1` can snapshot it, since jobs snapshot config during the last L1 slot of the previous L2 slot.
- Fail fast with a clear assertion if config application was somehow late enough to reach `badSlot1`'s build window, rather than timing out opaquely.
- The 8-slot wait for the bad checkpoints is now correctly sized by construction (`badSlot2` is at most ~6 slots from the wait start), and gets a descriptive timeout message.

Worst case the wait phase is bounded at ~6 slots regardless of how many candidates the search rejects, where previously each rejected candidate pushed the bad checkpoints one slot further past the fixed timeout.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/d509a218614bf4ac) · group: `slackbot`*
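The `warpSlot` search described in the #24017 message can be sketched as a pure function over a proposer schedule. This is an illustration under stated assumptions, not the test's actual code: `proposerAt`, `isValidWarpSlot`, `findWarpSlot`, and the toy schedule are all hypothetical, and the `EpochNotStable` handling (warping forward one epoch) is omitted.

```typescript
type Address = string;

// Three good lead-in slots, then two bad slots, per the description above.
const LEAD_IN_SLOTS = 3;

/**
 * True if none of the proposers of warpSlot+1..warpSlot+3 is also the
 * proposer of a bad slot (warpSlot+4 or warpSlot+5).
 */
function isValidWarpSlot(warpSlot: number, proposerAt: (slot: number) => Address): boolean {
  const badProposers = new Set<Address>([proposerAt(warpSlot + 4), proposerAt(warpSlot + 5)]);
  for (let i = 1; i <= LEAD_IN_SLOTS; i++) {
    if (badProposers.has(proposerAt(warpSlot + i))) {
      return false;
    }
  }
  return true;
}

/** Returns the first valid warp slot at or after `startSlot`, or undefined if none is found. */
function findWarpSlot(
  startSlot: number,
  maxCandidates: number,
  proposerAt: (slot: number) => Address,
): number | undefined {
  for (let slot = startSlot; slot < startSlot + maxCandidates; slot++) {
    if (isValidWarpSlot(slot, proposerAt)) {
      return slot;
    }
  }
  return undefined;
}

// Toy schedule (made up for illustration): the proposer for slot n is schedule[n % 8].
const schedule: Address[] = ['a', 'b', 'a', 'c', 'b', 'd', 'e', 'f'];
const toyProposerAt = (slot: number): Address => schedule[slot % schedule.length];

// Slot 0 is rejected ('b' proposes both lead-in slot 1 and bad slot 4); slot 1 is accepted.
const chosenWarpSlot = findWarpSlot(0, 5, toyProposerAt);
```

Since the search runs with sequencers stopped, a rejected candidate only moves the warp target forward and adds no real-time wait.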

test(e2e): scale bad-checkpoint wait to selected slots in `proposer i…

f634db5

…nvalidates multiple checkpoints`

AztecBot added labels ci-draft (Run CI on draft PRs), ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), and claudebox (Owned by claudebox; it can push to this PR) Jun 11, 2026

spalladino changed the base branch from merge-train/spartan to merge-train/spartan-v5 June 11, 2026 12:13

spalladino marked this pull request as ready for review June 11, 2026 12:13

spalladino requested review from a team, IlyasRidhuan, LeilaWang, MirandaWood, charlielye, jeanmon, nventuro and sirasistant as code owners June 11, 2026 12:13

spalladino changed the base branch from merge-train/spartan-v5 to merge-train/spartan June 11, 2026 12:13

spalladino removed request for a team, IlyasRidhuan, LeilaWang, MirandaWood, charlielye, jeanmon, nventuro and sirasistant June 11, 2026 12:14

AztecBot mentioned this pull request Jun 11, 2026

test(e2e): pick bad slots upfront and warp to them in proposer invalidates multiple checkpoints #24017

Merged

AztecBot closed this Jun 11, 2026

AztecBot wants to merge 1 commit into merge-train/spartan from cb/fix-invalidate-multiple-checkpoints-wait
