test(e2e): fix `proposer invalidates multiple checkpoints` timeout by spalladino · Pull Request #23608 · AztecProtocol/aztec-packages

spalladino · 2026-05-27T20:46:32Z

Fixes a flake in the `proposer invalidates multiple checkpoints` test (`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) that caused a timeout (see this run). See below for the Codex analysis and fix.

Test Summary
`proposer invalidates multiple checkpoints` verifies that two intended bad checkpoints land with insufficient attestations, a later good proposer invalidates the first bad checkpoint, and the chain then progresses.

Failed Run Error
CI run `8b1c0f4ec6031f2b` timed out at Jest’s 600s limit. The failure was not the shutdown L1 send error; that error happened after the timeout, while teardown was interrupting pending work.

Failed vs Successful Divergence
First meaningful divergence: checkpoint 4 at slot 23.

Failed log: slot 23 published checkpoint 4 with only 1 attestation, then archivers reported `Insufficient attestations ... actualAttestations:1`.
Successful log: slot 23 collected all 5 attestations before publishing checkpoint 4, so the intentionally bad checkpoints only landed afterwards (at slots 24/25).

Timeline
Failed:

- `15:59:11` selected intended bad slots 25/26 and applied the bad config to proposer `0x15...`
- `15:59:35` slot 23 job prepared by that same proposer
- `16:00:15` checkpoint 4 at slot 23 landed with 1 attestation
- repeated rollback/retry consumed enough time to hit the Jest timeout

Successful:

- slot 23 checkpoint landed cleanly with 5 attestations
- intended bad checkpoints at slots 24/25 landed with 1 attestation
- checkpoint 5 was invalidated
- test completed successfully

Hypothesis
High confidence: the test’s bad-slot selection only excluded `candidateSlot1 - 1` as a pre-bad pipelined target. In the failed run, the job for `candidateSlot1 - 2` (slot 23) had not yet snapshotted its config and was owned by a bad proposer, so the malicious config leaked into slot 23.

Evidence

Logs: failed run selected slots 25/26 but slot 23 later published with 1 attestation from the newly bad proposer.
Source: pipelined checkpoint jobs snapshot sequencer config when the target-slot job is created, so applying config while sequencers are running can affect any not-yet-created pre-bad job.
Skeptic check: no contradiction found; it also caught a broken local timeout race.
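To make the snapshot behaviour in the Source bullet concrete, here is a minimal TypeScript model. `SequencerConfig`, `publishWithInsufficientAttestations`, and `PipelinedCheckpointJob` are hypothetical names, not the sequencer's actual types; the only assumption carried over from the analysis is that a job copies the live config at the moment it is created.

```typescript
// Hypothetical model of "snapshot config when the target-slot job is created".
// Names are illustrative only; they are not the sequencer's real API.
interface SequencerConfig {
  // Stands in for whatever malicious setting the test applies to a proposer.
  publishWithInsufficientAttestations: boolean;
}

class PipelinedCheckpointJob {
  readonly targetSlot: number;
  readonly config: Readonly<SequencerConfig>;

  constructor(targetSlot: number, liveConfig: SequencerConfig) {
    this.targetSlot = targetSlot;
    // Shallow copy at creation: later edits to liveConfig do not reach this job.
    this.config = { ...liveConfig };
  }
}

const liveConfig: SequencerConfig = { publishWithInsufficientAttestations: false };

// Created before the test applies the bad config, so it keeps the good snapshot.
const jobForSlot22 = new PipelinedCheckpointJob(22, liveConfig);

// The test mutates the config while the sequencer keeps running...
liveConfig.publishWithInsufficientAttestations = true;

// ...so a job created afterwards inherits it, even though slot 23 precedes the intended bad slots.
const jobForSlot23 = new PipelinedCheckpointJob(23, liveConfig);

console.log(jobForSlot22.config.publishWithInsufficientAttestations); // false
console.log(jobForSlot23.config.publishWithInsufficientAttestations); // true
```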

Proposed Fix
Implemented in `epochs_invalidate_block.parallel.test.ts`: the selector now excludes bad proposers from every pre-bad target slot from `currentSlot + 2` through `candidateSlot1 - 1`, not just the immediately prior slot.
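A sketch of the selector change, assuming a hypothetical `proposerOf(slot)` lookup and assuming the first candidate is `currentSlot + 3`; it is not the test's actual code. The `guardWholeWindow` flag switches between the old check (only `candidateSlot1 - 1`) and the fixed one (every slot from `currentSlot + 2` through `candidateSlot1 - 1`). The toy schedule, with current slot 21 and proposer `0x15` owning both slot 23 and slot 25, is invented to mirror the failed run: the old check accepts 25/26 and leaves slot 23 exposed, while the fixed check moves on.

```typescript
// Hypothetical sketch of the bad-slot selector; names and the proposer lookup are illustrative.
type Address = string;

function selectBadSlots(
  currentSlot: number,
  proposerOf: (slot: number) => Address,
  guardWholeWindow: boolean, // false = old check, true = fixed check
  maxAttempts = 32,
): [number, number] {
  // Assumption: the first candidate pair starts at currentSlot + 3.
  const firstCandidate = currentSlot + 3;
  for (let candidateSlot1 = firstCandidate; candidateSlot1 < firstCandidate + maxAttempts; candidateSlot1++) {
    const candidateSlot2 = candidateSlot1 + 1;
    const badProposers = new Set([proposerOf(candidateSlot1), proposerOf(candidateSlot2)]);

    // Pre-bad target slots whose pipelined jobs may not have snapshotted config yet.
    const firstGuarded = guardWholeWindow ? currentSlot + 2 : candidateSlot1 - 1;
    let clean = true;
    for (let slot = firstGuarded; slot < candidateSlot1; slot++) {
      if (badProposers.has(proposerOf(slot))) {
        // A bad proposer would also build this earlier slot with the malicious config.
        clean = false;
        break;
      }
    }
    if (clean) {
      return [candidateSlot1, candidateSlot2];
    }
  }
  throw new Error(`No bad-slot pair found within ${maxAttempts} candidates`);
}

// Toy schedule: proposer 0x15 owns slots 23 and 25; unlisted slots get unique proposers.
const schedule = new Map<number, Address>([
  [23, '0x15'],
  [24, '0xaa'],
  [25, '0x15'],
  [26, '0xbb'],
  [27, '0xcc'],
]);
const proposerOf = (slot: number): Address => schedule.get(slot) ?? `0xslot${slot}`;

console.log(selectBadSlots(21, proposerOf, false)); // [ 25, 26 ]: slot 23 left exposed
console.log(selectBadSlots(21, proposerOf, true)); // [ 26, 27 ]
```

Widening the guarded window makes the search reject more candidates, which pushes the chosen slots further out; that trade-off is what the follow-up #24017 below addresses.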

Also fixed the broken timeout race at line 475 by removing the accidental inner `await`.
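The exact code at line 475 is not shown here, so the following is only a generic TypeScript illustration of the bug class, using hypothetical helpers `timeoutAfter` and `neverHappens`. In this sketch, writing `await` inside the array passed to `Promise.race` suspends on that promise before the race is even constructed, so the timeout cannot win.

```typescript
// Generic sketch of the bug class; `timeoutAfter` and `neverHappens` are hypothetical stand-ins.
function timeoutAfter(ms: number): Promise<never> {
  return new Promise<never>((_, reject) => {
    setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
}

// Stands in for a wait condition that never becomes true (e.g. bad checkpoints that never land).
const neverHappens = new Promise<void>(() => {});

// Broken shape: the inner `await` suspends on neverHappens while the array literal is still
// being built, so Promise.race is never reached and the timeout never fires:
//   await Promise.race([await neverHappens, timeoutAfter(100)]);

// Fixed shape: pass the pending promise itself so both contenders actually race.
async function waitWithTimeout(): Promise<string> {
  try {
    await Promise.race([neverHappens, timeoutAfter(100)]);
    return 'condition met';
  } catch (err) {
    return (err as Error).message;
  }
}

waitWithTimeout().then((result) => console.log(result)); // prints "timed out after 100ms"
```

The general rule: pass pending promises to `Promise.race`, not awaited values, otherwise the awaited element is resolved sequentially before any racing happens.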


AztecBot · 2026-05-27T21:15:11Z

Flakey Tests

🤖 says: This CI run detected 2 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/05e2f35ea87960af): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "proposer invalidates multiple checkpoints" (434s) (code: 0) group:e2e-p2p-epoch-flakes
FLAKED (http://ci.aztec-labs.com/8f3d2a7009259b3c): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/multiple_validators_sentinel.parallel.test.ts "collects attestations for validators in proposer node when block is not published" (400s) (code: 0) group:e2e-p2p-epoch-flakes

BEGIN_COMMIT_OVERRIDE
fix(archiver): skip descendants of invalid-attestations checkpoints (AztecProtocol#23502)
chore: scale network validators (AztecProtocol#23579)
fix(ci): nightly 10 TPS bench GCP auth and checkout (AztecProtocol#23586)
chore: set eth node resource profile (AztecProtocol#23583)
fix: wait for checkpoint before sentinel assertions (AztecProtocol#23573)
fix: slash attestations for invalid checkpoint proposals (AztecProtocol#23506)
test: fix web3signer pipelining `e2e_multi_validator_node_key_store.test.ts` (AztecProtocol#23568)
fix: cap CI devbox hostname (AztecProtocol#23591)
test: stabilize invalid checkpoint descendant e2e (AztecProtocol#23582)
test(e2e): stabilize invalidation slots in `proposer invalidates multiple checkpoints` (AztecProtocol#23590)
test(e2e): stabilize invalid proposal slashing target slot in `attested_invalid_proposal` (AztecProtocol#23589)
chore(foundation): faster toBufferBE via zero fast-path (AztecProtocol#23592)
fix: honour BB_BINARY_PATH (AztecProtocol#23570)
chore: bump reth and lighthouse (AztecProtocol#23588)
chore: add web3signer and postgres node selectors (AztecProtocol#23598)
fix: do not symlink .codex folders (AztecProtocol#23593)
chore: fix claude and codex symlinking tests (AztecProtocol#23599)
test(e2e): narrow down sentinel check in `multiple_validators_sentinel` (AztecProtocol#23604)
test(e2e): fix `proposer invalidates multiple checkpoints` timeout (AztecProtocol#23608)
fix: record zero-amount slashing offenses (AztecProtocol#23556)
fix: log slashing offense names (AztecProtocol#23565)
feat(p2p): tx validation cache (AztecProtocol#23585)
chore: add KEDA deployment module (AztecProtocol#23553)
chore: add KEDA prover agent autoscaling (AztecProtocol#23554)
chore: update destroy_bootnode.sh (AztecProtocol#23626)
chore: skip failing chonk_pinned_inputs.test in CI (AztecProtocol#23643)
chore(ci): tolerate public authwit P2P receipt flake (AztecProtocol#23648)
END_COMMIT_OVERRIDE

…idates multiple checkpoints` (#24017)

Fixes a flake in `proposer invalidates multiple checkpoints` (`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) reported on `v5-next`: [failed run](http://ci.aztec-labs.com/e4076dd86c434c6f). Replaces #24016 (which was based on `merge-train/spartan`; this one targets the v5 line where the flake fired and restructures the test instead of just resizing the timeout).

## Root cause of the flake

`TimeoutError: Operation timed out after 256000ms`: the bare 8-slot `timeoutPromise` waiting for the two bad checkpoints expired. The bad-slot search from #23608 rejects any candidate pair whose proposer also owns an earlier un-snapshotted pipelined slot, and the rejection window grows with each attempt. In the failed run the current slot was 21 and the search rejected (24,25)…(29,30) before accepting slots **30/31**, 9–10 slots out. The fixed 256s wait expired at 22:48:55, before slot 30 even began (~22:49:00), while the chain was healthily mining checkpoints at slots 22–28 underneath; the run was unwinnable at selection time. The race's `.then(() => [CheckpointNumber(0), …])` fallback was also dead code, since `timeoutPromise` rejects.

## Fix: search first, then warp

Instead of starting the sequencers and waiting in real time for whatever slots the search lands on:

- With sequencers stopped, search for a `warpSlot` such that the proposers of the three lead-in slots `warpSlot+1..warpSlot+3` are not the proposers of the bad slots `warpSlot+4`/`warpSlot+5`. A far-away candidate now costs a warp instead of a real-time wait, and `EpochNotStable` during the search is handled by warping forward one epoch (same pattern as the `archiver skips a descendant` test in this file).
- Warp to one L1 block before `warpSlot`, so sequencers get a full L2 slot to boot before the first pipelined build window we rely on (end of `warpSlot`, targeting `warpSlot+1`).
- Start the sequencers and wait for the first good checkpoint (lands at `warpSlot`, or up to `warpSlot+2` on a slow start).
- Apply the malicious config to the bad-slot proposers. The three good lead-in slots guarantee that no pipelined job before `badSlot1` can snapshot it, since jobs snapshot config during the last L1 slot of the previous L2 slot.
- Fail fast with a clear assertion if config application was somehow late enough to reach `badSlot1`'s build window, rather than timing out opaquely.
- The 8-slot wait for the bad checkpoints is now correctly sized by construction (`badSlot2` is at most ~6 slots from the wait start), and gets a descriptive timeout message.

Worst case, the wait phase is bounded at ~6 slots regardless of how many candidates the search rejects, whereas previously each rejected candidate pushed the bad checkpoints one slot further past the fixed timeout.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/d509a218614bf4ac) · group: `slackbot`*
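A rough TypeScript sketch of the warp-slot search and warp target from #24017. `findWarpPlan`, `warpTimestamp`, and the toy proposer schedule are hypothetical, `EpochNotStable` handling is omitted, and the timing model (slot `s` starts at `genesisTime + s * l2SlotDuration`, with a 32s L2 slot and an assumed 12s L1 block in the example) is an illustration only, not the test's real configuration.

```typescript
// Hypothetical sketch of "search first, then warp"; names and timing parameters are illustrative.
type Address = string;

interface WarpPlan {
  warpSlot: number;
  badSlot1: number;
  badSlot2: number;
}

// Find warpSlot such that lead-in slots warpSlot+1..warpSlot+3 are not owned by the proposers
// of the bad slots warpSlot+4 / warpSlot+5. EpochNotStable handling is omitted here.
function findWarpPlan(startSlot: number, proposerOf: (slot: number) => Address, maxAttempts = 64): WarpPlan {
  for (let warpSlot = startSlot; warpSlot < startSlot + maxAttempts; warpSlot++) {
    const badSlot1 = warpSlot + 4;
    const badSlot2 = warpSlot + 5;
    const badProposers = new Set([proposerOf(badSlot1), proposerOf(badSlot2)]);
    const leadIn = [warpSlot + 1, warpSlot + 2, warpSlot + 3];
    if (leadIn.every((slot) => !badProposers.has(proposerOf(slot)))) {
      return { warpSlot, badSlot1, badSlot2 };
    }
  }
  throw new Error(`No warp slot found within ${maxAttempts} candidates`);
}

// Warp target: one L1 block before warpSlot starts (assumed linear slot timing, in seconds).
function warpTimestamp(warpSlot: number, genesisTime: number, l2SlotDuration: number, l1BlockTime: number): number {
  return genesisTime + warpSlot * l2SlotDuration - l1BlockTime;
}

// Toy schedule; unlisted slots get unique proposers.
const schedule = new Map<number, Address>([
  [24, '0xa'],
  [25, '0xb'],
  [26, '0xa'],
]);
const proposerOf = (slot: number): Address => schedule.get(slot) ?? `0xslot${slot}`;

const plan = findWarpPlan(21, proposerOf);
console.log(plan); // { warpSlot: 23, badSlot1: 27, badSlot2: 28 }
console.log(warpTimestamp(plan.warpSlot, 1_000, 32, 12)); // 1724
```

Because a rejected candidate only moves the warp target, the real-time wait after warping stays a fixed number of slots, which is the property the PR relies on.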

spalladino changed the title ~~test(e2e): fix 'proposer invalidates multiple checkpoints' timeout~~ test(e2e): fix proposer invalidates multiple checkpoints timeout May 27, 2026

spalladino added the ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure label May 27, 2026

PhilWindle approved these changes May 27, 2026


PhilWindle merged commit 206eb0f into merge-train/spartan May 27, 2026
31 of 38 checks passed

PhilWindle deleted the spl/fix-invalidate-block-again branch May 27, 2026 21:53

AztecBot mentioned this pull request May 27, 2026

feat: merge-train/spartan #23580

Merged

This was referenced Jun 11, 2026

test(e2e): scale bad-checkpoint wait to selected slots in proposer invalidates multiple checkpoints #24016

Closed

test(e2e): pick bad slots upfront and warp to them in proposer invalidates multiple checkpoints #24017

Merged
