test(e2e): pick bad slots upfront and warp to them in `proposer invalidates multiple checkpoints` #24017
Merged
spalladino merged 1 commit on Jun 11, 2026
spalladino
approved these changes
Jun 11, 2026
Fixes a flake in `proposer invalidates multiple checkpoints` (`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) reported on `v5-next`: failed run. Replaces #24016 (was based on `merge-train/spartan`; this one targets the v5 line where the flake fired and restructures the test instead of just resizing the timeout).

Root cause of the flake
`TimeoutError: Operation timed out after 256000ms`: the bare 8-slot `timeoutPromise` waiting for the two bad checkpoints. The bad-slot search from #23608 rejects any candidate pair whose proposer also owns an earlier un-snapshotted pipelined slot, and the rejection window grows with each attempt. In the failed run the current slot was 21 and the search rejected (24,25)…(29,30) before accepting slots 30/31, 9–10 slots out. The fixed 256s wait expired at 22:48:55, before slot 30 even began (~22:49:00), while the chain healthily mined checkpoints at slots 22–28 underneath; the run was unwinnable at selection time. The race's `.then(() => [CheckpointNumber(0), …])` fallback was also dead code, since `timeoutPromise` rejects.

Fix: search first, then warp
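As background for the fix, the root-cause timing can be checked in a few lines. This is only a sketch, not the test's code: the 32 s slot duration is inferred from "8-slot" and `256000ms` in the description, and `isUnwinnable`, `rejectingTimeout`, and `raceOutcome` are hypothetical names. `rejectingTimeout` stands in for the test's `timeoutPromise`, assumed only to reject on expiry as stated above.

```typescript
// Assumption: one L2 slot lasts 32 s, inferred from 8 slots == 256000 ms.
const SLOT_DURATION_S = 256 / 8;

/**
 * True if a wait of `timeoutSlots` slots, started anywhere inside
 * `currentSlot`, expires no later than the start of `targetSlot`.
 * The best case for the wait is starting at the very end of `currentSlot`.
 */
function isUnwinnable(currentSlot: number, targetSlot: number, timeoutSlots: number): boolean {
  const timeoutS = timeoutSlots * SLOT_DURATION_S;
  const minSecondsUntilTarget = (targetSlot - currentSlot - 1) * SLOT_DURATION_S;
  return timeoutS <= minSecondsUntilTarget;
}

// Failed run: current slot 21, first bad slot 30, 8-slot timeout.
console.log(isUnwinnable(21, 30, 8)); // true
// First candidate the search tried (24) would have been within reach.
console.log(isUnwinnable(21, 24, 8)); // false

/** Hypothetical stand-in for a timeout helper that rejects when it expires. */
function rejectingTimeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`Operation timed out after ${ms}ms`)), ms),
  );
}

/** Shows that a `.then` fallback chained on a rejecting timeout never runs. */
async function raceOutcome(): Promise<string> {
  const neverSettles = new Promise<string>(() => {});
  try {
    return await Promise.race([neverSettles, rejectingTimeout(10).then(() => 'fallback')]);
  } catch (err) {
    return `rejected: ${(err as Error).message}`;
  }
}
```

Because `.then` with a single callback only handles fulfilment, the rejection passes straight through the race, which is why the fallback in the old test could never produce a value.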
Instead of starting the sequencers and waiting in real time for whatever slots the search lands on:
1. … `warpSlot` such that the proposers of the three lead-in slots `warpSlot+1`..`warpSlot+3` are not the proposers of the bad slots `warpSlot+4`/`warpSlot+5`. A far-away candidate now costs a warp instead of a real-time wait, and `EpochNotStable` during the search is handled by warping forward one epoch (same pattern as the `archiver skips a descendant` test in this file).
2. … `warpSlot`, so sequencers get a full L2 slot to boot before the first pipelined build window we rely on (end of `warpSlot`, targeting `warpSlot+1`).
3. … `warpSlot`, or up to `warpSlot+2` on a slow start).
4. … `badSlot1` can snapshot it, since jobs snapshot config during the last L1 slot of the previous L2 slot.
5. … `badSlot1`'s build window, rather than timing out opaquely.
6. … `badSlot2` is at most ~6 slots from the wait start), and gets a descriptive timeout message.

Worst case, the wait phase is bounded at ~6 slots regardless of how many candidates the search rejects, where previously each rejected candidate pushed the bad checkpoints one slot further past the fixed timeout.
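The selection rule described above (lead-in proposers for `warpSlot+1`..`warpSlot+3` disjoint from the proposers of `warpSlot+4`/`warpSlot+5`) can be sketched as a pure search. This is a minimal illustration under assumptions, not the test's implementation: `ProposerOf`, `isUsableWarpSlot`, `findWarpSlot`, and the `maxCandidates` limit are hypothetical, proposers are modelled as plain strings, and the `EpochNotStable` handling and the warp itself are left out.

```typescript
// Hypothetical lookup: returns an identifier for the proposer of a slot.
type ProposerOf = (slot: number) => string;

const LEAD_IN_OFFSETS = [1, 2, 3]; // warpSlot+1 .. warpSlot+3
const BAD_OFFSETS = [4, 5]; // badSlot1 = warpSlot+4, badSlot2 = warpSlot+5

/** True if no lead-in proposer is also the proposer of a bad slot. */
function isUsableWarpSlot(warpSlot: number, proposerOf: ProposerOf): boolean {
  const badProposers = new Set(BAD_OFFSETS.map(o => proposerOf(warpSlot + o)));
  return LEAD_IN_OFFSETS.every(o => !badProposers.has(proposerOf(warpSlot + o)));
}

/** Scans forward from `startSlot` and returns the first usable warp slot. */
function findWarpSlot(startSlot: number, proposerOf: ProposerOf, maxCandidates = 64): number {
  for (let slot = startSlot; slot < startSlot + maxCandidates; slot++) {
    if (isUsableWarpSlot(slot, proposerOf)) {
      return slot;
    }
  }
  throw new Error(`No usable warpSlot in [${startSlot}, ${startSlot + maxCandidates})`);
}

// Toy schedule: slot 10 is rejected because proposer B owns both
// lead-in slot 11 and bad slot 14; slot 11 is accepted.
const schedule = new Map<number, string>([
  [10, 'A'], [11, 'B'], [12, 'C'], [13, 'A'], [14, 'B'], [15, 'D'], [16, 'E'],
]);
const proposerOf: ProposerOf = slot => schedule.get(slot) ?? `validator-${slot}`;

console.log(findWarpSlot(10, proposerOf)); // 11
```

Since the search runs before any real-time waiting, a rejected candidate only moves the warp target; the distance from `warpSlot` to `badSlot2` stays fixed at 5 slots in this sketch, in line with the ~6-slot bound stated above.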
Created by claudebox · group:
slackbot