test(ci): tolerate pipelined-publisher flake in e2e_scope_isolation #23367
Closed
AztecBot wants to merge 1 commit into
Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed.
Summary
Mark src/e2e_scope_isolation.test.ts as a flake-retry test in .test_patterns.yml, matching the "Tx dropped by P2P node" failure mode.
Failure
CI run on merge-train/spartan after #23286 merged failed in e2e_scope_isolation (test log) with all 8 sub-tests failing identically in beforeAll:
Root cause
Pipelined-publisher timing flake — not a regression from #23286. The sequencer discarded the pipelined checkpoint-9 work at slot 10 because its parent checkpoint 8 hadn't yet landed on L1 at the discard-check moment (checkpoint 8 mined ~4s later, too late). No on-L1 checkpoint for slot 10 → archiver pruned the local chain back to block 8 → the in-flight test tx was reported as "Tx dropped by P2P node".
Same class of pipelining + AnvilTestWatcher timing issue Santiago is fixing in #23340 (fast block build config) and #23354 (AutomineSequencer). Verified by reading the diff: PR #23286 doesn't touch the L1 publish path; its checkpoint_proposal_job.ts change is gated on broadcastInvalidCheckpointProposalOnly which defaults to false.
Full analysis: https://gist.github.com/AztecBot/d839aa82c2e72eca30a8ebcd9f32b592
Change
One entry added next to the existing e2e_l1_publisher flake entry (which uses the same retry mechanism for an analogous anvil-timing failure mode). Owner is *palla since Santiago authored the pipelining migration (#23275) and owns the in-flight cleanup work.
Verification
yq e '.tests[]' .test_patterns.yml parses cleanly.
ci3/get_test_entry returns the new entry when given the failing test cmd + log file, confirming both the regex and error_regex match.
*palla resolves to U04TPBU26E8.
./bootstrap.sh ci is not applicable: this change only touches .test_patterns.yml, which ci3/run_test_cmd consults at test-driver time and which doesn't affect any build artifact. The behaviour will be exercised on the next merge-train CI run.
ClaudeBox log: https://claudebox.work/s/b5d7f3b26323c2bd?run=1
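For readers without the diff at hand, the entry described under Change could look roughly like the sketch below. This is an illustration, not the committed diff. The regex, error_regex and *palla names, the .tests[] list and the U04TPBU26E8 ID come from the description above. The exact pattern strings, the names block that defines the &palla anchor, and the owners key are assumptions about the file's layout.

```yaml
# Hypothetical excerpt of .test_patterns.yml (layout assumed, not copied from the diff).
names:
  # Assumed anchor definition; the description only says *palla resolves to this ID.
  - palla: &palla "U04TPBU26E8"

tests:
  # Retry e2e_scope_isolation only when the log shows the pipelined-publisher flake.
  - regex: "src/e2e_scope_isolation.test.ts"   # assumed pattern for the test command
    error_regex: "Tx dropped by P2P node"      # assumed pattern for the failure log
    owners:                                    # assumed key name
      - *palla
```

Because an entry applies only when both patterns match, a failure of the same test with a different error message would presumably still be reported as a real failure rather than retried.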