
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining #23336

Merged
spalladino merged 1 commit into merge-train/spartan from claudebox/fix-pr-23253-dequeue-v4
May 16, 2026

@AztecBot AztecBot commented May 16, 2026


Why

PR #23253 was dequeued (4th attempt) when merge-queue-heavy caught an e2e_amm.test.ts setup tx getting dropped by a pipelining-driven chain prune. CI log: baec5a7453c20089.

The wait-for-parent gate in CheckpointProposalJob.waitForValidParentCheckpointOnL1 (sequencer-client/src/sequencer/checkpoint_proposal_job.ts:398) should have prevented the discard, but it didn't: a TestDateProvider time warp from AnvilTestWatcher.syncDateProviderToL1IfBehind landed between the two epochCache reads in Sequencer.work (sequencer.ts:217-218) and broke the pipelining invariant (targetSlot = slot + 1).

| step | wall-clock | nowSeconds | result |
|---|---|---|---|
| 1st getEpochAndSlotInNextL1Slot (slot) | ≈14:34:32.385 (pre-warp) | 1778942079 | next L1 ts 1778942080 → slot 18 |
| (warp at 14:34:32.390 sets offset 7611 → 7610) | | | |
| 2nd getTargetEpochAndSlotInNextL1Slot (targetSlot) | ≈14:34:32.395 (post-warp) | 1778942080 | next L1 ts 1778942084 → slot 19, + offset 1 → targetSlot 20 |

Logged confirmation (gap = 2 instead of 1):

14:34:32.612  Preparing checkpoint proposal 19 for target slot 20 during wall-clock slot 18
              {nowSeconds=1778942079, slot=18, targetSlot=20, …}
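The arithmetic in the table above can be reproduced with a small model. This is a minimal sketch, not the EpochCache code: it assumes 4-second L1 slots, and the genesis timestamp and 12-second L2 slot duration are hypothetical values picked only so the numbers match the table. WarpableClock, slotAt and readSlotsRacy are illustrative names. The point it shows is that one clock warp between two independent reads turns a one-slot pipelining gap into two.

```typescript
// Hypothetical timing constants (not the real chain config), chosen so that
// nowSeconds 1778942079 / 1778942080 map to slots 18 / 19 as in the table.
const L1_SLOT_SECONDS = 4;
const L2_SLOT_SECONDS = 12;
const GENESIS_TS = 1778941856;
const PIPELINING_OFFSET = 1;

/** Timestamp of the next L1 slot strictly after nowSeconds. */
function nextL1Ts(nowSeconds: number): number {
  return (Math.floor(nowSeconds / L1_SLOT_SECONDS) + 1) * L1_SLOT_SECONDS;
}

/** L2 slot containing timestamp ts. */
function slotAt(ts: number): number {
  return Math.floor((ts - GENESIS_TS) / L2_SLOT_SECONDS);
}

/** Stand-in for a test date provider whose offset can change between reads. */
class WarpableClock {
  constructor(private nowSec: number) {}
  nowInSeconds(): number {
    return this.nowSec;
  }
  warpTo(nowSec: number): void {
    this.nowSec = nowSec;
  }
}

/** Racy pattern: slot and targetSlot come from two separate clock reads. */
function readSlotsRacy(clock: WarpableClock, betweenReads: () => void = () => {}) {
  const slot = slotAt(nextL1Ts(clock.nowInSeconds()));
  betweenReads(); // a time warp landing here changes the second read
  const targetSlot = slotAt(nextL1Ts(clock.nowInSeconds())) + PIPELINING_OFFSET;
  return { slot, targetSlot };
}

const clock = new WarpableClock(1778942079);
const result = readSlotsRacy(clock, () => clock.warpTo(1778942080));
console.log(result.slot, result.targetSlot); // 18 20 (gap 2 instead of 1)
```

Without the warp the same call yields slot 18 and targetSlot 19, which is the invariant the sequencer relies on.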

With slotNow = 18, the gate at checkpoint_proposal_job.ts:402 waits on waitForSyncedL2SlotNumber(slotNow). The archiver has already synced past slot 18, so the wait returns immediately, far too early to see parent checkpoint 18 (which lands four seconds later, at 14:34:36). The gate then sees checkpointedNumber=17 and parentCheckpointNumber=18, declares the parent absent, and discards the proposal. Slot 20 expires without a checkpoint, the archiver prunes blocks 19 and 20, and the in-flight setup tx anchored to block 19 fails with Block header not found.
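The gate's failure mode can be modelled as a pure function over successive archiver polls. This is a minimal sketch: ArchiverView, runParentGate and the poll sequence are hypothetical stand-ins for waitForSyncedL2SlotNumber plus the checkpointed-number check, not the real sequencer code, and the poll values are illustrative. Keyed off slotNow = 18, the first poll already satisfies the wait while the checkpointed number is still 17.

```typescript
/** One observation of archiver state (hypothetical shape). */
interface ArchiverView {
  syncedL2Slot: number;
  checkpointedNumber: number;
}

type GateResult = 'proceed' | 'discard' | 'timeout';

/**
 * Wait until the archiver has synced waitSlot, then require the parent
 * checkpoint to be on L1. Mirrors the shape of the gate, not its code.
 */
function runParentGate(polls: ArchiverView[], waitSlot: number, parentCheckpointNumber: number): GateResult {
  for (const view of polls) {
    if (view.syncedL2Slot < waitSlot) {
      continue; // archiver has not caught up yet: keep waiting
    }
    return view.checkpointedNumber >= parentCheckpointNumber ? 'proceed' : 'discard';
  }
  return 'timeout';
}

// Illustrative timeline: the archiver has synced slot 18 but checkpoint 18 is
// not yet on L1; it is visible by the time slot 19 is synced.
const polls: ArchiverView[] = [
  { syncedL2Slot: 18, checkpointedNumber: 17 },
  { syncedL2Slot: 19, checkpointedNumber: 18 },
];

console.log(runParentGate(polls, 18, 18)); // discard (keyed off slotNow)
console.log(runParentGate(polls, 19, 18)); // proceed (keyed off targetSlot - 1)
```

Under this toy timeline, keying the wait off targetSlot - 1 instead of slotNow is what lets the parent checkpoint arrive before the check runs.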

Full timeline + log evidence: https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee

What

Scoped and test-only, per direction from Santiago. The previous approach (making checkpointed the global PXE default) is reverted; only e2e_amm opts in:

-    } = await setup(4, { ...PIPELINING_SETUP_OPTS }));
+    } = await setup(4, { ...PIPELINING_SETUP_OPTS }, { syncChainTip: 'checkpointed' }));

The PXE option already exists (yarn-project/pxe/src/config/index.ts, added in 75df5b5d44). Every other pipelining-aware test uses the same approach (e2e_p2p/*, e2e_epochs/*, e2e_slashing/attested_invalid_proposal). It anchors in-flight txs to the L1-confirmed (checkpointed) tip, so prunes of the proposed tip can't invalidate them.
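Why the checkpointed tip survives a prune can be shown with a toy model. This is a minimal sketch, assuming a prune removes every block above the checkpointed tip; ChainTips, pickAnchor and anchorSurvivesPrune are hypothetical helpers, only the 'checkpointed' value comes from this PR, and the 'proposed' literal and block numbers are assumptions for illustration.

```typescript
// Toy model of which block a PXE anchors new txs to. The 'proposed' literal
// and the block numbers are assumptions; only 'checkpointed' appears in this PR.
type SyncChainTip = 'proposed' | 'checkpointed';

interface ChainTips {
  proposed: number; // latest block on the proposed chain
  checkpointed: number; // latest block covered by a checkpoint on L1
}

function pickAnchor(tips: ChainTips, syncChainTip: SyncChainTip): number {
  return syncChainTip === 'checkpointed' ? tips.checkpointed : tips.proposed;
}

/** Assumed prune rule: blocks above the checkpointed tip are dropped. */
function anchorSurvivesPrune(anchorBlock: number, tips: ChainTips): boolean {
  return anchorBlock <= tips.checkpointed;
}

const tips: ChainTips = { proposed: 19, checkpointed: 18 };
console.log(anchorSurvivesPrune(pickAnchor(tips, 'proposed'), tips)); // false
console.log(anchorSurvivesPrune(pickAnchor(tips, 'checkpointed'), tips)); // true
```

The cost of this choice is that txs are built against a slightly older state, since the checkpointed tip lags the proposed one.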

PIPELINING_SETUP_OPTS is left untouched — the pipelining migration of e2e_amm in #23275 stays.

Recommended follow-up (separate PR)

The real bug is the race in Sequencer.work. Worth fixing properly:

  • Snapshot the time once. Add EpochCache.getCurrentAndTargetSlotInNextL1Slot() that returns {slot, targetSlot, epoch, targetEpoch, ts, nowSeconds} from a single dateProvider.nowInSeconds() read; replace the two-call site in Sequencer.work. The pipelining offset is a constant, so deriving targetSlot = slot + offset from the same snapshot is trivial.
  • Defensive: wait on targetSlot - 1. waitForValidParentCheckpointOnL1 should key off the parent's expected build slot (targetSlot - 1) instead of slotNow, so the gate is robust even if the invariant is broken upstream.
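The first follow-up can be sketched as one function that derives both slots from a single clock read. This is a minimal sketch, not the proposed EpochCache method itself: the function name and return fields follow the suggestion above, while TimingConfig, its values, and the nowInSeconds callback are assumptions for illustration. Because targetSlot is computed as slot + offset from the same snapshot, a concurrent warp can shift both slots but cannot widen the gap.

```typescript
/** Hypothetical timing parameters; real values come from the chain config. */
interface TimingConfig {
  genesisTs: number;
  l1SlotSeconds: number;
  l2SlotSeconds: number;
  epochDurationSlots: number;
  pipeliningOffset: number;
}

interface SlotSnapshot {
  slot: number;
  targetSlot: number;
  epoch: number;
  targetEpoch: number;
  ts: number; // timestamp of the next L1 slot
  nowSeconds: number; // the single clock reading everything derives from
}

function getCurrentAndTargetSlotInNextL1Slot(cfg: TimingConfig, nowInSeconds: () => number): SlotSnapshot {
  const nowSeconds = nowInSeconds(); // read the clock exactly once
  const ts = (Math.floor(nowSeconds / cfg.l1SlotSeconds) + 1) * cfg.l1SlotSeconds;
  const slot = Math.floor((ts - cfg.genesisTs) / cfg.l2SlotSeconds);
  const targetSlot = slot + cfg.pipeliningOffset; // constant offset, same snapshot
  return {
    slot,
    targetSlot,
    epoch: Math.floor(slot / cfg.epochDurationSlots),
    targetEpoch: Math.floor(targetSlot / cfg.epochDurationSlots),
    ts,
    nowSeconds,
  };
}

const cfg: TimingConfig = {
  genesisTs: 1778941856,
  l1SlotSeconds: 4,
  l2SlotSeconds: 12,
  epochDurationSlots: 4,
  pipeliningOffset: 1,
};

const preWarp = getCurrentAndTargetSlotInNextL1Slot(cfg, () => 1778942079);
const postWarp = getCurrentAndTargetSlotInNextL1Slot(cfg, () => 1778942080);
console.log(preWarp.slot, preWarp.targetSlot); // 18 19
console.log(postWarp.slot, postWarp.targetSlot); // 19 20
```

Whichever side of the warp the single read lands on, the pair stays consistent; with the same constants, the two-read pattern is what produced 18 and 20.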

These follow-ups aren't in this PR because they touch sequencer production code and warrant their own review; the test-side workaround unblocks the merge-train without changing the global PXE default.

Test plan

The failure needs merge-queue-heavy's 10-grind L1 contention to surface reliably; a single dev box can't reproduce it. The change is a single-argument addition and is trivial on the TypeScript side.

Analysis: https://gist.github.com/AztecBot/4863d10084dd20587bffcc43fd61dfee

ClaudeBox log: https://claudebox.work/s/166e664eab264b04?run=3

@AztecBot AztecBot added ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR. labels May 16, 2026
@AztecBot AztecBot changed the title test(e2e): opt e2e_amm out of pipelining (chain-prune flake under merge-queue-heavy) fix(pxe): default syncChainTip to checkpointed so prunes don't invalidate inflight tx anchors May 16, 2026
@AztecBot AztecBot force-pushed the claudebox/fix-pr-23253-dequeue-v4 branch from 17fdd20 to ef2e7bc May 16, 2026 15:35
@AztecBot AztecBot changed the title fix(pxe): default syncChainTip to checkpointed so prunes don't invalidate inflight tx anchors test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining May 16, 2026
@AztecBot AztecBot force-pushed the claudebox/fix-pr-23253-dequeue-v4 branch from ef2e7bc to 26aecd9 May 16, 2026 16:10
@spalladino spalladino marked this pull request as ready for review May 16, 2026 16:15
@spalladino spalladino enabled auto-merge (squash) May 16, 2026 16:15
@spalladino spalladino merged commit 6874e44 into merge-train/spartan May 16, 2026
25 of 33 checks passed
@spalladino spalladino deleted the claudebox/fix-pr-23253-dequeue-v4 branch May 16, 2026 16:49
AztecBot added a commit that referenced this pull request May 16, 2026
Both fail repeatedly on merge-train attempts under proposer pipelining
despite fix attempts (#23303, #23334 for fee_settings; #23336 for
e2e_amm). Skipping in .test_patterns.yml to land the train; to be
triaged and re-enabled (tracking issue assigned to spalladino).
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE

2 participants