chore: Properly compute finalized block #21156
spalladino
left a comment
Looks good! Just two concerns:
1. There are some lingering TODOs about finalization that need to be removed now that this is implemented (search for `TODO(#13569)`), and one that needs to be implemented (see `rollbackTo` in the archiver).
2. I'm very surprised that the e2e reorg tests that remove a proof via an L1 reorg still work now that anvil is configured with an L1 epoch length of 1. That setting means that proven becomes finalized immediately, so nodes shouldn't be able to handle a reorg to before the finalized checkpoint. Nothing actionable here, just something I'm surprised/worried about.
  const getProvenCheckpointNumber = (node: AztecNode) => node.getL2Tips().then(tips => tips.proven.checkpoint.number);

- it('prunes L2 blocks if a proof is removed due to an L1 reorg', async () => {
+ it.only('prunes L2 blocks if a proof is removed due to an L1 reorg', async () => {
Did you intend to add `.only`?
…ztec-packages into spy/finalized-checkpoint
BEGIN_COMMIT_OVERRIDE
fix: (A-623) increase committee timeout in scenario smoke test (#21193)
feat: orchestrator enqueues via serial queue (#21247)
feat: rollup mana limit gas validation (#21219)
fix: make e2e HA test more deterministic (#21199)
chore: fix chonk_browser lint warning (#21265)
chore: deploy SPONSORED_FPC in test networks (#21254)
fix: (A-635) e2e bot flake on nonce mismatch (#21288)
chore: deflake duplicate attestations and proposals slash tests (#21294)
fix(sequencer): fix log when not enough txs (#21297)
chore: send env var to pods (#21307)
fix: Simulate gas in n tps test. Set min txs per block to 1 (#21312)
fix: update dependabot dependencies (#21238)
test: run nightly bench of block capacity (#20726)
fix: update block_capacity test to use new send() result types (#21345)
fix(node): fix index misalignment in findLeavesIndexes (#21327)
fix(log): do not log validation error if unregistered handler (#21111)
fix: limit parallel blocks in prover to max AVM parallel simulations (#21320)
fix: use native sha256 to speed up proving job id generation (#21292)
chore: remove v4-devnet-1 (#21044)
fix(validator): wait for l1 sync before processing block proposals (#21336)
fix(txpool): cap priority fee with max fees when computing priority (#21279)
chore: Properly compute finalized block (#21156)
fix: remove extra argument in KVArchiverDataStore constructor call (#21361)
chore: revert l2 slot time 72 -> 36 on scenario network (#21291)
fix(archiver): do not error if proposed block matches checkpointed (#21367)
fix(claude): rule to not append echo exit (#21368)
chore: reduce severity of errors due to HA node not acquiring signature (#21311)
fix: make reqresp batch retry test deterministic (#21322)
fix: (A-643) add buffer to maxFeePerBlobGas for gas estimation and fix bump loop truncation (#21323)
fix(e2e): use L2 priority fee in deploy_method same-block test (#21373)
fix: reqresp flake & add logging (#21334)
END_COMMIT_OVERRIDE
…ization race (#21452)

## Summary
- Sets `anvilSlotsInAnEpoch: 32` in `e2e_offchain_payment` test setup, matching what `epochs_l1_reorgs` already does.

## Problem
PR #21156 added `--slots-in-an-epoch 1` as the default for anvil, making `finalized = latest - 2`. PR #20893 added `e2e_offchain_payment` which simulates L1 reorgs. When both landed on `merge-train/fairies`, the reorg test fails deterministically because finalization races past the rollback target block.

## Fix
Use `anvilSlotsInAnEpoch: 32` (matching Ethereum mainnet) so the finalized block stays far enough behind latest to allow rollbacks in the test.

ClaudeBox log: https://claudebox.work/s/c5ac5d52da86e23a?run=4
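The race described in this commit message can be illustrated with a small model. This is only a sketch: `finalizedBlock` and `canRollBackTo` are hypothetical helpers, and the `2 * slotsInAnEpoch` depth is an assumption chosen to agree with the `finalized = latest - 2` figure quoted for `--slots-in-an-epoch 1`.

```typescript
// Hypothetical model of how far the finalized tag trails the latest block.
// Assumption: depth = 2 * slotsInAnEpoch, which matches `finalized = latest - 2` for a value of 1.
function finalizedBlock(latest: number, slotsInAnEpoch: number): number {
  return Math.max(0, latest - 2 * slotsInAnEpoch);
}

// A rollback that keeps blocks up to `target` is only safe if it discards no finalized block.
function canRollBackTo(target: number, latest: number, slotsInAnEpoch: number): boolean {
  return target >= finalizedBlock(latest, slotsInAnEpoch);
}

const latest = 100; // illustrative numbers only
const rollbackTarget = 95;

console.log(canRollBackTo(rollbackTarget, latest, 1)); // false: finalized is already at 98
console.log(canRollBackTo(rollbackTarget, latest, 32)); // true: finalized is still at 36
```

With a depth of two blocks, almost any rollback target in a reorg test falls behind the finalized block, which is consistent with the deterministic failure reported above.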
❌ Failed to cherry-pick to |
Cherry-pick of 078737f from next. Conflicts present in:
- yarn-project/archiver/src/factory.ts
- yarn-project/archiver/src/store/block_store.ts
- yarn-project/ethereum/src/test/start_anvil.ts
…ints (#21597)

## Summary
- Updates the finalized block heuristic from `epochDuration * 2` to `epochDuration * 2 * 4` to subtract checkpoints (assumed 4 blocks each) instead of blocks.
- The proper fix is in #21156 which replaces this heuristic with L1 finality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… test (#21642)

## Summary
- Removes the historic/finalized block verification checks from `epochs_multiple.test.ts`
- The finalization logic on v4 is incorrect: it subtracts a fixed number of blocks (`epochDuration * 2`) instead of accounting for variable blocks per slot (up to 4 per slot), causing test timeouts
- The correct finalization implementation exists on `next` in #21156 but is non-trivial to backport to v4
- Keeps the proven sync check intact; only historic/finalized assertions are removed

## Context
See discussion in Slack: the current `getFinalizedL2BlockNumber` uses `provenBlockNumber - epochDuration * 2` which doesn't account for variable blocks per slot. This causes the tx mempool to evict transactions too aggressively and the test to time out waiting for finalization.

## Test plan
- CI should pass: the test still verifies epoch proving and proven block sync, just without the finalized block assertions

ClaudeBox log: https://claudebox.work/s/a5e9cea005ce4a5a?run=1
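The mismatch between a fixed block count and a variable number of blocks per slot can be made concrete with a little arithmetic. The sketch below assumes `epochDuration` is measured in slots; `slotsCoveredByLag` is a hypothetical helper and the numbers are illustrative, not taken from the repository.

```typescript
// Old heuristic quoted above: a fixed block count is subtracted from the proven tip.
function heuristicFinalizedBlock(provenBlockNumber: number, epochDuration: number): number {
  return Math.max(0, provenBlockNumber - epochDuration * 2);
}

// Hypothetical helper: how many slots the heuristic's lag actually covers
// when every slot produces `blocksPerSlot` L2 blocks.
function slotsCoveredByLag(epochDuration: number, blocksPerSlot: number): number {
  return (epochDuration * 2) / blocksPerSlot;
}

const epochDuration = 32; // illustrative value only

console.log(heuristicFinalizedBlock(400, epochDuration)); // 400 - 64 = 336
console.log(slotsCoveredByLag(epochDuration, 1)); // 64 slots, i.e. two epochs
console.log(slotsCoveredByLag(epochDuration, 4)); // 16 slots, i.e. half an epoch
```

Under these assumptions the same 64-block lag spans two epochs when each slot holds one block but only half an epoch when each slot holds four, which is why a block-count heuristic cannot stand in for an epoch-based one.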
Fixes [A-551](https://linear.app/aztec-labs/issue/A-551/properly-compute-finalized-block)

Replaces the heuristic finalized block computation (`provenBlock - 2 * epochDuration`) with L1 finality.

On each archiver sync iteration, we now:
1. Fetch the finalized L1 block via `getBlock({ blockTag: 'finalized' })`
2. Query the rollup contract for the proven checkpoint number at that L1 block
3. Persist that as the finalized checkpoint, from which the finalized L2 block number is derived

Failures in this step are caught and logged as warnings so they don't disrupt the rest of the sync loop (e.g. if the RPC node can't serve state at the finalized block).

- `RollupContract.getProvenCheckpointNumber` now accepts an optional `{ blockNumber }` to query historical contract state
- `BlockStore` stores a `lastFinalizedCheckpoint` singleton and derives `getFinalizedL2BlockNumber` from it instead of the old arithmetic heuristic
- `ArchiverL1Synchronizer` gains `updateFinalizedCheckpoint()`, called every sync iteration
- `KVArchiverDataStore` constructor no longer takes `l1Constants` (the `epochDuration` it was used for is no longer needed)
- `FakeL1State` updated to support `blockTag: 'finalized'` and `getProvenCheckpointNumber` with a block number, enabling new sync tests
Co-authored-by: AztecBot <tech@aztec-labs.com>
PR #21156 dropped the explicit `cheatCodes.rollup.markAsProven()` from this test on the assumption that the test infrastructure auto-proves every checkpoint. The AnvilTestWatcher's `markAsProven` loop only runs when anvil is in automine mode, but the e2e fixture switches to interval mining after L1 deployment, so `anvil_getAutomine` returns false and the auto-prove watcher never starts. The test therefore never sees a finalized checkpoint advance and pruning never fires. Restore the explicit `markAsProven` call (as the test had before PR #21156), mine a couple of empty checkpoints afterward so Anvil's `finalized = latest - 2` heuristic moves past the proven write, and shorten the retry budget now that pruning is actually triggered. Not a pipelining bug — the same regression affects the legacy flow; it was just masked by the test being skipped.
## Summary

> **Depends on PR #23296** -- this PR is rebased on top of `palla/fix-b5-escape-hatch-slot-targeting`, which forward-ports the §6 B5 escape-hatch slot-targeting fix onto the modern `buildCheckpointSimulationOverridesPlan` + flat `l1Contracts` API. With B5 in, `e2e_sequencer/escape_hatch_vote_only` and `e2e_sequencer/gov_proposal.parallel` "should vote even when unable to build blocks" are now re-enabled under pipelining on this PR.

Extracts the tests known to pass under proposer pipelining from PR #23150, without flipping the global default. Tests opt into pipelining explicitly via a new `PIPELINING_SETUP_OPTS` helper. The global `enableProposerPipelining` default stays `false` on `merge-train/spartan`; this PR migrates tests file-by-file so each one is opted in by name.

This PR is intentionally scoped: it only includes tests whose pipelining-ready status is reasonably well understood. Tests that depend on shared base-class fixtures (`FeesTest`, `BlacklistTokenContractTest`, `CrossChainMessagingTest`, `DeployTest`, `FullProverTest`, etc.) keep their branch changes but are not yet wired to pipelining via their base class -- those base classes are used by tests outside this batch and a blanket opt-in would over-migrate. They will be migrated in follow-up PRs.

Two commits:
1. **`test(e2e): opt unchanged tests into proposer pipelining`** -- adds `PIPELINING_SETUP_OPTS` to `fixtures.ts`, the small deploy-phase `accountsDeployMinTxs` conditional to `setup.ts`, and the explicit opt-in to every §1 test that calls `setup()` directly.
2. **`test(e2e): migrate tests that needed fixes into proposer pipelining`** -- the §2 tests with their branch fixes plus the infrastructure they depend on (sequencer.ts B5 fix, dummy_service.ts loopback, sequencer-publisher.ts error logging, sequencer-client READMEs rewrite, bootstrap.sh / test_simple.sh timeout bumps).

The global default flip and the migration of base-class-using tests are intentionally deferred. They will land separately once each batch can be verified independently.

---

## §1 -- Pipelining enabled and passing (no code changes)

Tests that pick up `enableProposerPipelining=true` from the explicit opt-in and pass without any per-test fix. This is the majority of the suite -- too many to enumerate. Examples include the unmodified `e2e_authwit`, `e2e_nft`, `e2e_amm`, `e2e_partial_notes`, `e2e_token_contract/*` (non-overflow), `e2e_offchain_*`, `e2e_orderbook`, `e2e_event_*`, `e2e_keys`, `e2e_avm_simulator` (after the suite-level timeout bump only), `e2e_pending_note_hashes_contract`, etc. None of these required test-level pipelining adaptations.

Pre-existing `it.skip`s in this bucket are unrelated to pipelining (they predate the branch) and were not touched:
- `e2e_token_contract/{transfer,transfer_in_private,transfer_in_public}` "transfer into account to overflow"
- `e2e_blacklist_token_contract/{transfer_private,transfer_public}` "transfer into account to overflow"
- `e2e_synching` "replay history and then do a fresh sync" / "a wild prune appears"
- `e2e_p2p/reex` "validators re-execute transactions before attesting"

## §2 -- Pipelining enabled and needed fixes

Tests that needed test- or fixture-level changes to pass under pipelining. All currently passing under PR #23150.

**Fixture-level (`src/fixtures/fixtures.ts` + `src/fixtures/setup.ts`)**
- New `PIPELINING_SETUP_OPTS` preset exporting `inboxLag=2`, `minTxsPerBlock=0`, `aztecSlotDuration=12s`, `ethereumSlotDuration=4s`, `walletMinFeePadding=PIPELINED_FEE_PADDING` (30x), and `enableProposerPipelining=true`.
- `setup.ts` gains a small conditional so the deploy-phase `minTxsPerBlock` override uses `0` instead of `1` under pipelining (otherwise the chain stalls on alternating slots).

**Cheat-codes (`src/testing/cheat_codes.ts`)** -- already on `merge-train/spartan` via cherry-pick of #23213.
**P2P (`src/services/dummy_service.ts`)**
- `notifyOwnCheckpointProposal` now invokes the all-nodes callback synchronously, mirroring libp2p loopback. Without this the in-process e2e sequencer never sees its own proposal and the pipelined parent verification blocks indefinitely.

**Sequencer-client**
- `sequencer.ts::tryVoteWhenEscapeHatchOpen` -- §6 B5 fix: takes `targetSlot`, signs the voter for `targetSlot`, and delays submission via `sendRequestsAt(getTimestampForSlot(targetSlot))` when pipelining is enabled. Mirrors the existing `tryVoteWhenSyncFails` and `CheckpointProposalJob.execute` patterns. Plus a refactor of `canProposeAt` simulation overrides via `SimulationOverridesBuilder`.
- `sequencer-publisher.ts` -- error log on publisher exhaustion now includes the underlying viem error and tried-addresses context.

**Per-suite test fixes**
- `e2e_lending_contract` -- predictable-time stub, longer hook windows.
- `e2e_fees/private_payments` "pays fees for tx that dont run public app logic".
- `e2e_blacklist_token_contract/{burn, minting, shielding, transfer_private, transfer_public, unshielding}` -- 6/7 suites re-enabled (`access_control` still skipped, see §5).
- `e2e_contract_updates` -- all 4 tests re-enabled (covered by §1 opt-in in this PR).
- `e2e_expiration_timestamp` invalidates tests -- L1-only `eth.warp(target, { resetBlockInterval: true })`, no publisher cascade.
- `e2e_ordering` -- switched from "latest block" to receipt-block reads; helper renamed to `expectLogsFromBlockToBe(logMessages, fromBlock)`.
- `e2e_fees/failures` -- snapshot `provenCheckpointBefore/After`, use `waitForProven` with extended timeout, account for newly-proven checkpoint deltas in reward math, read committed fee headers via `getCommittedProverFee` / `getCommittedBurn`.
- `e2e_fees/gas_estimation` -- pad `maxFeesPerGas` via `getPaddedMaxFeesPerGas(aztecNode)` in `beforeEach` to absorb fee-asset price evolution between snapshot and submission. 3/3 passing.
- `e2e_crowdfunding_and_claim` "cannot donate after a deadline" -- L1-only `cheatCodes.eth.warp(deadline+1, { resetBlockInterval: true })`.
- `e2e_deploy_contract/contract_class_registration` private-ctor variants -- thread `receipt.blockNumber` through `deployFn`, read logs from that specific block instead of "latest". 21/21 passing.
- `e2e_state_vars` DelayedPublicMutable -- root cause was slot-duration mismatch (`delay(4)` assumed `aztecSlotDuration=72s` from `DefaultL1ContractsConfig`; fixture forces `12s` under pipelining). Replaced `delay(4)` with a loop that pumps no-op txs until `timestamp >= timestamp_of_change`, and asserted exact equality against `tx.data.constants.anchorBlockHeader.globalVariables.timestamp + newDelay - 1n`. Tight `toEqual`, no widened bound.
- `e2e_pending_note_hashes_contract` -- squash helpers use the latest *non-empty* block.
- `e2e_expiration_timestamp` -- include-by computation bumped by 2x `aztecSlotDuration`.
- `e2e_p2p/*` and `e2e_epochs/*` -- explicit `enableProposerPipelining: true` + `inboxLag: 2` on every test that builds its own config (so behavior is intentional rather than implicit).
- `e2e_block_building` "processes txs until hitting timetable" -- replaced legacy `canStartNextBlock` mock + single-deadline timetable with the pipelined sub-slot budget (`blockDurationMs=2000`, `enforceTimeTable=true`, `fakeProcessingDelayPerTxMs=500`). 10 simultaneous txs must span at least 2 distinct blocks; would fail if the proposer reverted to single-block-per-slot or stopped enforcing sub-slot deadlines.
- `e2e_block_building` "assembles a block with multiple txs" (x2) -- pre-publish the contract class once and pass `skipClassPublication: true` on each per-tx deploy so the deploys don't all share the same `ContractClassRegistry.publish` nullifier and get RBF-rejected against each other. Also reset `blockDurationMs` in `afterEach` so the multi-block-per-slot state from the previous test doesn't leak.
- `e2e_block_building` "publishes two empty blocks" -- `buildCheckpointIfEmpty: true` so the proposer doesn't skip empty sub-slots; retry budget bumped from 10s -> 60s because empty checkpoints land every `aztecSlotDuration` (12s) rather than every legacy block.
- `e2e_epochs/epochs_mbps.parallel` "builds multiple blocks per slot with L2 to L1 messages" -- pipelined timing loses one sub-slot to attestation propagation; expectation dropped from `EXPECTED_BLOCKS_PER_CHECKPOINT=3` to `>= 2`, mirroring the sibling MBPS tests.
- `e2e_l1_with_wall_time` -- test was explicitly passing `ethereumSlotDuration` from env (=12s), defeating the fixture's pipelining override (=4s). With `aztec=eth=12s`, pipelined timing can't fit propose+attest+publish in one Aztec slot. Removed the explicit `ethereumSlotDuration`; also wrapped `teardown` in `afterEach` so setup failures surface their real error.
- `e2e_p2p/add_rollup` re-enabled (entire describe; 1 test, passes in ~9:14 locally). AttestationTimeoutError still fires in some slots, but the bundled-multicall governance-signal preCheck is independent of the propose preCheck -- signals accumulate and reach quorum even when checkpoint proposes fail to attest.
- `e2e_pruned_blocks` "can discover and use notes created in both pruned and available blocks" -- restored the explicit `markAsProven` call (as it had pre-#21156) + a 2-block buffer for Anvil's `finalized = latest - 2` heuristic; test re-enabled and passes.
- `e2e_sequencer/escape_hatch_vote_only` re-enabled. Source fix at `sequencer.ts::tryVoteWhenEscapeHatchOpen` (see §B5 in PR #23150). Test-side: attach event listeners *after* the warp, explicitly drain trailing in-flight votes before counting.
- `e2e_sequencer/gov_proposal.parallel` re-enabled (both tests). Two pipelining-aware adjustments: warp offset bumped to `nextRoundBeginsAtTimestamp - AZTEC_SLOT_DURATION - ETHEREUM_SLOT_DURATION`, and per-tx wait timeouts tuned for two slots of catch-up (proposer + L1 mine).
**Bash-level timeout adjustments (`end-to-end/bootstrap.sh`)** -- pipelined sequential dependent txs run at ~2x legacy latency:
- simple e2e default: 10m -> 20m
- `e2e_block_building`: 25m
- `e2e_avm_simulator`: 30m
- compose/web3signer: 20m
- HA: 30m
- `scripts/test_simple.sh` Jest `--testTimeout` 5m -> 10m
- ~21 test files: per-file `const TIMEOUT` raised from 100/120/150/180s -> 300s.

---

## Out of scope

- **Global default flip**: PR #23150 flipped `enableProposerPipelining=true` everywhere. This PR keeps the default `false` and migrates per-test. The global flip will land in a follow-up.
- **§3 opt-outs** (`e2e_l1_publisher` "with attestations" describe, `epoch_cache.test.ts` non-pipelined branch coverage, demo `docker-compose.yml`): no change required while the default is `false`.
- **§5 still-skipped tests**: the tests in §5 of PR #23150's categorization (e.g. `e2e_blacklist_token_contract/access_control`, `e2e_publisher_funding_multi`, `e2e_fees/fee_settings`, etc.) remain at `merge-train/spartan` state.
- **Base-class fixtures** (`FeesTest`, `BlacklistTokenContractTest`, `CrossChainMessagingTest`, `DeployTest`, `FullProverTest`, `EpochesTest`, P2P fixtures): test files using these get their branch-side changes preserved but are not wired to pipelining via the base class -- those base classes are shared with tests not in this batch and a blanket opt-in would over-migrate. Follow-up PRs will opt them in selectively.

Reference: PR #23150 (`palla/kill-non-pipelined-flow`) for full context on the categorization, source-level bugs surfaced (§6 B1-B6), and per-suite investigation notes.
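The `PIPELINING_SETUP_OPTS` preset listed under the fixture-level changes could look roughly like the sketch below. The field names and values are the ones quoted in the description; the interface, the unit of the durations (seconds), the numeric form of the fee padding, the function form of `accountsDeployMinTxs`, and the use of object spread to apply the preset are all assumptions made for illustration.

```typescript
// Sketch of the preset described above. Names and values come from the PR description;
// the type, the units and the representation of the fee padding are assumptions.
interface PipeliningSetupOpts {
  inboxLag: number;
  minTxsPerBlock: number;
  aztecSlotDuration: number; // seconds (assumed unit)
  ethereumSlotDuration: number; // seconds (assumed unit)
  walletMinFeePadding: number;
  enableProposerPipelining: boolean;
}

const PIPELINED_FEE_PADDING = 30; // "30x" per the description; exact representation assumed

const PIPELINING_SETUP_OPTS: PipeliningSetupOpts = {
  inboxLag: 2,
  minTxsPerBlock: 0,
  aztecSlotDuration: 12,
  ethereumSlotDuration: 4,
  walletMinFeePadding: PIPELINED_FEE_PADDING,
  enableProposerPipelining: true,
};

// Deploy-phase override (hypothetical function form): 0 under pipelining, 1 otherwise.
function accountsDeployMinTxs(enableProposerPipelining: boolean): number {
  return enableProposerPipelining ? 0 : 1;
}

// Later spread entries win, so an explicit value silently replaces the preset's value.
const wallTimeOpts = { ...PIPELINING_SETUP_OPTS, ethereumSlotDuration: 12 };

console.log(accountsDeployMinTxs(PIPELINING_SETUP_OPTS.enableProposerPipelining)); // 0
console.log(wallTimeOpts.ethereumSlotDuration === wallTimeOpts.aztecSlotDuration); // true
```

The last two lines mirror the `e2e_l1_with_wall_time` note: passing `ethereumSlotDuration` explicitly overrides the preset's 4s and leaves both slot durations at 12s.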
Fixes A-551
Description
Replaces the heuristic finalized block computation (`provenBlock - 2 * epochDuration`) with L1 finality.

On each archiver sync iteration, we now:
1. Fetch the finalized L1 block via `getBlock({ blockTag: 'finalized' })`
2. Query the rollup contract for the proven checkpoint number at that L1 block
3. Persist that as the finalized checkpoint, from which the finalized L2 block number is derived

Failures in this step are caught and logged as warnings so they don't disrupt the rest of the sync loop (e.g. if the RPC node can't serve state at the finalized block).

Changes
- `RollupContract.getProvenCheckpointNumber` now accepts an optional `{ blockNumber }` to query historical contract state
- `BlockStore` stores a `lastFinalizedCheckpoint` singleton and derives `getFinalizedL2BlockNumber` from it instead of the old arithmetic heuristic
- `ArchiverL1Synchronizer` gains `updateFinalizedCheckpoint()`, called every sync iteration
- `KVArchiverDataStore` constructor no longer takes `l1Constants` (the `epochDuration` it was used for is no longer needed)
- `FakeL1State` updated to support `blockTag: 'finalized'` and `getProvenCheckpointNumber` with a block number, enabling new sync tests
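The sync step in this description can be sketched as follows. This is not the archiver's actual code: the `L1Client`, `RollupReader` and `FinalizedStore` interfaces, `setFinalizedCheckpoint`, and the `makeFakes` helper are hypothetical stand-ins, and block numbers are plain numbers to keep the sketch short (viem uses `bigint`).

```typescript
// Hypothetical minimal interfaces; the real client, rollup wrapper and store have richer APIs.
interface L1Client {
  getBlock(args: { blockTag: 'finalized' }): Promise<{ number: number }>;
}
interface RollupReader {
  getProvenCheckpointNumber(opts?: { blockNumber?: number }): Promise<number>;
}
interface FinalizedStore {
  setFinalizedCheckpoint(checkpoint: number): Promise<void>;
}

async function updateFinalizedCheckpoint(
  l1: L1Client,
  rollup: RollupReader,
  store: FinalizedStore,
  warn: (msg: string) => void,
): Promise<void> {
  try {
    // 1. Fetch the finalized L1 block.
    const finalized = await l1.getBlock({ blockTag: 'finalized' });
    // 2. Read the proven checkpoint number as of that L1 block.
    const checkpoint = await rollup.getProvenCheckpointNumber({ blockNumber: finalized.number });
    // 3. Persist it as the finalized checkpoint.
    await store.setFinalizedCheckpoint(checkpoint);
  } catch (err) {
    // Failures are logged and swallowed so the rest of the sync loop keeps running.
    warn(`Could not update finalized checkpoint: ${String(err)}`);
  }
}

// In-memory fakes for illustration: proven checkpoint numbers keyed by L1 block number.
function makeFakes(finalizedL1Block: number, provenAt: Record<number, number>) {
  const state = { finalizedCheckpoint: 0, warnings: [] as string[] };
  const l1: L1Client = { getBlock: async () => ({ number: finalizedL1Block }) };
  const rollup: RollupReader = {
    getProvenCheckpointNumber: async opts => {
      const proven = provenAt[opts?.blockNumber ?? -1];
      if (proven === undefined) throw new Error('state unavailable at requested block');
      return proven;
    },
  };
  const store: FinalizedStore = {
    setFinalizedCheckpoint: async checkpoint => {
      state.finalizedCheckpoint = checkpoint;
    },
  };
  const warn = (msg: string) => {
    state.warnings.push(msg);
  };
  return { l1, rollup, store, warn, state };
}
```

Calling `updateFinalizedCheckpoint` with fakes built by `makeFakes(90, { 90: 7 })` stores checkpoint 7; if the fake rollup has no state for the finalized block, the store is left untouched and a single warning is recorded.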