feat: merge-train/spartan by AztecBot · Pull Request #23671 · AztecProtocol/aztec-packages

AztecBot · 2026-05-29T08:25:04Z

BEGIN_COMMIT_OVERRIDE
test(e2e): unskip pipelining related e2e tests (#23642)
fix(archiver): prune blocks without proposed checkpoint by end of build slot (#23606)
test: migrate benchmarks to pipelining setup (#23647)
fix(p2p): fall back to archiver in BLOCK_TXS response validation (#23624)
docs(slashing): align operator and slasher docs with AZIP-7 (#23494)
fix(p2p): do not penalize peers that signal a missing block with Fr.ZERO (#23672)
chore: adjust metrics deployment (#23676)
fix(cheat-codes): warpL2TimeAtLeastBy advances relative to leading clock (#23675)
chore: tighten node pool sizes (#23678)
chore: remove archival nodes (#23630)
chore: merge blob sink duties into RPC node (#23631)
fix: sync avm-transpiler Cargo.lock with noir submodule (#23683)
fix(spartan): set validator lag env vars in tps-scenario (#23684)
fix: make world-state hash queries reorg-aware to close getWorldState race (#23677)
fix: pin noir submodule to next's version on merge-train/spartan (#23690)
fix: ensure image ref is used by bench runner (#23682)
fix(ci): retry aztec-nr nargo dependency clone on transient network flake (#23653)
chore: run one-off jobs on network nodes (#23701)
fix: simulate proposals inside target slot (#23692)
chore: smaller eth-devnet (#23704)
chore: enable testnet autoscaling (#23705)
feat(api)!: redesign node log retrieval API around tag-based queries (#23625)
fix(sequencer): set own proposed checkpoint locally instead of via p2p loopback (#23659)
END_COMMIT_OVERRIDE

## Summary

- fix the L1 reorg message tests to wait for pre-reorg message visibility instead of readiness where readiness is not the behavior under test
- stabilize the L1 reorg pending-chain prune test by reorging back before the checkpoint publish block
- target a future pipelined submission slot in the MBPS prune test before checkpoint publishing is disabled
- keep recently unskipped tests listed as flake patterns without `skip: true`
- leave blacklist token contract e2e tests on `AUTOMINE_E2E_OPTS` and remove stale pipelining migration TODOs

## Verification

- `yarn lint end-to-end`
- `git diff --check`
- `ANVIL_PORT=8571 e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'updates L1 to L2 messages changed due to an L1 reorg'` passed locally
- `ANVIL_PORT=8572 e2e_epochs/epochs_mbps.pipeline.parallel.test.ts -t 'prunes uncheckpointed blocks when proposer fails to deliver'` passed locally
- Raman: escalated red/green local runs passed for `e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'handles missed message inserted by an L1 reorg'`
- Raman: escalated red/green local runs passed for `e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'prunes blocks from pending chain removed from L1 due to an L1 reorg'`
- Raman: `yarn build` passed

…ld slot (#23606)

When the previous proposer sent some block proposals but failed to send the corresponding checkpoint proposal, the current proposer would assume there was no proposed checkpoint to build on top of, but would still use the proposed blocks as chain tip. This meant a failed `canPropose` check against the Rollup contract as soon as it started its slot, since the proposed blocks from the previous proposer meant the proposer had a wrong chain tip.

To fix, the sequencer is now aware that there may be proposed blocks without the corresponding checkpoints, and it can't start building until that's resolved. Also, the archiver now prunes proposed blocks without a checkpoint when the corresponding _build_ slot is over.

---

## Motivation

Under proposer pipelining a node can receive and reexecute the block-only proposals for a checkpoint before (or without ever) receiving the enclosing proposed checkpoint. This leaves the local tip one checkpoint ahead of the checkpointed tip with no proposed checkpoint backing it. A sequencer that then builds the next checkpoint on top of that orphan tip forks the chain off a parent no other node can follow, which was the root cause behind the sentinel CI flake.

## Approach

Two complementary defenses. The sequencer's `checkSync` refuses to proceed when the synced block's checkpoint is ahead of the checkpointed tip and no matching proposed checkpoint exists, holding the line during the window before cleanup. The archiver adds a wall-clock orphan prune that, shortly after a block's build slot ends, removes a block-only tip whose checkpoint was never proposed, restoring liveness even while L1 is quiet.

## Changes

- **sequencer-client**: `checkSync` rejects syncing onto a proposed block with no matching proposed-checkpoint tip/data, logging a descriptive warning.
- **archiver**: new `pruneOrphanProposedBlocks` on the L1 synchronizer, run from `Archiver.sync()` after the inbound queue drains and before L1 sync; prunes after `start(blockSlot) + grace` using the epoch-cache pipelining offset and emits `L2PruneUncheckpointed`. The existing L1-sync prune is preserved (shared prune/emit helper).
- **archiver/stdlib/foundation config**: new `orphanProposedBlockPruneGraceSeconds` in `ArchiverSpecificConfig`, archiver config mappings (`ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS`), `mapArchiverConfig`, the synchronizer/archiver config types, and a new `EnvVar`.
- **aztec-node**: defaults the grace window from `blockDurationMs / 1000` when unset, falling back to `MIN_EXECUTION_TIME`; the archiver factory also defaults to `MIN_EXECUTION_TIME`.
- **sequencer-client (tests)**: orphan tip returns `undefined` and warns; matching proposed checkpoint proceeds.
- **archiver (tests)**: no prune before grace; prune + event after grace; no prune when a matching proposed checkpoint exists; queued proposed checkpoint is processed before the prune.
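The prune rule and the grace-window default described above can be sketched as two small pure functions. This is a minimal illustration, not the archiver's code: `resolveGraceSeconds`, `shouldPruneOrphanTip`, and their parameter names are hypothetical, the pipelining offset is assumed to be already folded into `blockSlotStartSeconds`, and whether the real comparison is inclusive is an assumption.

```typescript
// Hypothetical helper: picks the grace window in seconds.
// Mirrors the description: explicit config wins, otherwise blockDurationMs / 1000,
// otherwise a MIN_EXECUTION_TIME-style fallback passed in by the caller.
function resolveGraceSeconds(
  configuredGraceSeconds: number | undefined,
  blockDurationMs: number | undefined,
  minExecutionTimeSeconds: number,
): number {
  if (configuredGraceSeconds !== undefined) {
    return configuredGraceSeconds;
  }
  if (blockDurationMs !== undefined) {
    return blockDurationMs / 1000;
  }
  return minExecutionTimeSeconds;
}

// Hypothetical input shape for the prune decision.
interface OrphanPruneInput {
  nowSeconds: number;                     // wall-clock time
  blockSlotStartSeconds: number;          // start(blockSlot); pipelining offset assumed applied
  graceSeconds: number;                   // resolved grace window
  hasMatchingProposedCheckpoint: boolean; // true when the tip is backed by a proposed checkpoint
}

// Prune only a block-only tip, and only once start(blockSlot) + grace has passed.
// Assumption: the deadline itself counts as "passed".
function shouldPruneOrphanTip(input: OrphanPruneInput): boolean {
  if (input.hasMatchingProposedCheckpoint) {
    return false;
  }
  return input.nowSeconds >= input.blockSlotStartSeconds + input.graceSeconds;
}
```

Keeping the decision pure makes the four archiver test cases listed above (before grace, after grace, matching checkpoint present, queued checkpoint processed first) easy to express as plain input/output checks.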

## Summary

- Run build-block, node RPC, and tx stats benchmarks with `PIPELINING_SETUP_OPTS`.
- Run client-flow benchmarks through the Automine setup via `ClientFlowsBenchmark`.
- Advance fee-juice bridge setup with explicit debug `mineBlock()` when available so Automine can progress empty blocks.

## Testing

- `yarn format end-to-end`
- `yarn build`
- `yarn lint end-to-end`

)

## Summary

`Libp2pService.validateRequestedBlockTxsConsistency` rejects every BLOCK_TXS reqresp response for any block whose proposal is not in the local attestation pool. The responder handler already falls back to the archiver in this case; the validator did not. So any node that doesn't have the proposal locally — but does know the block from the archiver — cannot collect its missing txs, and instead storms its peers until it is rate-limited and disconnected.

This PR teaches the validator the same proposal-or-archiver fallback the handler already uses, breaking the storm at its source.

## The defect: validator/handler asymmetry

To verify that a peer's BLOCK_TXS response matches the block, the validator needs the canonical tx-hash list for the block. Until this PR it consulted only the attestation pool:

```ts
// yarn-project/p2p/src/services/libp2p/libp2p_service.ts (pre-fix)
const proposal = await this.mempools.attestationPool.getBlockProposalByArchive(...);
if (proposal) {
  /* check membership/order */
} else {
  return false;
}
```

The responder handler (`p2p/src/services/reqresp/protocols/block_txs/block_txs_handler.ts:40-45`) already serves from either source:

```ts
let txHashes = (await attestationPool.getBlockProposalByArchive(...))?.txHashes;
if (!txHashes) {
  txHashes = (await archiver.getBlock({ archive: request.archiveRoot }))?.body.txEffects.map(e => e.txHash);
}
```

So peers can produce valid responses for blocks that are only known via the archiver, but the validator at the other end rejects them.

## When this fires

Any p2p-enabled node subscribes to `block_proposal` gossip (`libp2p_service.ts:575`) and stores received proposals into its attestation pool (`validateAndStoreBlockProposal` at `libp2p_service.ts:1236`, calling `tryAddBlockProposal` at line 1252). Neither subscription nor storage is gated by `disableValidator` — that flag only controls the validator-client (attestation signing). So in the steady state, a node that was online when a proposal was gossiped does have it locally.

The validator's lookup fails whenever the node lacks the proposal in its local attestation pool, yet still needs to collect the block's txs over reqresp. The real-world triggers we've seen and can describe:

- **A node joins the mesh late** and misses the proposal gossip for blocks that were proposed before it arrived. This was originally noticed during an e2e run where mesh formation was slower than usual, and it's the scenario the e2e test in this PR reproduces.
- A prover-node calling `ProverNode.gatherTxs` (`prover-node/src/prover-node.ts:330`) → `TxProvider.getTxsForBlock` → `TxCollection.collectFastFor({type:'block', ...})` → `BatchTxRequester` for any block whose proposal it doesn't happen to hold: prover restart, gossip drop, mesh churn during the epoch, etc.

In every case the prover (or any node) has the mined block in its archiver but no proposal in its attestation pool. Until this PR the validator only consulted the attestation pool, so every otherwise-valid response was rejected.

## The self-inflicted ban-storm

In `BatchTxRequester` (`p2p/src/services/reqresp/batch-tx-requester/batch_tx_requester.ts`):

```ts
// line 432-438: every response gets rejected because validation has no way to validate it
const isValid = await this.p2pService.validateRequestedBlockTxsConsistency(...);
if (!isValid) {
  this.handleFailResponseFromPeer(peerId, ReqRespStatus.INTERNAL_ERROR);
  return;
}

// line 461-481: INTERNAL_ERROR correctly does not penalize the peer, but
// also does not back off — the dumb worker loop just rotates to the next peer:
if (responseStatus === ReqRespStatus.NOT_FOUND || responseStatus === ReqRespStatus.INTERNAL_ERROR) {
  this.peers.markPeerDumb(peerId);
  this.txsMetadata.clearPeerData(peerId);
  return;
}
```

The dumb worker loop (`batch_tx_requester.ts:261-304`) has no inter-iteration sleep and ten parallel workers (`dumbParallelWorkerCount: 10`) round-robin the peers. Per-peer in-flight de-duplication caps it at one request in flight per peer, but the steady-state hit-rate per peer easily exceeds the responder's per-peer GCRA cap.

The penalty arrives from the **responder** side, on the requester:

```ts
// rate_limiter.ts:214-225
if (rateLimitStatus === RateLimitStatus.DeniedPeer) {
  this.peerScoring.penalizePeer(peerId, PeerErrorSeverity.HighToleranceError);
}
```

```
BLOCK_TXS per-peer cap : 10 req / 1000 ms   (rate_limits.ts:55-65)
HighToleranceError     : -2 score points    (peer_scoring.ts:34-36)
disconnect threshold   : -50                (peer_scoring.ts:57)
ban threshold          : -100               (peer_scoring.ts:56)
```

With ~1 GCRA denial per second per peer, the prover loses 2 points/sec at each responder. The first responder hits **-50 in ~25 s** and goodbyes the prover via `peer_manager.ts:601-603 pruneUnhealthyPeers` (`GoodByeReason.LOW_SCORE`); **-100 in ~50 s** would ban.

## Fix

A four-line change in `yarn-project/p2p/src/services/libp2p/libp2p_service.ts`: fall back to the archiver after the attestation-pool lookup, mirroring the responder handler. If neither source has the block we still return `false` without penalising the peer (it really is unverifiable locally).

```ts
const proposal = await this.mempools.attestationPool.getBlockProposalByArchive(...);
const blockTxHashes =
  proposal?.txHashes ??
  (await this.archiver.getBlock({ archive: request.archiveRoot }))?.body.txEffects.map(e => e.txHash);
if (blockTxHashes) {
  /* existing membership/order check, against blockTxHashes */
} else {
  /* unchanged: log warn, return false, no penalty */
}
```

The validator's other checks (archive-root match, bitvector length, no dupes, size bounds, subset-membership, ordering) are unchanged.

## Tests

**Unit anchor** — `p2p/src/services/libp2p/libp2p_service.test.ts`

- New test: *"should accept when the proposal is missing but the block is known via the archiver"* — verified red before the fix (`Expected: true, Received: false`) and green after.
- Existing test renamed and tightened to *"should reject without penalising when the block is unknown (no proposal and not in the archiver)"* — covers the still-correct rejection path.
- All 46 tests in the file pass.

**End-to-end** — `end-to-end/src/e2e_p2p/late_prover_tx_collection.test.ts`

Validators form a mesh and mine a block carrying real txs; a prover joins **after** the block is mined, so it has the block in its archiver but never received the proposal or txs over gossip. The test then drives the exact production path the prover would take to gather txs for proving:

```ts
const txCollection = (proverNode as ...).p2pClient.txCollection;
const collected = await txCollection.collectFastForBlock(minedBlock, blockTxHashes, { deadline });
expect(collected.map(t => t.getTxHash().toString()).sort())
  .toEqual(blockTxHashes.map(h => h.toString()).sort());
```

- Red on the unfixed source: `collected.length === 0` (validation rejects every response, dumb loop runs until the deadline), assertion fails.
- Green with the fix: all of `block.body.txEffects`'s txs are collected.
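The disconnect and ban timings quoted in the ban-storm section follow from simple arithmetic on the listed constants. The sketch below only reproduces that calculation; `secondsUntilThreshold` is a hypothetical helper, and it assumes a steady one denial per second with no score decay in between.

```typescript
// Constants as quoted in the description above.
const HIGH_TOLERANCE_PENALTY_POINTS = 2; // points lost per HighToleranceError
const DISCONNECT_THRESHOLD = -50;
const BAN_THRESHOLD = -100;

// Hypothetical helper: seconds until a peer score reaches `threshold`,
// assuming a constant denial rate and no score recovery (both simplifications).
function secondsUntilThreshold(
  threshold: number,
  penaltyPoints: number,
  denialsPerSecond: number,
): number {
  return Math.abs(threshold) / (penaltyPoints * denialsPerSecond);
}

const secondsToDisconnect = secondsUntilThreshold(DISCONNECT_THRESHOLD, HIGH_TOLERANCE_PENALTY_POINTS, 1);
const secondsToBan = secondsUntilThreshold(BAN_THRESHOLD, HIGH_TOLERANCE_PENALTY_POINTS, 1);
```

Under these assumptions the two values come out at 25 and 50 seconds, matching the "~25 s" and "~50 s" figures in the text.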

## Summary

Closes [A-970](https://linear.app/aztec-labs/issue/A-970).

Refresh the operator-facing slashing-configuration guide and the slasher README to match the AZIP-7 end-state, now that the implementation work for AZIP-7 has landed across the Slashing Post-Alpha Improvements project.

Operator docs (`docs/docs-operate/operators/sequencer-management/slashing-configuration.md`):

- Remove the obsolete "Valid Epoch Not Proven" section. `SLASH_PRUNE_PENALTY` is gone with it.
- Rewrite "Data Withholding" for the end-of-slot detection rule and add the matching `SLASH_DATA_WITHHOLDING_TOLERANCE_SLOTS` env var.
- Update "Inactivity" to mention end-of-epoch evaluation (no longer waits for proven) and re-execution-based fault attribution.
- Flip the descendant offense section to proposer-fault framing to match the rename in #23468.
- Add sections for the new offenses: broadcasted invalid block proposal, broadcasted invalid checkpoint proposal, attesting to an invalid checkpoint proposal, duplicate proposal, duplicate attestation.
- Sync the env-vars block and the offense-detection bullet list with the current set of watchers.
- Convert touched section headings to sentence case per docs style.

Slasher README (`yarn-project/slasher/README.md`):

- Add a note under `BROADCASTED_INVALID_CHECKPOINT_PROPOSAL` to make the AZIP-7 "submitting block proposal after checkpoint" mapping explicit. That AZIP offense is detected via the existing invalid-checkpoint watcher (a late block makes the prior checkpoint retroactively invalid) rather than having its own offense type.

Stacked on #23468 (the `ATTESTED_DESCENDANT_OF_INVALID` → `PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS` rename) because the new env var name only exists once that PR lands.

## Test plan

- `cd docs && yarn spellcheck` (clean)
- Visual review of the rewritten "Slashable offenses" section against the [AZIP-7 spec](https://github.com/AztecProtocol/governance/blob/main/AZIPs/azip-7-update_slashing.md)

…ERO (#23672)

**Stacked on:** `[mr/fix-block-txs-validation-archiver-fallback](https://github.com/AztecProtocol/aztec-packages/pull/23624)` (the archiver-fallback PR).

---

## Summary

A peer that legitimately can't find the requested block in its attestation pool or archiver, but matches the requested hashes against its tx pool and sends those txs back, signals "I don't have the block" by setting `archiveRoot = Fr.ZERO` in the response (`block_txs_handler.ts:54-58`). The requester's validation currently treats that response the same as a malicious archive-root mismatch and applies a `MidToleranceError` peer penalty. After enough such responses from the same helpful peer, that peer is disconnected by the requester for behaviour the protocol explicitly documents as legitimate.

This PR makes the validator recognise `Fr.ZERO` as the "I don't have the block" signal and stop penalising peers for it.

## What's broken

### Responder side (correct, intentional)

`block_txs_handler.ts:54-58` — when the responder lacks the block (no proposal, no archived block) but the request carried full tx hashes, it tries to serve from its pool by hash and signals the "I don't know the block" condition with `archiveRoot = Fr.zero()`:

```ts
if (!txHashes && requestedTxsHashes !== undefined) {
  const responseTxs = (await txPool.getTxsByHash(requestedTxsHashes)).filter(tx => !!tx);
  const response = new BlockTxsResponse(Fr.zero(), new TxArray(...responseTxs), BitVector.init(0, []));
  return response.toBuffer();
}
```

### Requester side (broken)

The validator at `libp2p_service.ts:1525-1530` penalizes any archive-root mismatch with `MidToleranceError` (-10 score points) and throws — *including* `Fr.zero`:

```ts
if (!response.archiveRoot.equals(request.archiveRoot)) {
  this.peerManager.penalizePeer(peerId, PeerErrorSeverity.MidToleranceError);
  throw new ValidationError(...);
}
```

After 5 such responses from the same helpful peer, that peer is at -50 in the requester's score book → disconnected by `pruneUnhealthyPeers` (`peer_manager.ts:601-603`).

### Receiver-side code already documented the correct intent

The wrongful penalty contradicts what the receiver-side code explicitly says should happen. `batch_tx_requester.ts:586-600` has `handleArchiveRootMismatch`, whose docstring spells out exactly the semantic we're restoring:

```ts
/**
 * Handles an archive root mismatch between local state and peer response.
 *
 * - Response archive is Fr.ZERO (peer pruned proposal, legitimate): marks peer dumb.
 * - Non-zero archive mismatch (malicious response): penalises + marks dumb.
 */
private handleArchiveRootMismatch(peerId: PeerId, response: BlockTxsResponse): void {
  if (!response.archiveRoot.isZero()) {
    this.peers.penalisePeer(peerId, PeerErrorSeverity.LowToleranceError);
  }
  this.peers.markPeerDumb(peerId);
  this.txsMetadata.clearPeerData(peerId);
}
```

But this function is only reached from `decideIfPeerIsSmart` → `handleSuccessResponseFromPeer`, which only runs when validation returns `true`. The validator's first check (archive-root equality) rejects every archive-mismatched response — including `Fr.zero` — so `handleArchiveRootMismatch` is never actually invoked. The "Fr.zero is legitimate" exemption it encodes has been unreachable since the validator's archive-root check was added.

The flow today:

```
reqresp response
      │
      ▼
validateRequestedBlockTxsConsistency
  ├─ Fr.zero     → reject + Mid penalty   (BUG, contradicts the docstring above)
  ├─ non-zero ≠  → reject + Mid penalty
  └─ matches archive root ──► handleSuccessResponseFromPeer
                                └─ decideIfPeerIsSmart
                                     └─ hasArchiveRootMismatch? always false
                                        (validator filtered them all out)
                                        ──► handleArchiveRootMismatch  ◄── DEAD CODE
```

So the receiver side already knew Fr.zero should not be penalised; the decision was just being made at the wrong layer. This PR moves it to the validator, where it can actually fire. `handleArchiveRootMismatch` itself remains dead code (separate cleanup candidate, not in this PR).

## Fix

Special-case `response.archiveRoot.isZero()` at the top of `validateRequestedBlockTxsConsistency` so the archive-root mismatch path is bypassed for `Fr.zero` responses. We still return `false` (the txs are dropped because we can't verify membership/order without the block) but no peer penalty is applied — matching the intent of the `Fr.zero` exemption in `batch_tx_requester.ts:593-600`.

```ts
// libp2p_service.ts (inside validateRequestedBlockTxsConsistency)
if (response.archiveRoot.isZero()) {
  this.logger.debug(`Peer ${peerId.toString()} signalled missing block with Fr.zero archive root`);
  return false;
}
if (!response.archiveRoot.equals(request.archiveRoot)) {
  this.peerManager.penalizePeer(peerId, PeerErrorSeverity.MidToleranceError);
  ...
}
```

The validator's other checks are unchanged. The early return prevents the bitvector-length and `maxReturnable` checks downstream from firing on a zero-length bitvector response, which would otherwise also wrongly penalise the peer.

## Tests

**Unit** — `p2p/src/services/libp2p/libp2p_service.test.ts`

A new test in the `validateRequestedBlockTxsConsistency` describe block: *"should not penalize a peer that signals lacking the block with Fr.ZERO archive root"*. Constructs a response with `archiveRoot = Fr.ZERO` and asserts that `peerManager.penalizePeer` is not called. Verified red before the fix, green after.

**Integration** — `p2p/src/client/test/p2p_client.integration_block_txs.test.ts`

A new test in the `p2p client integration block txs protocol` describe block: *"requester does not penalize peer that returns Fr.zero (peer lacks proposal but matched by hash)"*. Drives a real reqresp BLOCK_TXS request over libp2p between two clients, lets the responder hit the `Fr.zero` branch in `block_txs_handler`, then runs the response through the requester's real `validateRequestedBlockTxsConsistency` and asserts the requester's `peerManager.penalizePeer` is not called with `MidToleranceError` or `LowToleranceError`. Verified red before the fix, green after.
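The three-way decision the validator makes after this fix can be modelled as a small classifier. This is a sketch under stated assumptions rather than the validator's code: archive roots are modelled as hex strings instead of `Fr`, and `classifyArchiveRoot`, `penaltyPointsFor`, and `ZERO_ROOT` are hypothetical names. The 10-point figure is the `MidToleranceError` penalty quoted above.

```typescript
// Hypothetical stand-in for Fr.ZERO; real archive roots are field elements, not strings.
const ZERO_ROOT = "0x00";

type ArchiveRootOutcome = "missing-block-signal" | "mismatch" | "match";

// The zero check comes first, as in the fix: a zero root is a legitimate
// "I don't have the block" signal, not a mismatch.
function classifyArchiveRoot(requestedRoot: string, responseRoot: string): ArchiveRootOutcome {
  if (responseRoot === ZERO_ROOT) {
    return "missing-block-signal"; // reject the txs, no penalty
  }
  if (responseRoot !== requestedRoot) {
    return "mismatch"; // reject and penalise
  }
  return "match"; // continue with the remaining checks
}

// Points deducted per outcome; only a genuine mismatch costs anything.
function penaltyPointsFor(outcome: ArchiveRootOutcome): number {
  return outcome === "mismatch" ? 10 : 0;
}

// Before the fix a zero root was treated as a mismatch, so a helpful peer
// reached the -50 disconnect threshold after this many responses.
const responsesUntilDisconnectBeforeFix = 50 / 10;
```

Separating the classification from the penalty makes it explicit that the zero-root response is still rejected; only the score deduction is dropped.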


…ock (#23675)

## Problem

CI on `merge-train/spartan` failed in `yarn-project/end-to-end/src/composed/e2e_cheat_codes.test.ts` ([log](http://ci.aztec-labs.com/1780045078587982) → test-engine `358cb51c378c6913` → `4baba53d3b9feb67`):

```
e2e_cheat_codes › warpL2TimeAtLeastBy advances time by at least the duration
expect(received).toBeGreaterThanOrEqual(expected)
Expected: >= 1780048759   (timestampBefore_L2 + 100)
Received:    1780048731   (advanced only ~72s)
```

## Root cause

`CheatCodes.warpL2TimeAtLeastBy(duration)` computed its target as `eth.lastBlockTimestamp() (L1) + duration`, but its documented contract is that the **L2** timestamp advances by at least `duration`, and the test measures advancement against the latest **L2 block** timestamp.

In the composed test a live sequencer mines L2 blocks at slot boundaries that can run ahead of anvil's L1 clock. When the latest L2 block timestamp leads L1, adding `duration` to L1 produces a target below `latestL2 + duration`, so the resulting block advances L2 time by less than `duration` and the assertion fails. This is a latent correctness bug in the helper that surfaces non-deterministically depending on slot/L1 alignment.

## Fix

Anchor the target to whichever clock leads — `max(currentL1Timestamp, latestL2BlockTimestamp) + duration` — before delegating to `warpL2TimeAtLeastTo`. This guarantees the post-warp L2 block is at least `duration` ahead of the current one, while remaining a strict superset of the old behaviour (it never advances by less than before), so other callers (`e2e_expiration_timestamp`, `e2e_contract_updates`, `blacklist_token_contract`, `e2e_automine_smoke`, `lending_simulator`) are unaffected.

Single-file change in `yarn-project/aztec/src/testing/cheat_codes.ts`.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/368b5e8b3cef0969) · group: `slackbot`*
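The difference between the old and new target computation can be shown with the numbers from the failing run. The function names below are hypothetical, and the L1 timestamp is an inference rather than a logged value: it assumes the post-warp block landed exactly on the old target, which gives `1780048731 - 100`.

```typescript
// Old behaviour as described in the root cause: anchored to the L1 clock only.
function warpTargetOld(currentL1Timestamp: number, duration: number): number {
  return currentL1Timestamp + duration;
}

// New behaviour as described in the fix: anchored to whichever clock leads.
function warpTargetNew(
  currentL1Timestamp: number,
  latestL2BlockTimestamp: number,
  duration: number,
): number {
  return Math.max(currentL1Timestamp, latestL2BlockTimestamp) + duration;
}

// Values derived from the CI log above.
const duration = 100;
const latestL2BlockTimestamp = 1780048759 - duration; // timestampBefore_L2
const inferredL1Timestamp = 1780048731 - duration;    // assumption, see lead-in

const oldTarget = warpTargetOld(inferredL1Timestamp, duration);
const newTarget = warpTargetNew(inferredL1Timestamp, latestL2BlockTimestamp, duration);
const oldAdvance = oldTarget - latestL2BlockTimestamp;
const newAdvance = newTarget - latestL2BlockTimestamp;
```

With these inputs the old target advances L2 time by 72 seconds, which is the shortfall reported in the log, while the new target advances it by the full 100.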


Fix A-1116

Fix A-1117

## Problem

CI on `merge-train/spartan` is failing in the `avm-transpiler-native` build step ([log](http://ci.aztec-labs.com/1780052027304932)):

```
error: the lock file /home/aztec-dev/aztec-packages/avm-transpiler/Cargo.lock needs to be updated but --locked was passed to prevent this
```

`cargo build --release --locked --bin avm-transpiler` rejects the stale lock file.

## Root cause

The `noir/noir-repo` submodule was bumped on this branch and its `acvm-repo` crates (`acir`, `acir_field`, `brillig`) gained a new path dependency, `msgpack_tagged` (+ `msgpack_tagged_derive`). `avm-transpiler/Cargo.lock` was not regenerated, so it no longer matches `Cargo.toml` and the `--locked` CI build fails.

## Fix

Regenerated `avm-transpiler/Cargo.lock` with a plain (non-`--locked`) build so only the required entries change — no bulk update. The diff adds `msgpack_tagged`/`msgpack_tagged_derive` to the relevant dependency lists plus the transitive deps they introduce (`serde_bytes`, `bs58`, `tinyvec`, and a bump of `darling`/`serde_with`).

## Verification

- Reproduced the failure: `cargo build --release --locked --bin avm-transpiler` → lock-file error.
- After the fix: `cargo build --release --locked --bin avm-transpiler` succeeds.
- `./bootstrap.sh build_native` in `avm-transpiler/` (the exact failing CI step) completes cleanly.

Only `avm-transpiler/Cargo.lock` is changed.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/d9f97447b0ab23ed) · group: `slackbot`*

- Replace stale `AZTEC_LAG_IN_EPOCHS` in `tps-scenario.env` with `AZTEC_LAG_IN_EPOCHS_FOR_VALIDATOR_SET` and `AZTEC_LAG_IN_EPOCHS_FOR_RANDAO` (value 1 each).
- Fixes immediate failure in nightly Spartan `wait-bench-l2-block` ([run #184](https://github.com/AztecProtocol/aztec-packages/actions/runs/26627597396/job/78472605259)).

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>

… race (#23677)

## Problem

`AztecNodeService.getWorldState({ hash })` can intermittently throw

```
Block hash 0x.. not found in world state at block number N (world state has no hash at that index ...). If the node API has been queried with anchor block hash possibly a reorg has occurred.
```

even when no reorg happened. This surfaced as a flake in oxide / `e2e_frozen_notes_refund` e2e tests proving a tx that reads existing notes (private-kernel reset resolving a read request anchored on an early block).

## Root cause

`getWorldState` did its world-state sync via `#syncWorldState()`, which only synced to the archiver's *latest height* and ignored any requested block hash. It then:

1. resolved the requested hash to a block number against the archiver, and
2. took `getSnapshot(N)` and read the `ARCHIVE` leaf at index `N` to double-check the hash.

The synchronizer reports `syncedTo >= N` as soon as the height advances, but the archive-tree commit for block `N` may not yet be visible in the snapshot view. A snapshot taken in that window has populated note/nullifier trees but a half-written archive tree, so `getLeafValue(ARCHIVE, N)` returns `undefined` and the double-check throws. The sync-status height and the archive-tree write were not barriered against each other.

## Fix

Make the hash-anchored path reorg-aware, as the existing synchronizer plumbing already supports (`syncImmediate(targetBlockNumber, blockHash)`):

- When the request carries a block hash, resolve it against the archiver up front (fails fast with the existing clear reorg error if the hash is unknown), then drive the sync to that exact `(blockNumber, hash)`.
- `WorldStateSynchronizer.syncImmediate` reads the *committed* `ARCHIVE` leaf for that block; if it is missing or mismatched it forces a `blockStream.sync()` and re-checks, only throwing on a genuine reorg.

This barriers on the archive-tree commit actually landing, closing the race window, and turns real reorgs into a clear `block_hash_mismatch` instead of a confusing snapshot error.

Block-number / tag queries are unchanged: they still sync to latest height with no hash. The existing snapshot double-check is kept as a backstop.

## Tests

Added unit tests in `server.test.ts` asserting that a hash-anchored query forwards the resolved `(blockNumber, hash)` to `syncImmediate`, and that block-number queries still sync to latest height with no hash.

Full `yarn-project` install/bootstrap (which builds the barretenberg/noir portal packages) was not available in this session, so the suite was not executed here — worth a CI run before merge.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/6bca66dc9c802402) · group: `slackbot`*
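The check, force-sync, re-check sequence described for the hash-anchored path can be sketched as follows. This is a simplified, synchronous model for illustration only: the real synchronizer is asynchronous, and `ArchiveView`, `syncToBlockHash`, and `lateCommitView` are hypothetical names that do not appear in the codebase.

```typescript
// Hypothetical, synchronous stand-in for the committed world-state view.
interface ArchiveView {
  // Committed ARCHIVE leaf (block hash) at blockNumber, or undefined if not yet visible.
  getCommittedArchiveLeaf(blockNumber: number): string | undefined;
  // Forces the block stream to catch up, standing in for blockStream.sync().
  forceSync(): void;
}

// Returns normally once the committed leaf matches; throws only when the leaf
// is still missing or different after a forced sync (treated as a genuine reorg).
function syncToBlockHash(view: ArchiveView, blockNumber: number, expectedHash: string): void {
  let leaf = view.getCommittedArchiveLeaf(blockNumber);
  if (leaf !== expectedHash) {
    view.forceSync();
    leaf = view.getCommittedArchiveLeaf(blockNumber);
  }
  if (leaf !== expectedHash) {
    throw new Error(`block_hash_mismatch at block ${blockNumber}`);
  }
}

// Fake view reproducing the race: the archive leaf only becomes visible
// after a forced sync, even though the height has already advanced.
function lateCommitView(hashAfterSync: string): ArchiveView {
  let synced = false;
  return {
    getCommittedArchiveLeaf: () => (synced ? hashAfterSync : undefined),
    forceSync: () => {
      synced = true;
    },
  };
}
```

With `lateCommitView("0xabc")`, a call to `syncToBlockHash(view, 5, "0xabc")` succeeds after one forced sync, which is the window the old snapshot double-check tripped over; a view that settles on a different hash still throws.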

)

## Problem

CI on `merge-train/spartan` is failing in the `aztec-nr` step ([log](http://ci.aztec-labs.com/1780053790683323)) with `BoundedVec::from_parts_unchecked` deprecation errors under `nargo check --deny-warnings`.

The train's noir submodule had diverged from `next`:

| Branch | noir pin | date | `from_parts_unchecked` deprecated? |
|---|---|---|---|
| `next` | `f1a4575` | May 11 | no |
| `merge-train/spartan` | `4d039268` | May 28 | **yes** |

The newer pin (`4d039268`) was pulled onto the train by **PR #23675** ("fix(cheat-codes): warpL2TimeAtLeastBy…"), which bumped `noir/noir-repo` from the May-11 pin to a May-28 nightly. That nightly added `#[deprecated]` to `BoundedVec::from_parts_unchecked`, and since aztec-nr builds with `--deny-warnings`, the two remaining call sites became hard errors. The two noir commits are on divergent lines (neither is an ancestor of the other), so the train was simply ahead of `next` on noir.

## Fix

Pin the train's noir back to exactly what `next` uses. Only two files differed from `next`:

- `noir/noir-repo` → `f1a4575` (next's pin)
- `avm-transpiler/Cargo.lock` → next's version (it had been re-synced to the May-28 noir by #23683; restored to match the May-11 pin)

This restores parity with `next` and removes the deprecation entirely, so no aztec-nr source change is needed.

## Verification

Built `nargo` from the `f1a4575` pin and ran the failing check against the **unmodified** aztec-nr source:

- `nargo check --deny-warnings` → exit 0 (the deprecation attribute is absent in `f1a4575`).

## Note

This is an alternative to #23687, which fixed the same failure by patching the two aztec-nr call sites to use `from_parts` against the newer noir. Pick one: this PR keeps the train aligned with `next`'s noir; #23687 keeps the newer noir and updates the source. Closing whichever isn't chosen.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/2f980b2000011f91) · group: `slackbot`*

Using the nightly tag as `source_ref` to ensure the code that the runner uses is the same as what's deployed on GKE

…lake (#23653)

## Why

The `merge-train/spartan` train PR (#23580) was dequeued from the merge queue. The merge-queue CI3 run ([run 26608568295](https://github.com/AztecProtocol/aztec-packages/actions/runs/26608568295)) failed in the `x2-full amd64 ci-full-no-test-cache` grind during the **aztec-nr warnings check**, after only ~11s:

```
Checking aztec-nr for warnings...
Cloning into '.../noir-lang/poseidon/v0.3.0'...
Cloning into '.../noir-lang/sha256/v0.3.0'...
fatal: unable to access 'https://github.com/noir-lang/sha256/': Could not resolve host: github.com
Cannot read file .../noir-lang/sha256/v0.3.0/Nargo.toml - does it exist?
make: *** [Makefile:303: aztec-nr] Error 1
```

A transient DNS/network flake on the runner — not a code defect. `aztec-nr/aztec/Nargo.toml` declares external git dependencies (`noir-lang/sha256`, `noir-lang/poseidon`, pinned at `v0.3.0`) which `nargo check` resolves by cloning from `github.com` on a cold cache. When the runner momentarily can't resolve `github.com`, the clone fails and dequeues the whole train.

## What

A blanket `retry` around `nargo check` would also re-run on genuine check failures (type errors, denied warnings) — wasting CI time and masking intent. So instead:

- **`ci3/retry` gains a `-p <regex>` option.** It captures the command's combined output and only retries when a failure matches the regex; any non-matching failure exits immediately with the original code. Without `-p`, behavior is unchanged (the heavily-used default path is untouched). `pipefail` ensures the wrapped command's exit code (not `tee`'s) is what's checked, and `tee` finishes before inspection so the captured output is complete.
- **`aztec-nr/bootstrap.sh`** wraps its two network-touching nargo calls (`check`, `doc --check`) with `retry -p "<git transport errors>"`, matching only `Could not resolve host`, `unable to access`, `Connection timed out/refused`, `Failed to connect`, `TLS connect error`, `early EOF`, `RPC failed`. These never overlap with nargo's `error:`/`warning:` output, so a genuine check failure still fails on the first attempt.

`nargo` has no standalone dependency-install/fetch command (its subcommands are `check, compile, dap, debug, doc, execute, expand, export, fmt, fuzz, info, init, interpret, lsp, new, test`); resolution only happens inside `check`/`compile`/`test`, so the regex-gated retry is the workable option of the two suggested.

## Verification

- `bash -n` on all three files; `ci3/tests/retry_test` (new, auto-discovered by the ci3 test runner) passes all 6 cases:
  - default mode retries a transient failure then succeeds / gives up after 3 attempts
  - pattern mode retries a matching network failure then succeeds
  - **pattern mode fails fast (1 attempt) on a non-matching genuine error** ← the behavior requested
  - pattern mode gives up after 3 attempts on a persistent matching failure
  - `RETRY_DISABLED` runs the command exactly once

The full `./bootstrap.sh ci` run is the same orchestrated remote-EC2 CI that failed here and isn't reproducible on a dev host; the transient DNS failure also can't be reproduced where DNS works. Verification is therefore at the retry/wrapper level, which is exactly what this change touches.
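The pattern-gated retry behaviour can be modelled in a few lines. The real `ci3/retry` is a shell script whose source is not shown here, so this TypeScript sketch only illustrates the decision logic: `retryOnPattern`, `RunResult`, and `TRANSPORT_ERRORS` are hypothetical, the regex is assembled from the phrases listed above rather than copied from `bootstrap.sh`, and delays between attempts are omitted.

```typescript
// Hypothetical result of running the wrapped command once.
interface RunResult {
  exitCode: number;
  output: string; // combined stdout and stderr
}

// Illustrative regex built from the error phrases listed in the description.
const TRANSPORT_ERRORS =
  /Could not resolve host|unable to access|Connection timed out|Connection refused|Failed to connect|TLS connect error|early EOF|RPC failed/;

// Retries up to maxAttempts. With a pattern, a failure whose output does not
// match exits immediately; without one, every failure is retried (default mode).
function retryOnPattern(
  run: () => RunResult,
  pattern: RegExp | undefined,
  maxAttempts: number = 3,
): { exitCode: number; attempts: number } {
  let last: RunResult = { exitCode: 1, output: "" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = run();
    if (last.exitCode === 0) {
      return { exitCode: 0, attempts: attempt };
    }
    if (pattern !== undefined && !pattern.test(last.output)) {
      return { exitCode: last.exitCode, attempts: attempt }; // genuine failure: fail fast
    }
  }
  return { exitCode: last.exitCode, attempts: maxAttempts };
}
```

A command that fails once with `Could not resolve host: github.com` and then succeeds is retried, while one that fails with a nargo `error:` line stops after a single attempt, matching the test cases listed in the verification notes.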

.

## Summary

- Use the last L1 slot timestamp inside the target L2 slot for proposal/header simulations.
- Keep bundle simulation and pre-broadcast header validation on the same timestamp rule to avoid `eth_simulateV1` timestamp-order failures.
- Add regression coverage for both simulation paths.

Fix A-1122
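One plausible reading of "the last L1 slot timestamp inside the target L2 slot" is sketched below. Everything here is an assumption for illustration: the function name, the parameter names, and the premise that the L2 slot duration is a whole multiple of the L1 slot duration with L1 slots aligned to the L2 slot start. The actual implementation may compute this differently.

```typescript
// Hypothetical helper; names and alignment assumptions are not taken from the PR.
function lastL1TimestampInL2Slot(
  l2GenesisTime: number, // seconds, assumed start of L2 slot 0
  l2Slot: number, // target L2 slot number
  l2SlotDuration: number, // seconds, assumed multiple of l1SlotDuration
  l1SlotDuration: number, // seconds
): number {
  const slotStart = l2GenesisTime + l2Slot * l2SlotDuration;
  const slotEnd = slotStart + l2SlotDuration; // exclusive end of the L2 slot
  // The last L1 slot that still starts inside [slotStart, slotEnd).
  return slotEnd - l1SlotDuration;
}
```

Using one rule like this for both the bundle simulation and the pre-broadcast header validation would keep the two simulated timestamps from disagreeing.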

.

…23625)

## Motivation

The node exposed four log-retrieval methods with three filter shapes and two return shapes, while the private index was only tag-keyed, so a `(tag, narrow block range)` query loaded the entire per-tag history into memory. Pagination was a single global `page` counter shared across all tags, and the public path was split between a `LogFilter` method and a tag-based method. v5 is already a breaking release, so this collapses everything to a single, fast, tag-based surface with no back-compat.

Fixes A-1111
Fixes A-1031

## Approach

Two methods, `getPrivateLogsByTags(query)` and `getPublicLogsByTags(query)`, replace the four old ones; `getContractClassLogs` and `getPublicLogs(LogFilter)` are gone. The archiver stores each log under a fixed-width composite hex-string key `[contractHex] - tagHex - blockHex8 - txIndexHex8 - logIndexHex8` in an LMDB map, so every supported filter (tag, block range, txHash, per-tag `afterLog` cursor, `referenceBlock` reorg cap) reduces to a single ordered range scan. Note hashes and nullifiers are never copied into the log index; they're fetched on demand from the block store via a partial deserializer that reads only the relevant prefix of the stored `IndexedTxEffect`. `ARCHIVER_DB_VERSION` bumps 6 → 7, so the archiver self-wipes and re-syncs from L1 on first start.

## API

Two node methods replace the previous four. Each returns one inner array per element of `query.tags`, in input order; an empty inner array means that tag matched nothing.

```ts
getPrivateLogsByTags(query: PrivateLogsQuery): Promise<LogResult[][]>;
getPublicLogsByTags(query: PublicLogsQuery): Promise<LogResult[][]>;
```

**Input**

```ts
// Filters shared by both queries.
type LogsQueryBase = {
  fromBlock?: BlockNumber; // inclusive lower bound
  toBlock?: BlockNumber; // exclusive upper bound
  txHash?: TxHash; // restrict to one tx; mutually exclusive with fromBlock/toBlock
  referenceBlock?: BlockHash; // reorg anchor: throws if that block is no longer present
  includeEffects?: boolean; // also attach each log's tx noteHashes + nullifiers
  limitPerTag?: number; // page size, 1..MAX_LOGS_PER_TAG (default & max = 20)
};

// A tag to query, optionally resuming strictly after a previously-seen log.
// The bare `T` form starts from the beginning.
type TagQuery<T> = T | { tag: T; afterLog?: LogCursor };

type PrivateLogsQuery = LogsQueryBase & {
  tags: TagQuery<SiloedTag>[]; // 1..MAX_RPC_LEN (100) entries
};

type PublicLogsQuery = LogsQueryBase & {
  contractAddress: AztecAddress; // required for public queries
  tags: TagQuery<Tag>[]; // 1..MAX_RPC_LEN (100) entries
};
```

**Output**

```ts
type LogResult<Opts = { includeEffects?: boolean }> = {
  logData: Fr[]; // log fields; the tag is logData[0]
  blockNumber: BlockNumber;
  blockHash: BlockHash;
  blockTimestamp: UInt64;
  txHash: TxHash;
  txIndexWithinBlock: number; // 0-based index of the tx within its block
  logIndexWithinTx: number; // 0-based index of the log within its tx
} & (Opts extends { includeEffects: true }
  ? { noteHashes: Fr[]; nullifiers: Fr[] } // present only when includeEffects: true
  : {});

// Opaque per-tag pagination cursor.
// String form: `<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>`.
class LogCursor {
  blockNumber: BlockNumber;
  txIndexWithinBlock: number;
  logIndexWithinTx: number;
  static fromLog(log: LogResult): LogCursor;
}
```

Pagination is per-tag: feed a tag's last `LogResult` back as the next query's `afterLog` (`{ tag, afterLog: LogCursor.fromLog(last) }`). A tag is exhausted once it returns fewer than `limitPerTag` results. The stdlib helpers `queryAllPrivateLogsByTags` / `queryAllPublicLogsByTags` drive this loop and return the fully-drained results.
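The per-tag pagination loop can be sketched roughly as follows. This is not the stdlib helper itself: `Cursor`, `Log`, `SimpleTagQuery`, `QueryFn`, and `drainAllTags` are simplified, hypothetical stand-ins, tags are plain strings assumed to be distinct, and the query function is synchronous for brevity even though the real node methods return promises.

```typescript
// Simplified stand-ins for LogCursor / LogResult / TagQuery; not the real stdlib types.
type Cursor = { blockNumber: number; txIndexWithinBlock: number; logIndexWithinTx: number };
type Log = Cursor & { tag: string };
type SimpleTagQuery = { tag: string; afterLog?: Cursor };
// Returns one inner array per queried tag, in input order (as the API above describes).
type QueryFn = (tags: SimpleTagQuery[], limitPerTag: number) => Log[][];

function drainAllTags(query: QueryFn, tags: string[], limitPerTag: number): Record<string, Log[]> {
  const results: Record<string, Log[]> = {};
  for (const tag of tags) results[tag] = [];

  let pending: SimpleTagQuery[] = tags.map(tag => ({ tag }));
  while (pending.length > 0) {
    const pages = query(pending, limitPerTag);
    const next: SimpleTagQuery[] = [];
    pages.forEach((page, i) => {
      const tag = pending[i].tag;
      results[tag].push(...page);
      // A short page means the tag is exhausted; only full pages are re-queried.
      if (page.length === limitPerTag) {
        const last = page[page.length - 1];
        next.push({
          tag,
          afterLog: {
            blockNumber: last.blockNumber,
            txIndexWithinBlock: last.txIndexWithinBlock,
            logIndexWithinTx: last.logIndexWithinTx,
          },
        });
      }
    });
    pending = next;
  }
  return results;
}
```

Because each tag carries its own cursor, a tag with few logs drops out after its first short page while busier tags keep paging.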
## Changes

- **stdlib**: new `LogResult`, `LogCursor`, `PrivateLogsQuery` / `PublicLogsQuery` types with zod schemas; `txHash` ⊕ `fromBlock`/`toBlock` enforced via `.refine` (but `txHash` + `afterLog` is allowed, to paginate within a tx's logs). `L2LogsSource` / `AztecNode` / `Archiver` interfaces and schemas reduced to the two new methods. Deleted `LogFilter`, `LogId`, `TxScopedL2Log`, `ExtendedPublicLog`, `ExtendedContractClassLog`, `GetPublicLogsResponse`, `GetContractClassLogsResponse`, and the dead `Tx.getPublicLogs(logsSource)`.
- **archiver**: full `LogStore` rewrite: two hex-string-keyed `AztecAsyncMap` primary maps (keys are fixed-width zero-padded lowercase hex, so `ordered-binary`'s string ordering matches the canonical `(contract, tag, block, txIndex, logIndex)` tuple and every filter is a single ordered range scan) plus two `blockNumber → string[]` secondary indices driving `deleteLogs` (replaces the buggy per-block tag-union list). All reads, including the `referenceBlock` existence check, run inside one `db.transactionAsync` across `BlockStore` + `LogStore`; a `referenceBlock` equal to the (synthetic, unindexed) genesis block hash resolves to the genesis block number rather than throwing. New `BlockStore.getNoteHashesAndNullifiers(txHashes)` is a batched partial deserializer for `includeEffects`, and `getTxLocation` reads only the 40-byte header instead of the full `TxEffect`. Contract-class-log storage removed entirely. `ARCHIVER_DB_VERSION` 6 → 7. `OutOfOrderLogInsertionError` and the `ARCHIVER_MAX_LOGS` env var dropped.
- **aztec-node**: four RPC handlers collapsed to two thin forwarders; `referenceBlock` resolution moved into the store so it shares the transaction.
- **pxe**: `getAllPages` rewritten from a global `page` counter to per-tag `afterLog` cursors: each round re-queries only tags that returned a full page, and tags drop out as soon as they return a short page. `fromBlock` / `toBlock` are pushed down into the node, eliminating the in-memory `#extractLogs` range filter.
- **aztec.js**: `getPublicEvents` migrated to the new query shape; `PublicEventFilter.contractAddress` is now required; `EventFilterBase.afterLog: LogId → LogCursor`.
- **cli**: `get-logs` requires `--contract-address` and `--tag`; `--after-log` parses a `LogCursor` string `<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>`.
- **end-to-end**: `e2e_ordering` rewritten to read `getBlock().body.txEffects[*].publicLogs` directly (the new API drops the tag-less, contract-less query shape).
- **docs**: migration notes for client consumers; operator changelog covering the DB version bump and one-time resync.
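To see why fixed-width, zero-padded lowercase hex keys turn a block-range filter into a single ordered range scan, consider the sketch below. The helper names (`hex8`, `logKey`, `blockRangeBounds`, `rangeScan`), the `-` separator, and the assumption that `blockHex8` means eight hex characters are illustrative guesses; the archiver's real key layout and LMDB access code are not reproduced here, and a sorted string array stands in for the ordered map.

```typescript
// Hypothetical helpers; widths and separator are assumptions, not the archiver's code.
const hex8 = (n: number): string => ("00000000" + n.toString(16)).slice(-8);

// Composite key for a private log: tag, then block, tx index, log index.
function logKey(tagHex: string, block: number, txIndex: number, logIndex: number): string {
  return `${tagHex}-${hex8(block)}-${hex8(txIndex)}-${hex8(logIndex)}`;
}

// Bounds for "tag T, fromBlock inclusive, toBlock exclusive" as one key range.
function blockRangeBounds(tagHex: string, fromBlock: number, toBlock: number): { start: string; end: string } {
  return { start: `${tagHex}-${hex8(fromBlock)}-`, end: `${tagHex}-${hex8(toBlock)}-` };
}

// Stand-in for an ordered-map range scan over lexicographically sorted keys.
function rangeScan(sortedKeys: string[], start: string, end: string): string[] {
  return sortedKeys.filter(k => k >= start && k < end);
}
```

Without the zero padding, block 16 (`10` in hex) would sort before block 9, and the range bounds would no longer select a contiguous run of keys.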

…p loopback (#23659)

## Motivation

The proposer relied on looping its own checkpoint proposal back through the p2p receive path to advance its local proposed-checkpoint tip before propagating. Under `broadcastInvalidBlockProposal` the broadcast checkpoint archive is deliberately corrupted, so the loopback handler's archive-based block lookup (`getBlockData({ archive })`) found nothing and retried until the next slot. By the time the proposer returned from broadcast, propagation had slipped past the p2p validator's stale window, producing intermittent failures (e.g. peers rejecting a late slot proposal).

## Approach

The proposer's optimistic proposed-checkpoint tip is the proposer's own local state, so it is now set directly in the sequencer's checkpoint proposal job rather than via a p2p loopback. The job adds the proposed checkpoint to the archiver from local checkpoint data (block numbers and counts, never the possibly-corrupted broadcast archive) immediately before gossiping, failing closed if the local insert fails. Because every block is already added to the archiver's FIFO queue (and awaited) during block building, the checkpoint insert needs no retry. The `notifyOwnCheckpointProposal` loopback is removed entirely, so the path is identical whether p2p is enabled or not.

## Changes

- **stdlib**: New `ProposedCheckpointSink` interface alongside `L2BlockSink`.
- **sequencer-client**: `CheckpointProposalJob` now pushes the proposed checkpoint to the archiver from local data before broadcast, gated on proposer pipelining and skipped when block-push is disabled (`skipPushProposedBlocksToArchiver`, fisherman mode); widened the sequencer/client `l2BlockSource` types to `ProposedCheckpointSink`.
- **p2p**: Removed `notifyOwnCheckpointProposal` from the `P2PService` interface, the libp2p and dummy services, and the `P2PClient.broadcastCheckpointProposal` call site (own proposals are still stored in the attestation pool before propagation).
- **validator-client**: The all-nodes own-proposal branch now skips validation and returns; removed the now-dead `setProposedCheckpointFromBlocks` and narrowed the archiver `Pick`.
- **tests**: Added job tests (push-from-local-data and order-before-gossip, abort-on-push-failure, no-push-when-pipelining-disabled, fisherman) and a proposal_handler own-proposal test; removed the obsolete libp2p loopback test and the e2e slash-test stub; widened affected mock types.
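The ordering described above (insert locally first, gossip second, abort if the insert fails) can be modelled in a few lines. The interfaces and method names below (`CheckpointSinkLike.addProposedCheckpoint`, `GossipLike.broadcast`, `LocalCheckpoint`, `proposeCheckpoint`) are hypothetical and synchronous for brevity; they are not the real `ProposedCheckpointSink` or p2p signatures, which the description does not spell out.

```typescript
// Hypothetical shapes for illustration only; not the real sequencer or p2p interfaces.
type LocalCheckpoint = { checkpointNumber: number; firstBlock: number; blockCount: number };
interface CheckpointSinkLike {
  addProposedCheckpoint(cp: LocalCheckpoint): void;
}
interface GossipLike {
  broadcast(cp: LocalCheckpoint): void;
}

// Returns true if the proposal was gossiped, false if it was aborted.
function proposeCheckpoint(
  sink: CheckpointSinkLike,
  gossip: GossipLike,
  cp: LocalCheckpoint,
  pushToArchiver: boolean, // assumed to stand for "pipelining on and block-push enabled"
): boolean {
  if (pushToArchiver) {
    try {
      // Local state is built from local checkpoint data, never from the broadcast payload.
      sink.addProposedCheckpoint(cp);
    } catch (err) {
      // Fail closed: if the local insert fails, the proposal is not gossiped.
      return false;
    }
  }
  gossip.broadcast(cp);
  return true;
}
```

Keeping the insert ahead of the broadcast means the local tip never depends on a payload that may have been deliberately corrupted for testing.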

AztecBot · 2026-05-29T19:08:14Z

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/dbf33a55aa777b80): yarn-project/end-to-end/scripts/run_test.sh ha src/composed/ha/e2e_ha_full.test.ts (296s) (code: 0)

AztecBot added ci-no-squash ci-full-no-test-cache labels May 29, 2026

spalladino and others added 17 commits May 29, 2026 09:49

chore: adjust metrics deployment (#23676)

581acd0

.

fix: make world-state hash queries reorg-aware to close getWorldState race

35093ac

chore: tighten node pool sizes (#23678)

00265d2

.

chore: remove archival nodes (#23630)

22b735f

Fix A-1116

chore: merge blob sink duties into RPC node (#23631)

8476542

Fix A-1117

update PR #23677

3345fea

Merge branch 'next' into cb/world-state-hash-query-reorg-aware

05b29a7

PhilWindle enabled auto-merge May 29, 2026 12:24

fix: ensure image ref is used by bench runner (#23682)

c7703c1

Using the nightly tag as `source_ref` to ensure the code that the runner uses is the same as what's deployed on GKE

spypsy requested a review from charlielye as a code owner May 29, 2026 13:15

PhilWindle approved these changes May 29, 2026


PhilWindle added this pull request to the merge queue May 29, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026

spalladino requested a review from nventuro as a code owner May 29, 2026 14:34

chore: run one-off jobs on network nodes (#23701)

e8eb17d

.

PhilWindle enabled auto-merge May 29, 2026 14:50

spypsy and others added 4 commits May 29, 2026 12:00

chore: smaller eth-devnet (#23704)

00bc25e

Fix A-1122

chore: enable testnet autoscaling (#23705)

7f1e9f2

.

spalladino requested a review from a team as a code owner May 29, 2026 16:27

PhilWindle added this pull request to the merge queue May 29, 2026

Merged via the queue into next with commit 992f8ea May 29, 2026
27 checks passed

PhilWindle merged 26 commits into next from merge-train/spartan
