feat: merge-train/spartan#23671

Merged
PhilWindle merged 26 commits into next from merge-train/spartan
May 29, 2026

@AztecBot AztecBot commented May 29, 2026

Collaborator

BEGIN_COMMIT_OVERRIDE
test(e2e): unskip pipelining related e2e tests (#23642)
fix(archiver): prune blocks without proposed checkpoint by end of build slot (#23606)
test: migrate benchmarks to pipelining setup (#23647)
fix(p2p): fall back to archiver in BLOCK_TXS response validation (#23624)
docs(slashing): align operator and slasher docs with AZIP-7 (#23494)
fix(p2p): do not penalize peers that signal a missing block with Fr.ZERO (#23672)
chore: adjust metrics deployment (#23676)
fix(cheat-codes): warpL2TimeAtLeastBy advances relative to leading clock (#23675)
chore: tighten node pool sizes (#23678)
chore: remove archival nodes (#23630)
chore: merge blob sink duties into RPC node (#23631)
fix: sync avm-transpiler Cargo.lock with noir submodule (#23683)
fix(spartan): set validator lag env vars in tps-scenario (#23684)
fix: make world-state hash queries reorg-aware to close getWorldState race (#23677)
fix: pin noir submodule to next's version on merge-train/spartan (#23690)
fix: ensure image ref is used by bench runner (#23682)
fix(ci): retry aztec-nr nargo dependency clone on transient network flake (#23653)
chore: run one-off jobs on network nodes (#23701)
fix: simulate proposals inside target slot (#23692)
chore: smaller eth-devnet (#23704)
chore: enable testnet autoscaling (#23705)
feat(api)!: redesign node log retrieval API around tag-based queries (#23625)
fix(sequencer): set own proposed checkpoint locally instead of via p2p loopback (#23659)
END_COMMIT_OVERRIDE

## Summary
- fix the L1 reorg message tests to wait for pre-reorg message
visibility instead of readiness where readiness is not the behavior
under test
- stabilize the L1 reorg pending-chain prune test by reorging back
before the checkpoint publish block
- target a future pipelined submission slot in the MBPS prune test
before checkpoint publishing is disabled
- keep recently unskipped tests listed as flake patterns, without
`skip: true`
- leave blacklist token contract e2e tests on `AUTOMINE_E2E_OPTS` and
remove stale pipelining migration TODOs

## Verification
- `yarn lint end-to-end`
- `git diff --check`
- `ANVIL_PORT=8571 e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'updates L1 to L2 messages changed due to an L1 reorg'`
passed locally
- `ANVIL_PORT=8572 e2e_epochs/epochs_mbps.pipeline.parallel.test.ts -t 'prunes uncheckpointed blocks when proposer fails to deliver'`
passed locally
- Raman: escalated red/green local runs passed for
`e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'handles missed message inserted by an L1 reorg'`
- Raman: escalated red/green local runs passed for
`e2e_epochs/epochs_l1_reorgs.parallel.test.ts -t 'prunes blocks from pending chain removed from L1 due to an L1 reorg'`
- Raman: `yarn build` passed
spalladino and others added 17 commits May 29, 2026 09:49
…ld slot (#23606)

When the previous proposer sent some block proposals but failed to send
the corresponding checkpoint proposal, the current proposer would assume
there was no proposed checkpoint to build on top of, but would still use
the proposed blocks as its chain tip. As a result, its `canPropose` check
against the Rollup contract failed as soon as its slot started: building
on the previous proposer's blocks gave it the wrong chain tip.

To fix this, the sequencer now accounts for proposed blocks that have no
corresponding checkpoint and does not start building until that is
resolved. In addition, the archiver now prunes proposed blocks without a
checkpoint once the corresponding _build_ slot is over.

---

## Motivation

Under proposer pipelining, a node can receive and re-execute the
block-only proposals for a checkpoint before (or without ever) receiving
the enclosing proposed checkpoint. This leaves the local tip one
checkpoint ahead of the checkpointed tip with no proposed checkpoint
backing it. A sequencer that then builds the next checkpoint on top of
that orphan tip forks the chain off a parent no other node can follow,
which was the root cause behind the sentinel CI flake.

## Approach

Two complementary defenses. The sequencer's `checkSync` refuses to
proceed when the synced block's checkpoint is ahead of the checkpointed
tip and no matching proposed checkpoint exists, holding the line during
the window before cleanup. The archiver adds a wall-clock orphan prune
that, shortly after a block's build slot ends, removes a block-only tip
whose checkpoint was never proposed, restoring liveness even while L1 is
quiet.
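Both defenses hinge on the same "orphan tip" condition. The following is a minimal TypeScript sketch under simplifying assumptions: chain tips are plain checkpoint numbers, a slot's start time is `genesisTime + slot * slotDuration`, and the pipelining offset that the real archiver takes from the epoch cache is left out. `ChainTips`, `isOrphanTip`, `shouldRefuseSync` and `shouldPruneOrphanTip` are hypothetical names, not the sequencer's or archiver's API.

```ts
// Hypothetical model of the sequencer guard and the archiver's wall-clock prune.
// Not the real checkSync / pruneOrphanProposedBlocks implementation.

interface ChainTips {
  /** Checkpoint number that the latest locally synced (proposed) block belongs to. */
  syncedBlockCheckpoint: number;
  /** Latest checkpoint number confirmed as checkpointed. */
  checkpointedTip: number;
  /** Checkpoint number of the latest proposed checkpoint we hold, if any. */
  proposedCheckpoint?: number;
}

/** A block-only tip: ahead of the checkpointed tip with no matching proposed checkpoint. */
function isOrphanTip(tips: ChainTips): boolean {
  return (
    tips.syncedBlockCheckpoint > tips.checkpointedTip &&
    tips.proposedCheckpoint !== tips.syncedBlockCheckpoint
  );
}

/** Sequencer side: refuse to build on top of an orphan tip. */
function shouldRefuseSync(tips: ChainTips): boolean {
  return isOrphanTip(tips);
}

/**
 * Archiver side: prune an orphan tip strictly after start(blockSlot) + grace.
 * Assumes slot start = genesisTime + blockSlot * slotDurationSeconds (all in seconds).
 */
function shouldPruneOrphanTip(
  tips: ChainTips,
  blockSlot: number,
  nowSeconds: number,
  genesisTime: number,
  slotDurationSeconds: number,
  graceSeconds: number,
): boolean {
  const slotStart = genesisTime + blockSlot * slotDurationSeconds;
  return isOrphanTip(tips) && nowSeconds > slotStart + graceSeconds;
}
```

Driving both checks from one predicate means the sequencer stops building on exactly the states the archiver later cleans up.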

## Changes

- **sequencer-client**: `checkSync` rejects syncing onto a proposed
block with no matching proposed-checkpoint tip/data, logging a
descriptive warning.
- **archiver**: new `pruneOrphanProposedBlocks` on the L1 synchronizer,
run from `Archiver.sync()` after the inbound queue drains and before L1
sync; prunes after `start(blockSlot) + grace` using the epoch-cache
pipelining offset and emits `L2PruneUncheckpointed`. The existing
L1-sync prune is preserved (shared prune/emit helper).
- **archiver/stdlib/foundation config**: new
`orphanProposedBlockPruneGraceSeconds` in `ArchiverSpecificConfig`,
archiver config mappings
(`ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS`),
`mapArchiverConfig`, the synchronizer/archiver config types, and a new
`EnvVar`.
- **aztec-node**: defaults the grace window from `blockDurationMs /
1000` when unset, falling back to `MIN_EXECUTION_TIME`; the archiver
factory also defaults to `MIN_EXECUTION_TIME`.
- **sequencer-client (tests)**: orphan tip returns `undefined` and
warns; matching proposed checkpoint proceeds.
- **archiver (tests)**: no prune before grace; prune + event after
grace; no prune when a matching proposed checkpoint exists; queued
proposed checkpoint is processed before the prune.
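The grace-window defaulting described in the **aztec-node** bullet can be sketched as follows. The env var name comes from the list above; everything else is assumed: the function name is hypothetical, `MIN_EXECUTION_TIME` is passed in (in seconds) because its value is not stated here, and rejecting negative or non-numeric values is this sketch's choice rather than documented behaviour.

```ts
// Hypothetical sketch: resolving orphanProposedBlockPruneGraceSeconds.
type Env = Record<string, string | undefined>;

function resolveOrphanPruneGraceSeconds(
  env: Env,
  blockDurationMs: number | undefined,
  minExecutionTimeSeconds: number, // stands in for MIN_EXECUTION_TIME (value not given here)
): number {
  const raw = env['ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS'];
  if (raw !== undefined && raw.trim() !== '') {
    const parsed = Number(raw);
    // This sketch rejects non-numeric or negative values; real config parsing may differ.
    if (!isFinite(parsed) || parsed < 0) {
      throw new Error(`Invalid ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS: ${raw}`);
    }
    return parsed;
  }
  // Unset: derive from the block duration, else fall back to MIN_EXECUTION_TIME.
  return blockDurationMs !== undefined ? blockDurationMs / 1000 : minExecutionTimeSeconds;
}
```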
## Summary
- Run build-block, node RPC, and tx stats benchmarks with
`PIPELINING_SETUP_OPTS`.
- Run client-flow benchmarks through the Automine setup via
`ClientFlowsBenchmark`.
- Advance fee-juice bridge setup with explicit debug `mineBlock()` when
available so Automine can progress empty blocks.

## Testing
- `yarn format end-to-end`
- `yarn build`
- `yarn lint end-to-end`
)

## Summary

`Libp2pService.validateRequestedBlockTxsConsistency` rejects every
BLOCK_TXS reqresp response for any block whose proposal is not in the
local attestation pool. The responder handler already falls back to the
archiver in this case; the validator does not. So any node that doesn't
have the proposal locally — but does know the block from the archiver —
cannot collect its missing txs, and instead storms its peers until it is
rate-limited and disconnected.

This PR teaches the validator the same proposal-or-archiver fallback the
handler already uses, breaking the storm at its source.

## The defect: validator/handler asymmetry

To verify that a peer's BLOCK_TXS response matches the block, the
validator needs the canonical tx-hash list for the block. Until this PR
it consulted only the attestation pool:

```ts
// yarn-project/p2p/src/services/libp2p/libp2p_service.ts (pre-fix)
const proposal = await this.mempools.attestationPool.getBlockProposalByArchive(...);
if (proposal) { /* check membership/order */ } else { return false; }
```

The responder handler
(`p2p/src/services/reqresp/protocols/block_txs/block_txs_handler.ts:40-45`)
already serves from either source:

```ts
let txHashes = (await attestationPool.getBlockProposalByArchive(...))?.txHashes;
if (!txHashes) {
  txHashes = (await archiver.getBlock({ archive: request.archiveRoot }))?.body.txEffects.map(e => e.txHash);
}
```

So peers can produce valid responses for blocks that are only known via
the archiver, but the validator at the other end rejects them.

## When this fires

Any p2p-enabled node subscribes to `block_proposal` gossip
(`libp2p_service.ts:575`) and stores received proposals into its
attestation pool (`validateAndStoreBlockProposal` at
`libp2p_service.ts:1236`, calling `tryAddBlockProposal` at line 1252).
Neither subscription nor storage is gated by `disableValidator` — that
flag only controls the validator-client (attestation signing). So in the
steady state, a node that was online when a proposal was gossiped does
have it locally.

The validator's lookup fails whenever the node lacks the proposal in its
local attestation pool, yet still needs to collect the block's txs over
reqresp. The real-world triggers we've seen and can describe:

- **A node joins the mesh late** and misses the proposal gossip for
blocks that were proposed before it arrived. This was originally noticed
during an e2e run where mesh formation was slower than usual, and it's
the scenario the e2e test in this PR reproduces.
- A prover-node calling `ProverNode.gatherTxs`
(`prover-node/src/prover-node.ts:330`) → `TxProvider.getTxsForBlock` →
`TxCollection.collectFastFor({type:'block', ...})` → `BatchTxRequester`
for any block whose proposal it doesn't happen to hold: prover restart,
gossip drop, mesh churn during the epoch, etc.

In every case the prover (or any node) has the mined block in its
archiver but no proposal in its attestation pool. Until this PR the
validator only consulted the attestation pool, so every otherwise-valid
response was rejected.

## The self-inflicted ban-storm

In `BatchTxRequester`
(`p2p/src/services/reqresp/batch-tx-requester/batch_tx_requester.ts`):

```ts
// line 432-438: every response gets rejected because validation has no way to validate it
const isValid = await this.p2pService.validateRequestedBlockTxsConsistency(...);
if (!isValid) {
  this.handleFailResponseFromPeer(peerId, ReqRespStatus.INTERNAL_ERROR);
  return;
}

// line 461-481: INTERNAL_ERROR correctly does not penalize the peer, but
// also does not back off — the dumb worker loop just rotates to the next peer:
if (responseStatus === ReqRespStatus.NOT_FOUND || responseStatus === ReqRespStatus.INTERNAL_ERROR) {
  this.peers.markPeerDumb(peerId);
  this.txsMetadata.clearPeerData(peerId);
  return;
}
```

The dumb worker loop (`batch_tx_requester.ts:261-304`) has no
inter-iteration sleep, and ten parallel workers
(`dumbParallelWorkerCount: 10`) round-robin the peers. Per-peer
in-flight de-duplication caps it at one request in flight per peer, but
the steady-state hit-rate per peer easily exceeds the responder's
per-peer GCRA cap.
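To see why a loop with no backoff trips a per-peer GCRA cap, here is a textbook Generic Cell Rate Algorithm limiter configured for the BLOCK_TXS figure quoted below (10 requests per 1000 ms). `GcraLimiter` is a hypothetical illustration of the algorithm, not the code in `rate_limiter.ts`.

```ts
/**
 * Generic Cell Rate Algorithm: at most `limit` requests per `periodMs` per peer,
 * allowing a burst of up to `limit` requests. Textbook version, not rate_limiter.ts.
 */
class GcraLimiter {
  private readonly emissionIntervalMs: number;
  private readonly burstToleranceMs: number;
  /** Theoretical arrival time of the next conforming request, per peer. */
  private readonly tatByPeer: Record<string, number> = {};

  constructor(limit: number, periodMs: number) {
    this.emissionIntervalMs = periodMs / limit;
    this.burstToleranceMs = periodMs - this.emissionIntervalMs;
  }

  /** Returns true if the request conforms; denied requests do not advance the TAT. */
  allow(peerId: string, nowMs: number): boolean {
    const stored = this.tatByPeer[peerId];
    const tat = stored === undefined ? nowMs : Math.max(stored, nowMs);
    if (tat - nowMs > this.burstToleranceMs) {
      return false;
    }
    this.tatByPeer[peerId] = tat + this.emissionIntervalMs;
    return true;
  }
}
```

With a 100 ms emission interval, a requester that re-requests from the same peer without pausing spends its 10-request burst immediately and is then held to one request per 100 ms; every request beyond that is a denial, and each denial is what costs it score at the responder.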

The penalty arrives from the **responder** side, on the requester:

```ts
// rate_limiter.ts:214-225
if (rateLimitStatus === RateLimitStatus.DeniedPeer) {
  this.peerScoring.penalizePeer(peerId, PeerErrorSeverity.HighToleranceError);
}
```

```
BLOCK_TXS per-peer cap : 10 req / 1000 ms   (rate_limits.ts:55-65)
HighToleranceError     : -2 score points    (peer_scoring.ts:34-36)
disconnect threshold   : -50                (peer_scoring.ts:57)
ban threshold          : -100               (peer_scoring.ts:56)
```

With ~1 GCRA denial per second per peer, the prover loses 2 points/sec
at each responder. The first responder hits **-50 in ~25 s** and
goodbyes the prover via `peer_manager.ts:601-603 pruneUnhealthyPeers`
(`GoodByeReason.LOW_SCORE`); **-100 in ~50 s** would ban.
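The timings above are straight arithmetic on the quoted constants. A small sketch, assuming the ~1 denial per second stated above and no score decay or recovery between penalties (the PR does not discuss decay), and treating a threshold as reached once the score is at or below it:

```ts
// Peer-score constants as quoted from peer_scoring.ts above.
const HIGH_TOLERANCE_PENALTY = -2;
const DISCONNECT_THRESHOLD = -50;
const BAN_THRESHOLD = -100;

/** Number of equal penalties needed for a score starting at 0 to reach `threshold`. */
function penaltiesToReach(threshold: number, penalty: number): number {
  return Math.ceil(threshold / penalty);
}

/** Seconds to reach `threshold` at a steady penalty rate, ignoring any score decay. */
function secondsToReach(threshold: number, penalty: number, penaltiesPerSecond: number): number {
  return penaltiesToReach(threshold, penalty) / penaltiesPerSecond;
}

const disconnectAfterSeconds = secondsToReach(DISCONNECT_THRESHOLD, HIGH_TOLERANCE_PENALTY, 1); // 25
const banAfterSeconds = secondsToReach(BAN_THRESHOLD, HIGH_TOLERANCE_PENALTY, 1); // 50
```

The same helper gives `penaltiesToReach(-50, -10) === 5`, the `MidToleranceError` case that comes up for `Fr.ZERO` responses.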

## Fix

A four-line change in
`yarn-project/p2p/src/services/libp2p/libp2p_service.ts`: fall back to
the archiver after the attestation-pool lookup, mirroring the responder
handler. If neither source has the block, we still return `false` without
penalising the peer (it really is unverifiable locally).

```ts
const proposal = await this.mempools.attestationPool.getBlockProposalByArchive(...);
const blockTxHashes =
  proposal?.txHashes ??
  (await this.archiver.getBlock({ archive: request.archiveRoot }))?.body.txEffects.map(e => e.txHash);

if (blockTxHashes) { /* existing membership/order check, against blockTxHashes */ }
else               { /* unchanged: log warn, return false, no penalty */ }
```

The validator's other checks (archive-root match, bitvector length, no
dupes, size bounds, subset-membership, ordering) are unchanged.
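To make the control flow concrete, here is a self-contained TypeScript model of the lookup after the fix: attestation pool first, archiver second, and a plain `false` when neither knows the block. `ProposalLookup`, `ArchiverLookup` and `isConsistentWithBlock` are hypothetical stand-ins, hashes are plain strings, and only the subset-membership and ordering checks are modelled.

```ts
// Hypothetical, self-contained model of the validator's tx-hash lookup after the fix.
type TxHash = string;

/** Stand-in for the attestation pool's proposal lookup by archive root. */
interface ProposalLookup {
  getProposalTxHashes(archiveRoot: string): TxHash[] | undefined;
}

/** Stand-in for the archiver's block lookup by archive root. */
interface ArchiverLookup {
  getBlockTxHashes(archiveRoot: string): TxHash[] | undefined;
}

/**
 * Returns true if `responseTxHashes` is an ordered, duplicate-free subset of the block's txs.
 * Returns false (no penalty implied) when neither source knows the block.
 */
function isConsistentWithBlock(
  archiveRoot: string,
  responseTxHashes: TxHash[],
  pool: ProposalLookup,
  archiver: ArchiverLookup,
): boolean {
  // Same order as the responder handler: attestation pool first, then archiver.
  const blockTxHashes = pool.getProposalTxHashes(archiveRoot) ?? archiver.getBlockTxHashes(archiveRoot);
  if (blockTxHashes === undefined) {
    return false; // unverifiable locally
  }
  // Each returned hash must appear in the block strictly after the previous one;
  // this rejects unknown hashes (index -1), duplicates and out-of-order txs.
  let lastIndex = -1;
  for (const hash of responseTxHashes) {
    const index = blockTxHashes.indexOf(hash);
    if (index <= lastIndex) {
      return false;
    }
    lastIndex = index;
  }
  return true;
}
```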

## Tests

**Unit anchor** — `p2p/src/services/libp2p/libp2p_service.test.ts`

- New test: *"should accept when the proposal is missing but the block
is known via the archiver"* — verified red before the fix (`Expected:
true, Received: false`) and green after.
- Existing test renamed and tightened to *"should reject without
penalising when the block is unknown (no proposal and not in the
archiver)"* — covers the still-correct rejection path.
- All 46 tests in the file pass.

**End-to-end** —
`end-to-end/src/e2e_p2p/late_prover_tx_collection.test.ts`

Validators form a mesh and mine a block carrying real txs; a prover
joins **after** the block is mined, so it has the block in its archiver
but never received the proposal or txs over gossip. The test then drives
the exact production path the prover would take to gather txs for
proving:

```ts
const txCollection = (proverNode as ...).p2pClient.txCollection;
const collected = await txCollection.collectFastForBlock(minedBlock, blockTxHashes, { deadline });
expect(collected.map(t => t.getTxHash().toString()).sort())
  .toEqual(blockTxHashes.map(h => h.toString()).sort());
```

- Red on the unfixed source: `collected.length === 0` (validation
rejects every response, dumb loop runs until the deadline), assertion
fails.
- Green with the fix: all of `block.body.txEffects`'s txs are collected.
## Summary

Closes [A-970](https://linear.app/aztec-labs/issue/A-970).

Refresh the operator-facing slashing-configuration guide and the slasher
README to match the AZIP-7 end-state, now that the implementation work
for AZIP-7 has landed across the Slashing Post-Alpha Improvements
project.

Operator docs
(`docs/docs-operate/operators/sequencer-management/slashing-configuration.md`):
- Remove the obsolete "Valid Epoch Not Proven" section.
`SLASH_PRUNE_PENALTY` is gone with it.
- Rewrite "Data Withholding" for the end-of-slot detection rule and
add the matching `SLASH_DATA_WITHHOLDING_TOLERANCE_SLOTS` env var.
- Update "Inactivity" to mention end-of-epoch evaluation (no longer
waits for proven) and re-execution-based fault attribution.
- Flip the descendant offense section to proposer-fault framing to match
the rename in #23468.
- Add sections for the new offenses: broadcasted invalid block proposal,
broadcasted invalid checkpoint proposal, attesting to an invalid
checkpoint proposal, duplicate proposal, duplicate attestation.
- Sync the env-vars block and the offense-detection bullet list with the
current set of watchers.
- Convert touched section headings to sentence case per docs style.

Slasher README (`yarn-project/slasher/README.md`):
- Add a note under `BROADCASTED_INVALID_CHECKPOINT_PROPOSAL` to make
the AZIP-7 "submitting block proposal after checkpoint" mapping
explicit. That AZIP offense is detected via the existing
invalid-checkpoint watcher (a late block makes the prior checkpoint
retroactively invalid) rather than having its own offense type.

Stacked on #23468 (the `ATTESTED_DESCENDANT_OF_INVALID` →
`PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS` rename)
because the new env var name only exists once that PR lands.

## Test plan

- `cd docs && yarn spellcheck` (clean)
- Visual review of the rewritten "Slashable offenses" section against
the [AZIP-7
spec](https://github.com/AztecProtocol/governance/blob/main/AZIPs/azip-7-update_slashing.md)
…ERO (#23672)

**Stacked on:**
[`mr/fix-block-txs-validation-archiver-fallback`](https://github.com/AztecProtocol/aztec-packages/pull/23624)
(the archiver-fallback PR).

---

## Summary

A peer that legitimately can't find the requested block in its
attestation pool or archiver, but matches the requested hashes against
its tx pool and sends those txs back, signals "I don't have the block"
by setting `archiveRoot = Fr.ZERO` in the response
(`block_txs_handler.ts:54-58`). The requester's validation currently
treats that response the same as a malicious archive-root mismatch and
applies a `MidToleranceError` peer penalty. After enough such responses
from the same helpful peer, that peer is disconnected by the requester
for behaviour the protocol explicitly documents as legitimate.

This PR makes the validator recognise `Fr.ZERO` as the "I don't have the
block" signal and stop penalising peers for it.

## What's broken

### Responder side (correct, intentional)

`block_txs_handler.ts:54-58` — when the responder lacks the block (no
proposal, no archived block) but the request carried full tx hashes, it
tries to serve from its pool by hash and signals the "I don't know the
block" condition with `archiveRoot = Fr.zero()`:

```ts
if (!txHashes && requestedTxsHashes !== undefined) {
  const responseTxs = (await txPool.getTxsByHash(requestedTxsHashes)).filter(tx => !!tx);
  const response = new BlockTxsResponse(Fr.zero(), new TxArray(...responseTxs), BitVector.init(0, []));
  return response.toBuffer();
}
```

### Requester side (broken)

The validator at `libp2p_service.ts:1525-1530` penalizes any
archive-root mismatch with `MidToleranceError` (-10 score points) and
throws — *including* `Fr.zero`:

```ts
if (!response.archiveRoot.equals(request.archiveRoot)) {
  this.peerManager.penalizePeer(peerId, PeerErrorSeverity.MidToleranceError);
  throw new ValidationError(...);
}
```

After 5 such responses from the same helpful peer, that peer is at -50
in the requester's score book → disconnected by `pruneUnhealthyPeers`
(`peer_manager.ts:601-603`).

### Receiver-side code already documented the correct intent

The wrongful penalty contradicts what the receiver-side code explicitly
says should happen. `batch_tx_requester.ts:586-600` has
`handleArchiveRootMismatch`, whose docstring spells out exactly the
semantic we're restoring:

```ts
/**
 * Handles an archive root mismatch between local state and peer response.
 *
 * - Response archive is Fr.ZERO (peer pruned proposal, legitimate): marks peer dumb.
 * - Non-zero archive mismatch (malicious response): penalises + marks dumb.
 */
private handleArchiveRootMismatch(peerId: PeerId, response: BlockTxsResponse): void {
  if (!response.archiveRoot.isZero()) {
    this.peers.penalisePeer(peerId, PeerErrorSeverity.LowToleranceError);
  }
  this.peers.markPeerDumb(peerId);
  this.txsMetadata.clearPeerData(peerId);
}
```

But this function is only reached from `decideIfPeerIsSmart` →
`handleSuccessResponseFromPeer`, which only runs when validation returns
`true`. The validator's first check (archive-root equality) rejects
every archive-mismatched response — including `Fr.zero` — so
`handleArchiveRootMismatch` is never actually invoked. The "Fr.zero is
legitimate" exemption it encodes has been unreachable since the
validator's archive-root check was added.

The flow today:

```
   reqresp response
        │
        ▼
   validateRequestedBlockTxsConsistency
   ├─ Fr.zero       → reject + Mid penalty (BUG, contradicts the docstring above)
   ├─ non-zero ≠    → reject + Mid penalty
   └─ matches archive root ──► handleSuccessResponseFromPeer
                                  └─ decideIfPeerIsSmart
                                       └─ hasArchiveRootMismatch?
                                            always false (validator filtered them all out)
                                            ──► handleArchiveRootMismatch   ◄── DEAD CODE
```

So the receiver side already knew Fr.zero should not be penalised; the
decision was just being made at the wrong layer. This PR moves it to the
validator, where it can actually fire. `handleArchiveRootMismatch`
itself remains dead code (separate cleanup candidate, not in this PR).

## Fix

Special-case `response.archiveRoot.isZero()` at the top of
`validateRequestedBlockTxsConsistency` so the archive-root mismatch path
is bypassed for `Fr.zero` responses. We still return `false` (the txs
are dropped because we can't verify membership/order without the block)
but no peer penalty is applied — matching the intent of the `Fr.zero`
exemption in `batch_tx_requester.ts:593-600`.

```ts
// libp2p_service.ts (inside validateRequestedBlockTxsConsistency)
if (response.archiveRoot.isZero()) {
  this.logger.debug(`Peer ${peerId.toString()} signalled missing block with Fr.zero archive root`);
  return false;
}

if (!response.archiveRoot.equals(request.archiveRoot)) {
  this.peerManager.penalizePeer(peerId, PeerErrorSeverity.MidToleranceError);
  ...
}
```

The validator's other checks are unchanged. The early return prevents
the bitvector-length and `maxReturnable` checks downstream from firing
on a zero-length bitvector response, which would otherwise also wrongly
penalise the peer.
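The resulting check order can be modelled in a few lines. In this hypothetical sketch archive roots are hex strings (an all-zero string stands in for `Fr.ZERO`), and where the real validator throws a `ValidationError` on a mismatch, the sketch records a penalty through a callback and returns `false`. `validateArchiveRoot` and `isZeroRoot` are illustrative names.

```ts
// Hypothetical model of the archive-root checks after the fix.
type Severity = 'MidToleranceError';

interface BlockTxsResponseLike {
  archiveRoot: string; // hex string; all zeros stands in for Fr.ZERO
}

/** True for an all-zero hex string such as 0x00...00. */
function isZeroRoot(root: string): boolean {
  return /^0x0+$/i.test(root);
}

function validateArchiveRoot(
  requestedArchiveRoot: string,
  response: BlockTxsResponseLike,
  penalize: (severity: Severity) => void,
): boolean {
  // 1. Fr.ZERO means "I don't have the block": drop the txs without penalising.
  //    Returning here also skips the bitvector-length and maxReturnable checks.
  if (isZeroRoot(response.archiveRoot)) {
    return false;
  }
  // 2. A non-zero mismatch is still treated as a bad response.
  if (response.archiveRoot.toLowerCase() !== requestedArchiveRoot.toLowerCase()) {
    penalize('MidToleranceError');
    return false;
  }
  // 3. Matching root: the remaining checks would run here.
  return true;
}
```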

## Tests

**Unit** — `p2p/src/services/libp2p/libp2p_service.test.ts`

A new test in the `validateRequestedBlockTxsConsistency` describe block:
*"should not penalize a peer that signals lacking the block with Fr.ZERO
archive root"*. Constructs a response with `archiveRoot = Fr.ZERO` and
asserts that `peerManager.penalizePeer` is not called. Verified red
before the fix, green after.

**Integration** —
`p2p/src/client/test/p2p_client.integration_block_txs.test.ts`

A new test in the `p2p client integration block txs protocol` describe
block: *"requester does not penalize peer that returns Fr.zero (peer
lacks proposal but matched by hash)"*. Drives a real reqresp BLOCK_TXS
request over libp2p between two clients, lets the responder hit the
`Fr.zero` branch in `block_txs_handler`, then runs the response through
the requester's real `validateRequestedBlockTxsConsistency` and asserts
the requester's `peerManager.penalizePeer` is not called with
`MidToleranceError` or `LowToleranceError`. Verified red before the fix,
green after.
…ock (#23675)

## Problem

CI on `merge-train/spartan` failed in
`yarn-project/end-to-end/src/composed/e2e_cheat_codes.test.ts`
([log](http://ci.aztec-labs.com/1780045078587982) → test-engine
`358cb51c378c6913` → `4baba53d3b9feb67`):

```
e2e_cheat_codes › warpL2TimeAtLeastBy advances time by at least the duration
  expect(received).toBeGreaterThanOrEqual(expected)
  Expected: >= 1780048759   (timestampBefore_L2 + 100)
  Received:    1780048731    (advanced only ~72s)
```

## Root cause

`CheatCodes.warpL2TimeAtLeastBy(duration)` computed its target as
`eth.lastBlockTimestamp() (L1) + duration`, but its documented contract
is that the **L2** timestamp advances by at least `duration`, and the
test measures advancement against the latest **L2 block** timestamp.

In the composed test a live sequencer mines L2 blocks at slot boundaries
that can run ahead of anvil's L1 clock. When the latest L2 block
timestamp leads L1, adding `duration` to L1 produces a target below
`latestL2 + duration`, so the resulting block advances L2 time by less
than `duration` and the assertion fails. This is a latent correctness
bug in the helper that surfaces non-deterministically depending on
slot/L1 alignment.

## Fix

Anchor the target to whichever clock leads — `max(currentL1Timestamp,
latestL2BlockTimestamp) + duration` — before delegating to
`warpL2TimeAtLeastTo`. This guarantees the post-warp L2 block is at
least `duration` ahead of the current one. Since the new target is never
lower than the old one (it never advances by less than before), other
callers (`e2e_expiration_timestamp`, `e2e_contract_updates`,
`blacklist_token_contract`, `e2e_automine_smoke`, `lending_simulator`)
are unaffected.

Single-file change in `yarn-project/aztec/src/testing/cheat_codes.ts`.
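The old and new targets can be compared directly on the numbers from the CI log. In this sketch the latest L2 block timestamp (1780048659) is the expected value minus 100; the L1 timestamp (1780048631) is back-derived from the received value on the assumption that the warped block landed exactly on the old `L1 + duration` target. The function names are illustrative, not the `CheatCodes` API.

```ts
/** Post-fix target (seconds): anchor to whichever clock leads, then add `duration`. */
function warpTargetFixed(l1Timestamp: number, latestL2BlockTimestamp: number, duration: number): number {
  return Math.max(l1Timestamp, latestL2BlockTimestamp) + duration;
}

/** Pre-fix target (seconds): anchored to L1 only. */
function warpTargetOld(l1Timestamp: number, duration: number): number {
  return l1Timestamp + duration;
}

const l1 = 1780048631; // assumed, back-derived from the received timestamp
const l2 = 1780048659; // latest L2 block timestamp from the log
const oldTarget = warpTargetOld(l1, 100); // 1780048731: only 72 s past the latest L2 block
const newTarget = warpTargetFixed(l1, l2, 100); // 1780048759: exactly 100 s past it
```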

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/368b5e8b3cef0969) ·
group: `slackbot`*
## Problem

CI on `merge-train/spartan` is failing in the `avm-transpiler-native`
build step ([log](http://ci.aztec-labs.com/1780052027304932)):

```
error: the lock file /home/aztec-dev/aztec-packages/avm-transpiler/Cargo.lock needs to be updated but --locked was passed to prevent this
```

`cargo build --release --locked --bin avm-transpiler` rejects the stale
lock file.

## Root cause

The `noir/noir-repo` submodule was bumped on this branch and its
`acvm-repo` crates (`acir`, `acir_field`, `brillig`) gained a new path
dependency, `msgpack_tagged` (+ `msgpack_tagged_derive`).
`avm-transpiler/Cargo.lock` was not regenerated, so it no longer matches
`Cargo.toml` and the `--locked` CI build fails.

## Fix

Regenerated `avm-transpiler/Cargo.lock` with a plain (non-`--locked`)
build so only the required entries change — no bulk update. The diff
adds `msgpack_tagged`/`msgpack_tagged_derive` to the relevant dependency
lists plus the transitive deps they introduce (`serde_bytes`, `bs58`,
`tinyvec`, and a bump of `darling`/`serde_with`).

## Verification

- Reproduced the failure: `cargo build --release --locked --bin
avm-transpiler` → lock-file error.
- After the fix: `cargo build --release --locked --bin avm-transpiler`
succeeds.
- `./bootstrap.sh build_native` in `avm-transpiler/` (the exact failing
CI step) completes cleanly.

Only `avm-transpiler/Cargo.lock` is changed.

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/d9f97447b0ab23ed) ·
group: `slackbot`*
- Replace stale `AZTEC_LAG_IN_EPOCHS` in `tps-scenario.env` with
`AZTEC_LAG_IN_EPOCHS_FOR_VALIDATOR_SET` and
`AZTEC_LAG_IN_EPOCHS_FOR_RANDAO` (value 1 each).
- Fixes immediate failure in nightly Spartan `wait-bench-l2-block` ([run
#184](https://github.com/AztecProtocol/aztec-packages/actions/runs/26627597396/job/78472605259)).

Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
… race (#23677)

## Problem

`AztecNodeService.getWorldState({ hash })` can intermittently throw

```
Block hash 0x.. not found in world state at block number N
(world state has no hash at that index ...). If the node API has been queried
with anchor block hash possibly a reorg has occurred.
```

even when no reorg happened. This surfaced as a flake in oxide /
`e2e_frozen_notes_refund` e2e tests proving a tx that reads existing
notes (private-kernel reset resolving a read request anchored on an
early block).

## Root cause

`getWorldState` did its world-state sync via `#syncWorldState()`, which
only synced to the archiver's *latest height* and ignored any requested
block hash. It then:

1. resolved the requested hash to a block number against the archiver,
and
2. took `getSnapshot(N)` and read the `ARCHIVE` leaf at index `N` to
double-check the hash.

The synchronizer reports `syncedTo >= N` as soon as the height advances,
but the archive-tree commit for block `N` may not yet be visible in the
snapshot view. A snapshot taken in that window has populated
note/nullifier trees but a half-written archive tree, so
`getLeafValue(ARCHIVE, N)` returns `undefined` and the double-check
throws. The sync-status height and the archive-tree write were not
barriered against each other.

## Fix

Make the hash-anchored path reorg-aware, as the existing synchronizer
plumbing already supports (`syncImmediate(targetBlockNumber,
blockHash)`):

- When the request carries a block hash, resolve it against the archiver
up front (fails fast with the existing clear reorg error if the hash is
unknown), then drive the sync to that exact `(blockNumber, hash)`.
- `WorldStateSynchronizer.syncImmediate` reads the *committed* `ARCHIVE`
leaf for that block; if it is missing or mismatched it forces a
`blockStream.sync()` and re-checks, only throwing on a genuine reorg.
This barriers on the archive-tree commit actually landing, closing the
race window, and turns real reorgs into a clear `block_hash_mismatch`
instead of a confusing snapshot error.

Block-number / tag queries are unchanged: they still sync to latest
height with no hash.

The existing snapshot double-check is kept as a backstop.
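The barrier in the hash-anchored path can be modelled synchronously. `WorldStateView`, `committedArchiveLeaf`, `forceSync` and `syncToBlockHash` are hypothetical: they stand in for reading the committed `ARCHIVE` leaf and for `blockStream.sync()`, which in the real synchronizer are async and have their own signatures.

```ts
// Hypothetical, synchronous model of a hash-anchored world-state sync.
interface WorldStateView {
  /** Committed ARCHIVE leaf (block hash) at `blockNumber`, or undefined if not yet visible. */
  committedArchiveLeaf(blockNumber: number): string | undefined;
  /** Stand-in for forcing blockStream.sync(). */
  forceSync(): void;
}

function syncToBlockHash(ws: WorldStateView, blockNumber: number, blockHash: string): void {
  // Fast path: the commit for this block has already landed with the expected hash.
  if (ws.committedArchiveLeaf(blockNumber) === blockHash) {
    return;
  }
  // Missing or different: the archive-tree commit may simply lag, so force a sync and re-check.
  ws.forceSync();
  const leaf = ws.committedArchiveLeaf(blockNumber);
  if (leaf !== blockHash) {
    // Still wrong after a forced sync: report a genuine reorg.
    throw new Error(
      `block_hash_mismatch: block ${blockNumber} expected ${blockHash}, found ${leaf === undefined ? 'no leaf' : leaf}`,
    );
  }
}
```

A lagging commit is absorbed by the forced sync, so only a hash that is still wrong afterwards surfaces as an error.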

## Tests

Added unit tests in `server.test.ts` asserting that a hash-anchored
query forwards the resolved `(blockNumber, hash)` to `syncImmediate`,
and that block-number queries still sync to latest height with no hash.
Full `yarn-project` install/bootstrap (which builds the
barretenberg/noir portal packages) was not available in this session, so
the suite was not executed here — worth a CI run before merge.

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/6bca66dc9c802402) ·
group: `slackbot`*
)

## Problem

CI on `merge-train/spartan` is failing in the `aztec-nr` step
([log](http://ci.aztec-labs.com/1780053790683323)) with
`BoundedVec::from_parts_unchecked` deprecation errors under `nargo check
--deny-warnings`.

The train's noir submodule had diverged from `next`:

| Branch | noir pin | date | `from_parts_unchecked` deprecated? |
|---|---|---|---|
| `next` | `f1a4575` | May 11 | no |
| `merge-train/spartan` | `4d039268` | May 28 | **yes** |

The newer pin (`4d039268`) was pulled onto the train by **PR #23675**
("fix(cheat-codes): warpL2TimeAtLeastBy…"), which bumped
`noir/noir-repo` from the May-11 pin to a May-28 nightly. That nightly
added `#[deprecated]` to `BoundedVec::from_parts_unchecked`, and since
aztec-nr builds with `--deny-warnings`, the two remaining call sites
became hard errors. The two noir commits are on divergent lines (neither
is an ancestor of the other), so the train was simply ahead of `next` on
noir.

## Fix

Pin the train's noir back to exactly what `next` uses. Only two files
differed from `next`:

- `noir/noir-repo` → `f1a4575` (next's pin)
- `avm-transpiler/Cargo.lock` → next's version (it had been re-synced to
the May-28 noir by #23683; restored to match the May-11 pin)

This restores parity with `next` and removes the deprecated API
entirely, so no aztec-nr source change is needed.

## Verification

Built `nargo` from the `f1a4575` pin and ran the failing check against
the **unmodified** aztec-nr source:

- `nargo check --deny-warnings` → exit 0 (the deprecation attribute is
absent in `f1a4575`).

## Note

This is an alternative to #23687, which fixed the same failure by
patching the two aztec-nr call sites to use `from_parts` against the
newer noir. Pick one: this PR keeps the train aligned with `next`'s
noir; #23687 keeps the newer noir and updates the source. Closing
whichever isn't chosen.

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/2f980b2000011f91) ·
group: `slackbot`*
@PhilWindle PhilWindle enabled auto-merge May 29, 2026 12:24
Use the nightly tag as `source_ref` to ensure that the code the runner
uses is the same as what's deployed on GKE.
@spypsy spypsy requested a review from charlielye as a code owner May 29, 2026 13:15
@PhilWindle PhilWindle added this pull request to the merge queue May 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
…lake (#23653)

## Why

The `merge-train/spartan` train PR (#23580) was dequeued from the merge
queue. The merge-queue CI3 run ([run
26608568295](https://github.com/AztecProtocol/aztec-packages/actions/runs/26608568295))
failed in the `x2-full amd64 ci-full-no-test-cache` grind during the
**aztec-nr warnings check**, after only ~11s:

```
Checking aztec-nr for warnings...
Cloning into '.../noir-lang/poseidon/v0.3.0'...
Cloning into '.../noir-lang/sha256/v0.3.0'...
fatal: unable to access 'https://github.com/noir-lang/sha256/': Could not resolve host: github.com
Cannot read file .../noir-lang/sha256/v0.3.0/Nargo.toml - does it exist?
make: *** [Makefile:303: aztec-nr] Error 1
```

A transient DNS/network flake on the runner — not a code defect.
`aztec-nr/aztec/Nargo.toml` declares external git dependencies
(`noir-lang/sha256`, `noir-lang/poseidon`, pinned at `v0.3.0`) which
`nargo check` resolves by cloning from `github.com` on a cold cache.
When the runner momentarily can't resolve `github.com`, the clone fails
and dequeues the whole train.

## What

A blanket `retry` around `nargo check` would also re-run on genuine
check failures (type errors, denied warnings) — wasting CI time and
masking intent. So instead:

- **`ci3/retry` gains a `-p <regex>` option.** It captures the command's
combined output and only retries when a failure matches the regex; any
non-matching failure exits immediately with the original code. Without
`-p`, behavior is unchanged (the heavily-used default path is
untouched). `pipefail` ensures the wrapped command's exit code (not
`tee`'s) is what's checked, and `tee` finishes before inspection so the
captured output is complete.
- **`aztec-nr/bootstrap.sh`** wraps its two network-touching nargo calls
(`check`, `doc --check`) with `retry -p "<git transport errors>"`,
matching only `Could not resolve host`, `unable to access`, `Connection
timed out/refused`, `Failed to connect`, `TLS connect error`, `early
EOF`, `RPC failed`. These never overlap with nargo's `error:`/`warning:`
output, so a genuine check failure still fails on the first attempt.
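The gating rule behind `retry -p` can be sketched in TypeScript. This is a minimal illustration, not the actual `ci3/retry` shell script: `RunResult`, `retryCommand`, and its option names are hypothetical, the default of 3 attempts mirrors the Verification cases, and any delay between attempts is omitted.

```ts
// Hypothetical sketch of regex-gated retry (not the real ci3/retry script).
type RunResult = { code: number; output: string };

interface RetryOptions {
  attempts?: number; // assumed default: 3
  pattern?: RegExp; // like `-p <regex>`: retry only failures whose output matches
  disabled?: boolean; // like RETRY_DISABLED: run exactly once
}

function retryCommand(run: () => RunResult, opts: RetryOptions = {}): RunResult & { tries: number } {
  const maxAttempts = opts.disabled ? 1 : (opts.attempts ?? 3);
  for (let tries = 1; ; tries++) {
    const result = run();
    // Without a pattern, every failure is retryable (the unchanged default path).
    // With a pattern, a non-matching failure returns at once with its original exit code.
    const retryable =
      result.code !== 0 && (opts.pattern === undefined || opts.pattern.test(result.output));
    if (!retryable || tries >= maxAttempts) return { ...result, tries };
  }
}

// The git transport errors listed above, as a single alternation (no `g` flag, so `test` is stateless).
const GIT_TRANSPORT_ERRORS =
  /Could not resolve host|unable to access|Connection timed out|Connection refused|Failed to connect|TLS connect error|early EOF|RPC failed/;
```

Returning the first non-matching failure unchanged is what keeps a genuine `nargo check` error from costing extra attempts.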

`nargo` has no standalone dependency-install/fetch command (its
subcommands are `check, compile, dap, debug, doc, execute, expand,
export, fmt, fuzz, info, init, interpret, lsp, new, test`); resolution
only happens inside `check`/`compile`/`test`, so the regex-gated retry
is the workable option of the two suggested.

## Verification

- `bash -n` on all three files; `ci3/tests/retry_test` (new,
  auto-discovered by the ci3 test runner) passes all 6 cases:
  - default mode retries a transient failure then succeeds / gives up
    after 3 attempts
  - pattern mode retries a matching network failure then succeeds
  - **pattern mode fails fast (1 attempt) on a non-matching genuine
    error** ← the behavior requested
  - pattern mode gives up after 3 attempts on a persistent matching
    failure
  - `RETRY_DISABLED` runs the command exactly once

The full `./bootstrap.sh ci` run is the same orchestrated remote-EC2 CI
that failed here and isn't reproducible on a dev host; the transient DNS
failure also can't be reproduced where DNS works. Verification is
therefore at the retry/wrapper level, which is exactly what this change
touches.
@spalladino requested a review from nventuro as a code owner May 29, 2026 14:34
@PhilWindle enabled auto-merge May 29, 2026 14:50
spypsy and others added 4 commits May 29, 2026 12:00
## Summary
- Use the last L1 slot timestamp inside the target L2 slot for
proposal/header simulations.
- Keep bundle simulation and pre-broadcast header validation on the same
timestamp rule to avoid `eth_simulateV1` timestamp-order failures.
- Add regression coverage for both simulation paths.
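As a rough illustration of the timestamp rule, assume L2 slots start at `genesisTimestamp + slot * l2SlotDuration` and that an L2 slot spans a whole number of L1 slots aligned to its start. Under those assumptions, the last L1 slot timestamp inside the target L2 slot is the L2 slot's end minus one L1 slot. The helper name and parameters are hypothetical; the sequencer's actual slot math isn't shown in this PR.

```ts
// Hypothetical helper; all timestamps and durations are in seconds.
// Assumes L2 slots are aligned to genesis and span a whole number of L1 slots.
function lastL1SlotTimestampInL2Slot(
  l2Slot: number,
  genesisTimestamp: number,
  l2SlotDuration: number,
  l1SlotDuration: number,
): number {
  if (l1SlotDuration <= 0 || l2SlotDuration % l1SlotDuration !== 0) {
    throw new Error("assumed: the L2 slot spans a whole number of L1 slots");
  }
  const l2SlotStart = genesisTimestamp + l2Slot * l2SlotDuration;
  // The final L1 slot inside the L2 slot starts exactly one L1 slot before the L2 slot ends.
  return l2SlotStart + l2SlotDuration - l1SlotDuration;
}
```

Deriving the timestamp for both the bundle simulation and the pre-broadcast header validation from one function like this is one way to keep the two paths from disagreeing.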
…23625)

## Motivation

The node exposed four log-retrieval methods with three filter shapes and
two return shapes, while the private index was only tag-keyed — so a
`(tag, narrow block range)` query loaded the entire per-tag history into
memory. Pagination was a single global `page` counter shared across all
tags, and the public path was split between a `LogFilter` method and a
tag-based method. v5 is already a breaking release, so this collapses
everything to a single, fast, tag-based surface with no back-compat.

Fixes A-1111
Fixes A-1031

## Approach

Two methods, `getPrivateLogsByTags(query)` and
`getPublicLogsByTags(query)`, replace the four old ones;
`getContractClassLogs` and `getPublicLogs(LogFilter)` are gone. The
archiver stores each log under a fixed-width composite hex-string key
`[contractHex] - tagHex - blockHex8 - txIndexHex8 - logIndexHex8` in an
LMDB map, so every supported filter (tag, block range, txHash, per-tag
`afterLog` cursor, `referenceBlock` reorg cap) reduces to a single
ordered range scan. Note hashes and nullifiers are never copied into the
log index — they're fetched on demand from the block store via a partial
deserializer that reads only the relevant prefix of the stored
`IndexedTxEffect`. `ARCHIVER_DB_VERSION` bumps 6 → 7, so the archiver
self-wipes and re-syncs from L1 on first start.
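A fixed-width hex key turns every filter into one range scan because, for equal-width lowercase hex strings, lexicographic order equals numeric order, so string order matches the `(contract, tag, block, txIndex, logIndex)` tuple order. A minimal sketch, assuming 64 hex digits for field elements (tag, contract address), 8 for the integer fields, and `-` as the separator; `hexInt`, `hexField`, `logKey`, and `blockRangeBounds` are hypothetical names, not the archiver's code.

```ts
// Zero-pad a non-negative integer to a fixed number of lowercase hex digits.
function hexInt(value: number, width = 8): string {
  if (!Number.isSafeInteger(value) || value < 0) throw new Error(`not a non-negative integer: ${value}`);
  const s = value.toString(16);
  if (s.length > width) throw new Error(`${value} does not fit in ${width} hex digits`);
  return s.padStart(width, "0");
}

// Normalize a field element given as hex (e.g. "0x1A") to 64 lowercase hex digits.
function hexField(value: string): string {
  const s = value.replace(/^0x/i, "").toLowerCase();
  if (!/^[0-9a-f]{1,64}$/.test(s)) throw new Error(`not a field element hex string: ${value}`);
  return s.padStart(64, "0");
}

type LogKeyParts = { contract?: string; tag: string; block: number; txIndex: number; logIndex: number };

// The contract prefix is present only for public logs (assumed to live in a separate map),
// so within one map the separators sit at the same offsets and never affect ordering.
function logKey(p: LogKeyParts): string {
  const parts = [hexField(p.tag), hexInt(p.block), hexInt(p.txIndex), hexInt(p.logIndex)];
  if (p.contract !== undefined) parts.unshift(hexField(p.contract));
  return parts.join("-");
}

// [start, end) key bounds for one tag over blocks [fromBlock, toBlock): a single ordered range scan.
function blockRangeBounds(tag: string, fromBlock: number, toBlock: number, contract?: string) {
  return {
    start: logKey({ contract, tag, block: fromBlock, txIndex: 0, logIndex: 0 }),
    end: logKey({ contract, tag, block: toBlock, txIndex: 0, logIndex: 0 }),
  };
}
```

Under the same assumptions, a per-tag `afterLog` cursor maps to an exclusive lower bound `logKey({ tag, block, txIndex, logIndex })` within that tag's contiguous key range.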

## API

Two node methods replace the previous four. Each returns one inner array
per element of `query.tags`, in input order; an empty inner array means
that tag matched nothing.

```ts
getPrivateLogsByTags(query: PrivateLogsQuery): Promise<LogResult[][]>;
getPublicLogsByTags(query: PublicLogsQuery): Promise<LogResult[][]>;
```

**Input**

```ts
// Filters shared by both queries.
type LogsQueryBase = {
  fromBlock?: BlockNumber;        // inclusive lower bound
  toBlock?: BlockNumber;          // exclusive upper bound
  txHash?: TxHash;                // restrict to one tx; mutually exclusive with fromBlock/toBlock
  referenceBlock?: BlockHash;     // reorg anchor: throws if that block is no longer present
  includeEffects?: boolean;       // also attach each log's tx noteHashes + nullifiers
  limitPerTag?: number;           // page size, 1..MAX_LOGS_PER_TAG (default & max = 20)
};

// A tag to query, optionally resuming strictly after a previously-seen log.
// The bare `T` form starts from the beginning.
type TagQuery<T> = T | { tag: T; afterLog?: LogCursor };

type PrivateLogsQuery = LogsQueryBase & {
  tags: TagQuery<SiloedTag>[];    // 1..MAX_RPC_LEN (100) entries
};

type PublicLogsQuery = LogsQueryBase & {
  contractAddress: AztecAddress;  // required for public queries
  tags: TagQuery<Tag>[];          // 1..MAX_RPC_LEN (100) entries
};
```
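The constraints spelled out in the comments above can be checked in a few lines. This is a hypothetical plain-TypeScript sketch with simplified field types; the PR enforces these rules with zod schemas whose exact definitions aren't shown here, and `SketchLogsQuery`, `validateLogsQuery`, and `effectiveLimitPerTag` are illustrative names.

```ts
const MAX_RPC_LEN = 100;
const MAX_LOGS_PER_TAG = 20;

// Simplified stand-in for LogsQueryBase & { tags } (tag entries left opaque).
type SketchLogsQuery = {
  tags: unknown[];
  fromBlock?: number;
  toBlock?: number;
  txHash?: string;
  limitPerTag?: number;
};

// Returns a list of violations; an empty list means the query is acceptable.
function validateLogsQuery(q: SketchLogsQuery): string[] {
  const errors: string[] = [];
  if (q.tags.length < 1 || q.tags.length > MAX_RPC_LEN) {
    errors.push(`tags must have 1..${MAX_RPC_LEN} entries`);
  }
  // txHash and a block range are mutually exclusive; per-tag afterLog cursors are not checked,
  // so txHash + afterLog (paginating within one tx) stays allowed.
  if (q.txHash !== undefined && (q.fromBlock !== undefined || q.toBlock !== undefined)) {
    errors.push("txHash is mutually exclusive with fromBlock/toBlock");
  }
  if (
    q.limitPerTag !== undefined &&
    !(Number.isInteger(q.limitPerTag) && q.limitPerTag >= 1 && q.limitPerTag <= MAX_LOGS_PER_TAG)
  ) {
    errors.push(`limitPerTag must be an integer in 1..${MAX_LOGS_PER_TAG}`);
  }
  return errors;
}

// Page size actually used: the default and the maximum are both MAX_LOGS_PER_TAG.
const effectiveLimitPerTag = (q: SketchLogsQuery): number => q.limitPerTag ?? MAX_LOGS_PER_TAG;
```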

**Output**

```ts
type LogResult<Opts = { includeEffects?: boolean }> = {
  logData: Fr[];                  // log fields; the tag is logData[0]
  blockNumber: BlockNumber;
  blockHash: BlockHash;
  blockTimestamp: UInt64;
  txHash: TxHash;
  txIndexWithinBlock: number;     // 0-based index of the tx within its block
  logIndexWithinTx: number;       // 0-based index of the log within its tx
} & (Opts extends { includeEffects: true }
  ? { noteHashes: Fr[]; nullifiers: Fr[] } // present only when includeEffects: true
  : {});

// Opaque per-tag pagination cursor.
// String form: `<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>`.
class LogCursor {
  blockNumber: BlockNumber;
  txIndexWithinBlock: number;
  logIndexWithinTx: number;
  static fromLog(log: LogResult): LogCursor;
}
```
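The cursor's string form and the strict "after" ordering it implies can be sketched as follows. `SketchLogCursor`, `parse`, `compare`, and `isAfter` are hypothetical stand-ins; the PR only shows `LogCursor.fromLog` and the `<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>` format.

```ts
class SketchLogCursor {
  readonly blockNumber: number;
  readonly txIndexWithinBlock: number;
  readonly logIndexWithinTx: number;

  constructor(blockNumber: number, txIndexWithinBlock: number, logIndexWithinTx: number) {
    this.blockNumber = blockNumber;
    this.txIndexWithinBlock = txIndexWithinBlock;
    this.logIndexWithinTx = logIndexWithinTx;
  }

  // `<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>`
  toString(): string {
    return `${this.blockNumber}-${this.txIndexWithinBlock}-${this.logIndexWithinTx}`;
  }

  static parse(s: string): SketchLogCursor {
    const m = /^(\d+)-(\d+)-(\d+)$/.exec(s);
    if (m === null) throw new Error(`invalid log cursor: ${s}`);
    return new SketchLogCursor(Number(m[1]), Number(m[2]), Number(m[3]));
  }

  // Negative, zero, or positive as `a` orders before, equal to, or after `b` in (block, tx, log) order.
  static compare(a: SketchLogCursor, b: SketchLogCursor): number {
    return (
      a.blockNumber - b.blockNumber ||
      a.txIndexWithinBlock - b.txIndexWithinBlock ||
      a.logIndexWithinTx - b.logIndexWithinTx
    );
  }
}

// `afterLog` is exclusive: a log qualifies only if it orders strictly after the cursor.
const isAfter = (log: SketchLogCursor, afterLog: SketchLogCursor): boolean =>
  SketchLogCursor.compare(log, afterLog) > 0;
```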

Pagination is per-tag: feed a tag's last `LogResult` back as the next
query's `afterLog` (`{ tag, afterLog: LogCursor.fromLog(last) }`). A tag
is exhausted once it returns fewer than `limitPerTag` results. The
stdlib helpers `queryAllPrivateLogsByTags` / `queryAllPublicLogsByTags`
drive this loop and return the fully-drained results.
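The loop those helpers drive can be sketched as below. This is not the stdlib implementation: `drainAllByTags`, `FetchPage`, and the flattened `SketchLog` type are hypothetical, and the sketch assumes at most `MAX_RPC_LEN` tags so it never needs to split a request into batches.

```ts
type SketchCursor = { blockNumber: number; txIndexWithinBlock: number; logIndexWithinTx: number };
type SketchLog = SketchCursor & { tag: string };
type SketchTagQuery = { tag: string; afterLog?: SketchCursor };
// One inner array per query, in input order (mirroring the node API's return shape).
type FetchPage = (queries: SketchTagQuery[], limitPerTag: number) => Promise<SketchLog[][]>;

async function drainAllByTags(tags: string[], fetchPage: FetchPage, limitPerTag = 20): Promise<SketchLog[][]> {
  if (!Number.isInteger(limitPerTag) || limitPerTag < 1) throw new Error("limitPerTag must be >= 1");
  const results: SketchLog[][] = tags.map(() => []);
  const cursors: (SketchCursor | undefined)[] = tags.map(() => undefined);
  let pending = tags.map((_, i) => i); // indices of tags that may still have more logs

  while (pending.length > 0) {
    const page = await fetchPage(
      pending.map((i) => ({ tag: tags[i], afterLog: cursors[i] })),
      limitPerTag,
    );
    const stillFull: number[] = [];
    page.forEach((logs, j) => {
      const i = pending[j];
      results[i].push(...logs);
      // A full page means there may be more: resume strictly after its last log.
      // A short page means the tag is exhausted, so it drops out of the next round.
      if (logs.length === limitPerTag) {
        const { blockNumber, txIndexWithinBlock, logIndexWithinTx } = logs[logs.length - 1];
        cursors[i] = { blockNumber, txIndexWithinBlock, logIndexWithinTx };
        stillFull.push(i);
      }
    });
    pending = stillFull;
  }
  return results;
}
```

One consequence of the "short page ends it" rule: a tag whose log count is an exact multiple of `limitPerTag` costs one extra round that returns an empty page.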

## Changes

- **stdlib**: new `LogResult`, `LogCursor`, `PrivateLogsQuery` /
`PublicLogsQuery` types with zod schemas; mutual exclusivity of `txHash`
and `fromBlock`/`toBlock` enforced via `.refine` (but `txHash` + `afterLog`
is allowed, to paginate within a tx's logs). `L2LogsSource` /
`AztecNode` / `Archiver` interfaces and schemas reduced to the two new
methods. Deleted `LogFilter`, `LogId`, `TxScopedL2Log`,
`ExtendedPublicLog`, `ExtendedContractClassLog`,
`GetPublicLogsResponse`, `GetContractClassLogsResponse`, and the dead
`Tx.getPublicLogs(logsSource)`.
- **archiver**: full `LogStore` rewrite — two hex-string-keyed
`AztecAsyncMap` primary maps (keys are fixed-width zero-padded lowercase
hex, so `ordered-binary`'s string ordering matches the canonical
`(contract, tag, block, txIndex, logIndex)` tuple and every filter is a
single ordered range scan) plus two `blockNumber → string[]` secondary
indices driving `deleteLogs` (replaces the buggy per-block tag-union
list). All reads, including the `referenceBlock` existence check, run
inside one `db.transactionAsync` across `BlockStore` + `LogStore`; a
`referenceBlock` equal to the (synthetic, unindexed) genesis block hash
resolves to the genesis block number rather than throwing. New
`BlockStore.getNoteHashesAndNullifiers(txHashes)` is a batched partial
deserializer for `includeEffects`, and `getTxLocation` reads only the
40-byte header instead of the full `TxEffect`. Contract-class-log
storage removed entirely. `ARCHIVER_DB_VERSION` 6 → 7.
`OutOfOrderLogInsertionError` and the `ARCHIVER_MAX_LOGS` env var
dropped.
- **aztec-node**: four RPC handlers collapsed to two thin forwarders;
`referenceBlock` resolution moved into the store so it shares the
transaction.
- **pxe**: `getAllPages` rewritten from a global `page` counter to
per-tag `afterLog` cursors — each round re-queries only tags that
returned a full page, and tags drop out as soon as they return a short
page. `fromBlock` / `toBlock` are pushed down into the node, eliminating
the in-memory `#extractLogs` range filter.
- **aztec.js**: `getPublicEvents` migrated to the new query shape;
`PublicEventFilter.contractAddress` is now required;
`EventFilterBase.afterLog: LogId → LogCursor`.
- **cli**: `get-logs` requires `--contract-address` and `--tag`;
`--after-log` parses a `LogCursor` string
`<blockNumber>-<txIndexWithinBlock>-<logIndexWithinTx>`.
- **end-to-end**: `e2e_ordering` rewritten to read
`getBlock().body.txEffects[*].publicLogs` directly (the new API drops
the tag-less, contract-less query shape).
- **docs**: migration notes for client consumers; operator changelog
covering the DB version bump and one-time resync.
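The role of a `blockNumber → string[]` secondary index in deleting logs can be illustrated with an in-memory stand-in: each block records exactly which primary keys it wrote, so unwinding a block deletes those keys directly instead of reconstructing them from a per-block tag list. `SketchLogIndex` and its methods are hypothetical; the real store uses `AztecAsyncMap`s inside a database transaction.

```ts
class SketchLogIndex {
  private primary = new Map<string, string>(); // composite key → serialized log
  private keysByBlock = new Map<number, string[]>(); // block number → keys written for that block

  add(block: number, key: string, serializedLog: string): void {
    this.primary.set(key, serializedLog);
    const keys = this.keysByBlock.get(block) ?? [];
    keys.push(key);
    this.keysByBlock.set(block, keys);
  }

  // Touches exactly the keys recorded for each block, with no scan by tag.
  deleteLogsForBlocks(blocks: number[]): void {
    for (const block of blocks) {
      for (const key of this.keysByBlock.get(block) ?? []) this.primary.delete(key);
      this.keysByBlock.delete(block);
    }
  }

  has(key: string): boolean {
    return this.primary.has(key);
  }

  get size(): number {
    return this.primary.size;
  }
}
```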
@spalladino requested a review from a team as a code owner May 29, 2026 16:27
…p loopback (#23659)

## Motivation

The proposer relied on looping its own checkpoint proposal back through
the p2p receive path to advance its local proposed-checkpoint tip before
propagating. Under `broadcastInvalidBlockProposal` the broadcast
checkpoint archive is deliberately corrupted, so the loopback handler's
archive-based block lookup (`getBlockData({ archive })`) found nothing
and retried until the next slot. By the time the proposer returned from
broadcast, propagation had slipped past the p2p validator's stale window
— producing intermittent failures (e.g. peers rejecting a late slot
proposal).

## Approach

The proposer's optimistic proposed-checkpoint tip is the proposer's own
local state, so it is now set directly in the sequencer's checkpoint
proposal job rather than via a p2p loopback. The job adds the proposed
checkpoint to the archiver from local checkpoint data (block numbers and
counts, never the possibly-corrupted broadcast archive) immediately
before gossiping, failing closed if the local insert fails. Because
every block is already added to the archiver's FIFO queue (and awaited)
during block building, the checkpoint insert needs no retry. The
`notifyOwnCheckpointProposal` loopback is removed entirely, so the path
is identical whether p2p is enabled or not.
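The ordering the job now enforces (local insert first, gossip second, fail closed) is small enough to show directly. The interfaces and names below (`LocalCheckpoint`, `CheckpointSink`, `Gossip`, `publishOwnCheckpoint`) are hypothetical simplifications, not the actual `ProposedCheckpointSink` or `CheckpointProposalJob` APIs.

```ts
// Built from local checkpoint data only, never from the broadcast archive.
type LocalCheckpoint = { checkpointNumber: number; blockNumbers: number[] };

interface CheckpointSink {
  addProposedCheckpoint(checkpoint: LocalCheckpoint): Promise<void>;
}

interface Gossip {
  broadcast(checkpoint: LocalCheckpoint): Promise<void>;
}

async function publishOwnCheckpoint(
  sink: CheckpointSink,
  gossip: Gossip,
  checkpoint: LocalCheckpoint,
  pushToArchiver: boolean, // e.g. pipelining enabled and block-push not disabled
): Promise<void> {
  if (pushToArchiver) {
    // If the local insert rejects, the error propagates and nothing is gossiped (fail closed).
    await sink.addProposedCheckpoint(checkpoint);
  }
  await gossip.broadcast(checkpoint);
}
```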

## Changes

- **stdlib**: New `ProposedCheckpointSink` interface alongside
`L2BlockSink`.
- **sequencer-client**: `CheckpointProposalJob` now pushes the proposed
checkpoint to the archiver from local data before broadcast, gated on
proposer pipelining and skipped when block-push is disabled
(`skipPushProposedBlocksToArchiver`, fisherman mode); widened the
sequencer/client `l2BlockSource` types to `ProposedCheckpointSink`.
- **p2p**: Removed `notifyOwnCheckpointProposal` from the `P2PService`
interface, the libp2p and dummy services, and the
`P2PClient.broadcastCheckpointProposal` call site (own proposals are
still stored in the attestation pool before propagation).
- **validator-client**: The all-nodes own-proposal branch now skips
validation and returns; removed the now-dead
`setProposedCheckpointFromBlocks` and narrowed the archiver `Pick`.
- **tests**: Added job tests (push-from-local-data and
order-before-gossip, abort-on-push-failure,
no-push-when-pipelining-disabled, fisherman) and a proposal_handler
own-proposal test; removed the obsolete libp2p loopback test and the e2e
slash-test stub; widened affected mock types.
@PhilWindle added this pull request to the merge queue May 29, 2026
AztecBot commented May 29, 2026

Collaborator Author

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a `.test_patterns.yml` entry.

FLAKED ([dbf33a55aa777b80](http://ci.aztec-labs.com/dbf33a55aa777b80)): yarn-project/end-to-end/scripts/run_test.sh ha src/composed/ha/e2e_ha_full.test.ts (296s) (code: 0)

Merged via the queue into next with commit 992f8ea May 29, 2026
27 checks passed

7 participants