
fix(e2e): propagate l1ContractsArgs into node config so archiver matches L1 #23514

Merged
spalladino merged 1 commit into merge-train/spartan from cb/cd11b97e38b2
May 23, 2026


@AztecBot AztecBot commented May 22, 2026

Collaborator

Why

e2e_cross_chain_messaging/l1_to_l2.parallel.test.ts "consumed from public repeatedly" failed in the merge-train (http://ci.aztec-labs.com/1779463097551535) because the chain pruned all 8 pending checkpoints to block 0 at the aztecProofSubmissionEpochs=2 deadline (L1 genesis + 144 s), dropping the in-flight L1→L2 tx and the subsequent advanceBlock noop. The EpochTestSettler safety net never fired — 124 s of polling, zero handler invocations.

Built yarn-project locally (after working around a sandbox proxy/DNS quirk that was blocking the noir build), ran the failing test under LOG_LEVEL='info; trace:prover-node:epoch-monitor', instrumented archiver.isEpochComplete, and found the actual root cause.

What's actually broken

In setup(), the per-test opts.l1ContractsArgs is spread into the deployAztecL1Contracts call (so it lands on the deployed rollup) but never written back to the node config. The archiver factory then constructs its l1Constants from config.aztecEpochDuration (default 32) even though the rollup was deployed with the override (4).

That puts the EpochMonitor and the archiver on different epochDurations:

  • EpochTestSettler's EpochMonitor reads epochDuration from RollupCheatCodes.getConfig() (rollup contract): 4. It computes epochToProve=0 for block 1 at slot 2 — correct.
  • archiver.isEpochComplete(0) uses its own l1Constants epochDuration=32. getSlotRangeForEpoch(0) → endSlot=31. The checkpointed L2 tip's slot is 2 (or even 0 when the bad endSlot makes the resolver fall through to the genesis branch), so slot < endSlot. It then falls through to the L1-timestamp check, which won't be satisfied for another 363 s.

isEpochComplete(0) permanently returns false for this run, handleEpochReadyToProve never fires, markAsProven is never called, and the proof window expires before any test code reaches advanceBlock (which is the only other path that calls markAsProven for this test).
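
The divergence described above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming epochs are contiguous fixed-size groups of slots; `slotRangeForEpoch` and `epochOfSlot` are hypothetical stand-ins written for illustration, not the actual `getSlotRangeForEpoch` from the codebase, whose signature may differ.

```typescript
// Hypothetical helpers for illustration only; the real implementation may differ.
// Assumption: epoch N covers slots [N * epochDuration, (N + 1) * epochDuration - 1].
function slotRangeForEpoch(epoch: number, epochDuration: number): { startSlot: number; endSlot: number } {
  const startSlot = epoch * epochDuration;
  return { startSlot, endSlot: startSlot + epochDuration - 1 };
}

function epochOfSlot(slot: number, epochDuration: number): number {
  return Math.floor(slot / epochDuration);
}

const contractEpochDuration = 4; // value the rollup was deployed with via l1ContractsArgs
const configEpochDuration = 32; // default the archiver picked up from the node config

// The monitor (contract view) and the archiver (config view) agree on which
// epoch slot 2 belongs to, but disagree on where that epoch ends.
const monitorView = slotRangeForEpoch(epochOfSlot(2, contractEpochDuration), contractEpochDuration);
const archiverView = slotRangeForEpoch(epochOfSlot(2, configEpochDuration), configEpochDuration);

console.log(monitorView.endSlot); // 3
console.log(archiverView.endSlot); // 31
```

Both sides pick epoch 0 for slot 2, so the mismatch only shows up in the end-of-epoch boundary, which matches the endSlot values in the probe output.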

Smoking-gun probe output (with temporary instrumentation):

isEpochComplete[probe]: timestamp-fail {
  epochNumber: 0,
  checkpointedSlot: 0,
  endSlot: 31,            # << rollup contract has endSlot=3
  l1Timestamp: 1779484115,
  endTimestamp: 1779484479,
  diff: 363,
}

What's in the PR

A single small change in yarn-project/end-to-end/src/fixtures/setup.ts: after the L1 deploy and Object.assign(config, l1ContractAddresses), copy every defined field from opts.l1ContractsArgs onto config. Now any field a test overrides via l1ContractsArgs (aztecEpochDuration: 4, aztecProofSubmissionEpochs: 2, etc.) is reflected in the node config that the archiver later reads.

The undefined-filtering matters: P2PNetworkTest builds deployL1ContractsArgs by spreading a partial AztecNodeConfig (...initialValidatorConfig in p2p_network.ts:133), which leaves dataDirectory (and other unset node-config fields) at undefined inside the value. A blind Object.assign(config, opts.l1ContractsArgs) would then clobber the temp dataDirectory that setup() had already assigned earlier in this same function and crash setupSharedBlobStorage. The first revision of this PR had that bug; CI caught it in e2e_p2p/add_rollup.test.ts with TypeError: The "path" argument must be of type string. Received undefined at setup.ts:110 → setupSharedBlobStorage (setup.ts:505). Iterating field-by-field and skipping undefined fixes that.
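
The difference between the blind assign and the field-by-field copy can be sketched as follows. This is not the code from setup.ts: `NodeConfig` is a trimmed-down hypothetical shape, `copyDefinedFields` is a name chosen for this example, and the default values other than aztecEpochDuration=32 are illustrative.

```typescript
// Hypothetical, trimmed-down config shape; the real AztecNodeConfig has many more fields.
type NodeConfig = {
  aztecEpochDuration: number;
  aztecProofSubmissionEpochs: number;
  dataDirectory: string | undefined;
};

// Illustrative defaults; only aztecEpochDuration=32 comes from the PR description.
function defaultConfig(): NodeConfig {
  return { aztecEpochDuration: 32, aztecProofSubmissionEpochs: 1, dataDirectory: "/tmp/e2e-data" };
}

// Copy only the fields that are actually set, so an explicit `undefined`
// in the overrides cannot erase a value already present on the target.
function copyDefinedFields<T extends object>(target: T, source: Partial<T>): T {
  for (const key of Object.keys(source) as (keyof T)[]) {
    const value = source[key];
    if (value !== undefined) {
      target[key] = value as T[keyof T];
    }
  }
  return target;
}

// Overrides built by spreading a partial config carry an explicit undefined.
const overrides: Partial<NodeConfig> = {
  aztecEpochDuration: 4,
  aztecProofSubmissionEpochs: 2,
  dataDirectory: undefined,
};

const blind = Object.assign(defaultConfig(), overrides);
const filtered = copyDefinedFields(defaultConfig(), overrides);

console.log(blind.dataDirectory); // undefined (clobbered)
console.log(filtered.dataDirectory); // "/tmp/e2e-data"
console.log(filtered.aztecEpochDuration); // 4
```

Object.assign copies every own enumerable property, including those whose value is undefined, which is why the blind variant loses dataDirectory while the filtered variant keeps it and still picks up the overrides.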

Verification

Both tests pass locally with the refined fix.

l1_to_l2.parallel.test.ts (original failure) — settler fires on schedule, well before the first advanceBlock call:

[21:19:32.340] VERBOSE prover-node:epoch-monitor Epoch 0 is ready to be proven
[21:19:32.347] INFO    epoch-settler            Settling epoch 0 with blocks 1 to 2
[21:19:32.355] INFO    aztecjs:cheat_codes      Proven tip moved: 0 -> 2. Pending tip: 2.
...
PASS src/e2e_cross_chain_messaging/l1_to_l2.parallel.test.ts (438.9 s)

e2e_p2p/add_rollup.test.ts (the regression CI surfaced on the prior PR revision):

PASS src/e2e_p2p/add_rollup.test.ts (578.7 s)

Independent before/after archiver probe also confirms the divergence is gone:

| Run | [archiver-factory probe] | Settler activity (Advanced outbox to epoch N) |
| --- | --- | --- |
| pre-fix (original setup.ts) | config.aztecEpochDuration=32 contract.epochDuration=4 | none; chain prune relied entirely on advanceBlock → watcher.markAsProven() |
| post-fix (this PR) | config.aztecEpochDuration=4 contract.epochDuration=4 | epoch 0, epoch 1, … fire before any test-side markAsProven runs |

Reconsidering the original PR

The first revision of this PR rewrote EpochTestSettler to poll rollup.getTips() directly and bypass EpochMonitor + the L2BlockSource entirely. That works — and is a defensible robustness change in its own right — but it doesn't address the underlying bug, which is a config-drift between the L1 deployment args and the node config in setup(). With the root-cause fix in place the existing settler functions correctly, so the contract-direct rewrite isn't needed.

Worth doing as a separate follow-up if we want to harden the archiver itself:

  • yarn-project/archiver/src/factory.ts reads epochDuration / slotDuration from config even though proofSubmissionEpochs, l1GenesisTime, etc. are already fetched from the rollup contract. Switching those two over removes the config-drift class of bug entirely. The PR keeps that scope out since the test-fixture fix is enough to close the bug, and the production node has no path that lets the two get out of sync (env-var driven deploy + env-var driven node config).
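
One possible shape for that follow-up is sketched below, under stated assumptions: the on-chain values are treated as authoritative and the node config is only compared against them. `resolveL1Constants`, `L1Constants`, and `NodeTimingConfig` are hypothetical names for this example and do not reflect the actual factory.ts code; the slot duration value is illustrative.

```typescript
// Hypothetical types; the real archiver l1Constants carry more fields.
type L1Constants = { epochDuration: number; slotDuration: number };
type NodeTimingConfig = { aztecEpochDuration: number; aztecSlotDuration: number };

// Prefer the values read from the rollup contract and report any
// disagreement with the node config instead of silently using the config.
function resolveL1Constants(
  onChain: L1Constants,
  config: NodeTimingConfig,
): { constants: L1Constants; drift: string[] } {
  const drift: string[] = [];
  if (config.aztecEpochDuration !== onChain.epochDuration) {
    drift.push(`aztecEpochDuration: config=${config.aztecEpochDuration} contract=${onChain.epochDuration}`);
  }
  if (config.aztecSlotDuration !== onChain.slotDuration) {
    drift.push(`aztecSlotDuration: config=${config.aztecSlotDuration} contract=${onChain.slotDuration}`);
  }
  return { constants: { ...onChain }, drift };
}

// Pre-fix scenario from this PR: contract deployed with 4, config left at 32.
// The slot duration of 12 is an illustrative value, not taken from the PR.
const resolved = resolveL1Constants(
  { epochDuration: 4, slotDuration: 12 },
  { aztecEpochDuration: 32, aztecSlotDuration: 12 },
);

console.log(resolved.constants.epochDuration); // 4
console.log(resolved.drift.length); // 1
```

Whether a detected drift should be logged or treated as a hard error is a design choice left open here.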

Other affected tests

A grep for l1ContractsArgs shows only a handful of tests pass per-test L1 overrides (everything else relies on getL1ContractsConfigEnvVars() defaults). Of those, e2e_cross_chain_messaging/l1_to_l2.parallel.test.ts is the canary because it combines a tight aztecProofSubmissionEpochs=2/aztecEpochDuration=4 window with a 140+ s setup. Other tests either pad the proof window high (aztecProofSubmissionEpochs: 1024/640) or rely on AUTOMINE_E2E_OPTS + manual cheatCodes.rollup.markAsProven(). With this fix every test that overrides L1 config via l1ContractsArgs now has a consistent node config.

Diagnostic gist

Patch used to identify the gate: https://gist.github.com/AztecBot/a51adb514b9e374fb993212cd8a4beca

@AztecBot AztecBot added the claudebox label (Owned by claudebox. it can push to this PR.) May 22, 2026
@AztecBot AztecBot changed the title from "test(e2e): rewrite EpochTestSettler to poll the rollup contract directly" to "fix(e2e): propagate l1ContractsArgs into node config so archiver matches L1" May 22, 2026
@spalladino spalladino marked this pull request as ready for review May 23, 2026 11:31
@spalladino spalladino enabled auto-merge (squash) May 23, 2026 11:31
…hes L1

In `setup()`, `opts.l1ContractsArgs` lands on the deployed rollup contract but
never makes it back into the node config that the archiver factory reads. The
archiver's `l1Constants` therefore uses the default `aztecEpochDuration` (32)
even when a test explicitly deploys the rollup with a smaller value.

This breaks any test that overrides `aztecEpochDuration` via `l1ContractsArgs`
and relies on `EpochTestSettler` to keep the chain alive under the resulting
tight `aztecProofSubmissionEpochs` window. `EpochMonitor.work()` reads the
contract's `epochDuration=4` and decides `epochToProve=0` for block 1 at slot
2; the archiver's `isEpochComplete(0)`, however, reads its own
`epochDuration=32`, computes `endSlot=31`, and returns false until L1 wall-time
crosses slot 31. The settler stays silent, no checkpoint is ever marked as
proven, and the chain prunes when the 144s deadline passes mid-setup.

Observed in `e2e_cross_chain_messaging/l1_to_l2.parallel.test.ts` "non-
registered portal, public, repeatedly" on merge-train/spartan (CI
1779463097551535) — Aztec CI failure, 8 checkpoints rolled back to block 0,
in-flight L1→L2 tx dropped, `advanceBlock` threw "Failed to advance block 8".

Confirmed by instrumenting `archiver.isEpochComplete`:
```
isEpochComplete[probe]: timestamp-fail {
  epochNumber: 0,
  checkpointedSlot: 0,    # archiver's l2 tip was at slot 2 — the 'checkpointed'
                          # block-data lookup with the wrong epochDuration ended
                          # up reading the synthetic genesis block (slot 0).
  endSlot: 31,            # << wrong: rollup contract says endSlot=3
  l1Timestamp: 1779484115,
  endTimestamp: 1779484479,
  diff: 363,
}
```

After this fix, `EpochMonitor`'s "Epoch X is ready to be proven" fires every
epoch and `epoch-settler` writes `Settling epoch N with blocks ...` /
`Proven tip moved: A -> B` well before the proof window expires. The failing
test passes locally with the diagnostic still in place (`PASS` in 441s, four
`Settling epoch` lines for epochs 0..3 before the first `advanceBlock` is
called).

Diagnostic patch used to identify the gate:
https://gist.github.com/AztecBot/a51adb514b9e374fb993212cd8a4beca
@spalladino spalladino merged commit 574588b into merge-train/spartan May 23, 2026
14 checks passed
@spalladino spalladino deleted the cb/cd11b97e38b2 branch May 23, 2026 12:40
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE
