test(e2e): switch fee_settings to organic fee bumps under pipelining #23303

Merged: PhilWindle merged 4 commits into merge-train/spartan from spl/fix-a-1057-pipelined-fee-config-race on May 16, 2026
@spalladino

Motivation

Under proposer pipelining, a sequencer builds slot N's checkpoint header (and bakes manaMinFee into gasFees.feePerL2Gas) during slot N-1. If governance executes setProvingCostPerMana or updateManaTarget between that build and the L1 submission, L1 recomputes manaMinFee from the post-mutation FeeStore.config and the submitted header reverts with Rollup__InvalidManaMinFee. The e2e_fees/fee_settings suite used bumpProvingCostPerMana — exactly that governance path — as its fee-spike mechanism, which made it hostile to pipelining and didn't reflect any organic mainnet fee channel. The publisher's bundle-simulator drop log also stopped decoding revert payloads in PR #23165, leaving operators staring at raw 0x... data.
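The race can be reproduced in a toy model. The sketch below is illustrative only: `FeeConfig`, `computeManaMinFee`, `buildHeader` and `validateOnL1` are hypothetical stand-ins, and the fee formula is a placeholder rather than the real on-chain computation. It only shows that a fee baked into a header at build time stops matching once the config changes before submission.

```typescript
// Toy model of the pipelining race. All names and the fee formula are
// hypothetical; the real computation has more terms.
interface FeeConfig {
  provingCostPerMana: bigint;
}

interface CheckpointHeader {
  slot: number;
  feePerL2Gas: bigint; // manaMinFee baked in when the header is built
}

// Placeholder for the on-chain manaMinFee computation.
function computeManaMinFee(config: FeeConfig, l1Component: bigint): bigint {
  return l1Component + config.provingCostPerMana;
}

// Under pipelining this runs during slot N-1, with the config visible then.
function buildHeader(slot: number, config: FeeConfig, l1Component: bigint): CheckpointHeader {
  return { slot, feePerL2Gas: computeManaMinFee(config, l1Component) };
}

// L1 recomputes the fee from the config at submission time and compares.
function validateOnL1(header: CheckpointHeader, config: FeeConfig, l1Component: bigint): string {
  const expected = computeManaMinFee(config, l1Component);
  return header.feePerL2Gas === expected ? 'ok' : 'Rollup__InvalidManaMinFee';
}

const config: FeeConfig = { provingCostPerMana: 100n };
const header = buildHeader(42, config, 1000n);

const beforeGovernance = validateOnL1(header, config, 1000n);
config.provingCostPerMana = 150n; // governance write lands between build and submission
const afterGovernance = validateOnL1(header, config, 1000n);

console.log(beforeGovernance, afterGovernance);
```

An L1 base-fee bump does not open this window in the model, because it changes the input sampled by the oracle rather than the config that L1 re-reads at submission.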

Approach

Drive the fee spike via an L1 base-fee bump (setNextBlockBaseFeePerGas + updateL1GasFeeOracle) — the dominant feePerL2Gas channel and a closer analogue to organic mainnet behaviour. Enable pipelining for the suite via the FeesTest constructor (enableProposerPipelining, inboxLag, manaTarget, walletMinFeePadding, etc.). Add an explicit recovery test that bumps governance and asserts the chain advances past the invalidated checkpoint and that a fresh tx still mines. Restore decoded revert names in logDroppedInSim by merging the relevant ABIs and routing through a shared tryDecodeRevertReason helper.

Changes

  • end-to-end (tests): Rewrite fee_settings.test.ts to run under pipelining, replace the governance fee spike with an organic L1-base-fee bump, and add a recovery test for the governance-mutation race.
  • ethereum: Add tryDecodeRevertReason(data, abi) in utils.ts and route Multicall3 through it (deduplicating the existing in-place decoder).
  • sequencer-client: In logDroppedInSim, decode unknown revert payloads against a merged [RollupAbi, SlashingProposerAbi, EmpireBaseAbi, ErrorsAbi] ABI and emit both the readable form and the raw payload.
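As a rough illustration of the "merge the ABIs, decode, and log both forms" idea, here is a dependency-free sketch. It is not the `tryDecodeRevertReason` helper from this PR: each ABI is reduced to a hypothetical selector-to-name table, `tryDecodeRevertName` and `describeRevert` are made-up names, and the `0xdeadbeef` selector is a placeholder rather than the real selector of any Rollup error. Only the `Error(string)` and `Panic(uint256)` selectors are the standard Solidity values.

```typescript
// Sketch only: each "ABI" is reduced to a selector -> error-name table.
type SelectorTable = Record<string, string>;

// Standard Solidity selectors for the two built-in revert types.
const builtinErrors: SelectorTable = {
  '0x08c379a0': 'Error(string)',
  '0x4e487b71': 'Panic(uint256)',
};

// Placeholder selector for illustration; NOT the real value for this error.
const rollupErrors: SelectorTable = {
  '0xdeadbeef': 'Rollup__InvalidManaMinFee',
};

// Later tables win on selector collisions, like a simple ABI merge.
function mergeTables(...tables: SelectorTable[]): SelectorTable {
  return Object.assign({}, ...tables);
}

// Returns the error name, or undefined when the payload is too short or unknown.
function tryDecodeRevertName(data: string, table: SelectorTable): string | undefined {
  if (!/^0x[0-9a-fA-F]{8}/.test(data)) {
    return undefined;
  }
  return table[data.slice(0, 10).toLowerCase()];
}

// Emits the readable form together with the raw payload, so nothing is lost
// when decoding fails.
function describeRevert(data: string, table: SelectorTable): string {
  const name = tryDecodeRevertName(data, table);
  return name ? `${name} (raw: ${data})` : `unknown revert (raw: ${data})`;
}

const merged = mergeTables(builtinErrors, rollupErrors);
console.log(describeRevert('0xdeadbeef0000', merged));
console.log(describeRevert('0x12345678', merged));
```

Returning `undefined` instead of throwing keeps the logging path safe to call on arbitrary return data.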

Fixes A-1057

Replace the governance-driven `bumpProvingCostPerMana` fee spike — which is exactly
the only-owner path that triggers A-1057 under pipelining — with an L1-base-fee bump
via `setNextBlockBaseFeePerGas` + `updateL1GasFeeOracle`. Enable pipelining for the
suite and add a recovery test asserting the chain advances past a governance bump.

Also surface decoded revert names alongside raw return data in `logDroppedInSim`
by merging RollupAbi/SlashingProposerAbi/EmpireBaseAbi/ErrorsAbi and decoding any
unknown payload through a shared helper.
… path

Per review: the prior assertions only proved fees rose by any amount and that
checkpoints kept advancing — both could pass on regressions that didn't actually
exercise the wallet-padding or A-1057 recovery paths.

- Wallet-padding tests now require the L2 fee bump to exceed 10% (not just
  any-positive rise), and the L1 base-fee target gets a 0.1 gwei floor so anvil's
  natural EIP-1559 decay between rotations can't drive the new oracle snapshot
  below the previous one.
- Recovery test now captures the slot of the pre-bump checkpoint, the slot of the
  recovery target, and asserts the slot span exceeds the checkpoint span — i.e.
  at least one L2 slot was skipped between the bump and recovery. That's the
  positive signal that a pipelined header was actually invalidated, distinguishing
  the A-1057 path from a chain that absorbed the governance write silently.
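
The three tightened checks above can be sketched as pure functions. This is an illustration under assumptions, not the test code: the function names, the `multiplier` parameter and the `Observation` shape are hypothetical; only the 10% threshold, the 0.1 gwei floor and the "slot span exceeds checkpoint span" condition come from the commit message.

```typescript
const GWEI = 1000000000n;
const BASE_FEE_FLOOR = GWEI / 10n; // 0.1 gwei

// True only when `after` exceeds `before` by more than 10%, in integer math.
function risesByMoreThanTenPercent(before: bigint, after: bigint): boolean {
  return after > (before * 11n) / 10n;
}

// Bumped L1 base-fee target, clamped from below so that decay between
// rotations cannot push it under the floor. `multiplier` is hypothetical.
function targetBaseFee(current: bigint, multiplier: bigint): bigint {
  const bumped = current * multiplier;
  return bumped > BASE_FEE_FLOOR ? bumped : BASE_FEE_FLOOR;
}

interface Observation {
  checkpoint: number;
  slot: number;
}

// If more slots than checkpoints elapsed, at least one slot produced no
// checkpoint, which is the signal that a pipelined header was invalidated.
function skippedAtLeastOneSlot(preBump: Observation, recovery: Observation): boolean {
  const slotSpan = recovery.slot - preBump.slot;
  const checkpointSpan = recovery.checkpoint - preBump.checkpoint;
  return slotSpan > checkpointSpan;
}

console.log(risesByMoreThanTenPercent(100n, 111n));
console.log(targetBaseFee(10n, 3n));
console.log(skippedAtLeastOneSlot({ checkpoint: 5, slot: 20 }, { checkpoint: 6, slot: 22 }));
```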

`aztecNode.getCheckpointNumber('checkpointed')` returns a `CheckpointNumber`
directly; `CheckpointNumber.add` preserves the brand on arithmetic. Resolves
the `aztec-custom/no-unsafe-branded-type-conversion` lint errors.

The 100ms deadline raced wall-clock against per-test setup overhead on slow CI
hosts; locally it failed deterministically. Use jest fake timers and an explicit
jest.setSystemTime jump during the first forward mock so the deadline elapses
between iterations without depending on real timer accuracy.

@AztecBot

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/d028f458bd62107e): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_mbps.parallel.test.ts "builds multiple blocks per slot with L2 to L1 messages" (376s) (code: 0) group:e2e-p2p-epoch-flakes

@PhilWindle merged commit 397a5de into merge-train/spartan May 16, 2026
14 checks passed
@PhilWindle deleted the spl/fix-a-1057-pipelined-fee-config-race branch May 16, 2026 09:54
spalladino pushed a commit that referenced this pull request May 16, 2026
…er (#23334)

## Why

PR #23253 was dequeued from the merge queue when `merge-queue-heavy`'s
grind exercise hit a flake in `e2e_fees/fee_settings.test.ts`
(introduced by #23303, the head of `merge-train/spartan`). Failing
sub-test: `reproduces the stale fee snapshot race deterministically`. CI
log: http://ci.aztec-labs.com/cd390ea14cac1093

```
expect(received).toBeGreaterThan(expected)

Expected: > 1134386110000n
Received:   1067501300000n
  214 |       expect(bumpedMinFees.feePerL2Gas).toBeGreaterThan((lowerMinFees.feePerL2Gas * 11n) / 10n);
```

`bumpedMinFees` (`1067501300000`) was effectively the natural L2
baseline at that moment — no oracle rotation had occurred. The retry
inside `inflateL2FeesViaL1BaseFee` exited as soon as `after > before`
(with `before` captured at function entry), but the natural L2 fee
fluctuates between L1 blocks (EIP-1559 decay swings the L1 base-fee
sample), so a sub-percent upward drift satisfied the exit without the
oracle deadband (`LIFETIME - LAG = 3` L2 slots = 36 s) ever opening. The
test ran for only ~15 s before exiting, well short of the deadband.

The caller's `bumpedMinFees > lowerMinFees * 1.1` assertion then failed
because `lowerMinFees` was a separate snapshot taken earlier, and
natural drift between the two snapshots was below 10 %.
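
The numbers in the log can be checked directly. In the sketch below, the two logged values are taken from the CI output above; `lowerMinFees` is inferred from them (it is the only bigint `x` for which `(x * 11n) / 10n` equals the logged threshold) and does not appear in the log itself.

```typescript
const expectedFloor = 1134386110000n; // "Expected: >" from the CI log
const received = 1067501300000n; // "Received:" from the CI log

// Inferred, not logged: the unique x with (x * 11n) / 10n === expectedFloor.
const lowerMinFees = 1031260100000n;
const recomputedFloor = (lowerMinFees * 11n) / 10n;

// Rise of the "bumped" sample over the earlier snapshot, in basis points
// (integer division, so rounded down).
const riseBps = ((received - lowerMinFees) * 10000n) / lowerMinFees;

console.log(recomputedFloor === expectedFloor, riseBps);
```

The result is 351 bps, a rise of about 3.5%: enough to satisfy `after > before` inside the helper, but far short of the 10% the caller asserts.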

There is also a latent upper-bound issue: even on a successful rotation
the original `3x` L1 base-fee bump drives the L2 fee to ~2.0–2.5x once
EIP-1559 decay on the rotation-tx's block is applied, which would have
also failed `higherMinFees > bumpedMinFees` (where `higherMinFees =
lowerMinFees * 2n`).

## What

Three changes in
`yarn-project/end-to-end/src/e2e_fees/fee_settings.test.ts`:

- `inflateL2FeesViaL1BaseFee` takes a `reference: GasFees` parameter and
only returns when `after.feePerL2Gas >= reference.feePerL2Gas * 13/10`. The
1.3x threshold separates a real oracle rotation (≥1.5x rise) from ambient
noise (within ±10%) and forces the loop to wait through the 36 s deadband.
- Retry budget grows from 60 s to 90 s to comfortably cover the deadband
plus a slot or two of margin.
- Test #2's synthetic `higherMinFees` grows from `lowerMinFees.mul(2)`
to `lowerMinFees.mul(4)`, giving unambiguous headroom over the realized
bumped fee while staying under the 6x default-padding cap so
`txWithDefaultPadding` is still the comparison point.

Test #1's bounds and semantics are unchanged; only the call site is
updated to pass `stableMinFees` as the reference.
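
The difference between the old and new exit conditions can be seen on a made-up sequence of fee samples. This is a synchronous simplification with hypothetical values and names (`firstExitIndex`, `samples`); the real helper polls the node asynchronously within a time budget.

```typescript
type ExitCondition = (after: bigint) => boolean;

// Index of the first sample at which the retry loop would stop, or -1.
function firstExitIndex(samples: bigint[], shouldExit: ExitCondition): number {
  return samples.findIndex(shouldExit);
}

// Hypothetical fee samples: three polls of ambient drift, then a rotation.
const before = 1000n; // snapshot at function entry (old behaviour)
const reference = 1000n; // caller-supplied reference (new behaviour)
const samples = [998n, 1004n, 1010n, 1990n];

// Old condition: any upward drift ends the wait.
const oldExit = firstExitIndex(samples, after => after > before);

// New condition: only a rise of at least 1.3x the reference ends the wait.
const newExit = firstExitIndex(samples, after => after >= (reference * 13n) / 10n);

console.log(oldExit, newExit);
```

With these samples the old condition stops at index 1 on a 0.4% drift, while the new one keeps polling until the rotation at index 3.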

## Test plan

- CI `merge-queue-heavy` (10 parallel grind runs of
e2e_fees/fee_settings)
- The PR-branch `ci-full-no-test-cache` already passed at the head
commit; the flake only surfaces under grind

Analysis:
https://gist.github.com/AztecBot/97861b48883eec686f5978a43a2082bb


ClaudeBox log: https://claudebox.work/s/89d3754c8b2b7140?run=1
AztecBot added a commit that referenced this pull request May 16, 2026
Both fail repeatedly on merge-train attempts under proposer pipelining
despite fix attempts (#23303, #23334 for fee_settings; #23336 for
e2e_amm). Skipping in .test_patterns.yml to land the train; to be
triaged and re-enabled (tracking issue assigned to spalladino).
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE