
refactor(publisher): bundle-level simulate; drop per-action enqueue sims#23165

Merged
PhilWindle merged 54 commits into merge-train/spartan from sp/publisher-simulation-cleanup
May 13, 2026

Conversation

@spalladino

@spalladino spalladino commented May 11, 2026

Contributor

Context

SequencerPublisher simulates each enqueued L1 action individually at enqueue time, then sends them bundled through Multicall3. The propose checkpoint action is validated at enqueue and send time (the latter via a preCheck mechanism), but in isolation and relying on overrides. There is no simulation of the multicall payload before sending it, so a reverting tx is most likely not caught.

This refactor:

  • Replaces the per-request preCheck mechanism with a single bundle-level eth_simulateV1 of the assembled aggregate3 payload, run right before send. If any entry reverts in sim it is dropped from the bundle, the reduced bundle is re-simulated to get an honest gasUsed, and the survivors are sent. Extracted to a SequencerBundleSimulator.
  • Drops the entire propose simulate at enqueue (simulateProposeTx, validateCheckpointForSubmission). The bundle simulate covers it.
  • Adds a new pre-broadcast validateBlockHeader call (calling validateHeaderWithAttestations with empty attestations + ignoreSignatures: true) that catches header-level bugs before we gossip the proposal to peers. Emits a new header-validation-failed event on failure.
  • Drops every per-action simulate at enqueue (governance signal and slashing votes/executes). Bundle simulate at send time is the single decision point for every per-action revert. simulateAndEnqueueRequest is deleted. We were enqueuing votes even if the simulation failed, after all.
  • Rewrites sendRequestsAt so it takes an L2 SlotNumber, derives the timestamp for the start of that slot, and sleeps until one L1 slot before that boundary, so we can land on the first L1 slot of the target L2 slot.
  • Centralises SimulationOverridesPlan construction into a single buildCheckpointSimulationOverridesPlan helper. The plan always pins both pending and proven chain tips (to the pipelined parent / invalidation target, or to the current snapshot when neither applies), so STFLib.canPruneAtTime cannot reintroduce a phantom prune during simulation.
  • Makes SimulationOverridesBuilder.merge undefined-safe: explicit undefined fields in an incoming plan no longer erase previously-set values. withPendingTempCheckpointLogFields now accepts a partial subset of fields.
  • Moves the payload-empty cache onto GovernanceProposerContract next to its concern. Only isPayloadEmpty=false is cached (a CREATE2 redeploy could go empty → populated).
  • Drops the old Multicall3 revert-recovery and per-request-resim machinery, since with allowFailure: true the top-level multicall is expected to land successfully. Multicall3.forward now throws MulticallForwarderRevertedError if the receipt reports a reverted status; the publisher does not rotate to a new publisher on that error (on-chain failure, not a send failure). Adds Multicall3.hasCode helper and a simulateAggregate3 entrypoint used by the bundle simulator.
  • L1TxUtils.sendTransaction fails fast if txTimeoutAt has already elapsed when called. SequencerPublisher.forwardWithPublisherRotation re-checks the deadline at the head of each rotation iteration so it doesn't keep cycling through publishers after the L2 slot's submission window has closed.
  • Sequencer escape-hatch (voteInSlotWithoutSyncing) and full-escape-hatch (voteOnSlotWithEscapeHatch) vote-only paths now submit via sendRequestsAt(slot) rather than sendRequests(), so the bundle-simulate block.timestamp override matches the slot the EIP-712 vote signatures were generated for.

The intended outcome is a publisher with one explicit re-validation point (the bundle simulate), measurable bundle gas (from the bundle simulate's gasUsed), and dead/duplicated state-override plumbing removed.
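The undefined-safe merge mentioned in the list above can be illustrated with a minimal sketch. `PlanFields` and `mergeDefined` are hypothetical names used only for illustration, not the actual `SimulationOverridesBuilder` API; the one behaviour taken from the description is that explicitly `undefined` fields in an incoming plan do not erase values that were already set.

```typescript
// Hypothetical plan shape, for illustration only.
type PlanFields = {
  pendingCheckpointNumber?: number;
  provenCheckpointNumber?: number;
  disableBlobCheck?: boolean;
};

// Merge `incoming` into a copy of `base`, skipping keys whose value is undefined.
function mergeDefined<T extends object>(base: T, incoming: Partial<T>): T {
  const result = { ...base };
  for (const key of Object.keys(incoming) as (keyof T)[]) {
    const value = incoming[key];
    if (value !== undefined) {
      result[key] = value as T[keyof T];
    }
  }
  return result;
}

const merged = mergeDefined<PlanFields>(
  { pendingCheckpointNumber: 10, provenCheckpointNumber: 8 },
  { pendingCheckpointNumber: undefined, disableBlobCheck: true },
);
// `merged` keeps pendingCheckpointNumber = 10 and gains disableBlobCheck = true.
```

A plain object spread of `incoming` over `base` would instead overwrite `pendingCheckpointNumber` with `undefined`, which is the failure mode the description calls out.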

Resulting simulations after this refactor

The full list of simulation / gas-estimation steps that remain in a pipelined proposer slot, in execution order.

Pre-build, in Sequencer.doWork

  1. publisher.canProposeAt — rollup view call simulated with the centralised override plan. Cheap pre-check gate before any block-build work.
  2. publisher.simulateInvalidateCheckpoint (conditional) — runs only if syncedTo.pendingChainValidationStatus.valid === false AND !syncedTo.hasProposedCheckpoint. Simulates the invalidate call against the rollup. Result becomes the invalidateCheckpoint package passed into CheckpointProposalJob. The previous code called this even when there's a proposed parent and discarded the result; this refactor adds the !hasProposedCheckpoint gate so we skip the wasted RPC.
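The gate on the invalidation simulate can be written as a small predicate. This is a sketch only: `SyncedState` and `shouldSimulateInvalidation` are hypothetical names, and the two fields are the ones named in the step above.

```typescript
// Hypothetical slice of the synced state; field names follow the description above.
type SyncedState = {
  pendingChainValidationStatus: { valid: boolean };
  hasProposedCheckpoint: boolean;
};

// Returns true only when the invalidation simulate is worth an RPC round trip.
function shouldSimulateInvalidation(syncedTo: SyncedState): boolean {
  const pendingChainInvalid = syncedTo.pendingChainValidationStatus.valid === false;
  // With a proposed parent the result would be discarded, so the call is skipped.
  return pendingChainInvalid && !syncedTo.hasProposedCheckpoint;
}
```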

Per-slot, in CheckpointProposalJob.proposeCheckpoint

  1. CheckpointVoter votes: CheckpointVoter.enqueueVotes() runs at the top of execute(), returning two promises that are awaited in parallel with block-build. It enqueues two kinds of votes via the publisher, neither of which simulates at enqueue time after this refactor:

    • enqueueGovernanceCastSignal — does an isPayloadEmpty pre-flight check (now on GovernanceProposerContract), then enqueues. No eth_simulateV1.
    • enqueueSlashingActions (one call per slashing action, type vote-offenses or execute-slash) — builds the request and enqueues. No eth_simulateV1.

    Real reverts on any of these are caught by the bundle simulate at send time, which drops the failing entry and proceeds with the survivors.

  2. publisher.validateBlockHeader (NEW: pre-broadcast) — replaces the old simulateProposeTx-at-enqueue. Calls validateHeaderWithAttestations with empty attestations and ignoreSignatures: true so the rollup runs the header checks (archive match, slot match, timestamp, mana-min-fee, …) without needing real attestations. Runs before we gossip the proposal to peers. If it fails, abort the slot — log an error, emit header-validation-failed, don't broadcast, don't enqueue.

  3. prepareProposeTx → validateBlobs estimateGas — kept as the blob-commitment consistency check (detects locally-built commitments not matching the blob sidecars). Returns blobEvaluationGas, which we stash on the propose RequestWithExpiry for use by the bundle gasLimit later. The simulate-step that previously paired with this (simulateProposeTx) is removed.
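The ordering of the pre-broadcast header check relative to gossip and enqueue can be sketched as follows. The dependency names (`validateBlockHeader`, `broadcastProposal`, `enqueuePropose`, `emit`) are stand-ins injected for illustration and are synchronous here for brevity; the real methods are asynchronous and have different signatures.

```typescript
type HeaderCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical dependencies, injected so the flow can be exercised without a node.
type ProposeDeps = {
  validateBlockHeader: () => HeaderCheck;
  emit: (event: "header-validation-failed", reason: string) => void;
  broadcastProposal: () => void;
  enqueuePropose: () => void;
};

// Returns true if the proposal was broadcast and enqueued, false if the slot was aborted.
function validateThenBroadcast(deps: ProposeDeps): boolean {
  const check = deps.validateBlockHeader();
  if (check.ok === false) {
    // Abort the slot: surface the failure and do no gossip or enqueue work.
    deps.emit("header-validation-failed", check.reason);
    return false;
  }
  deps.broadcastProposal();
  deps.enqueuePropose();
  return true;
}
```

The point of the ordering is that a header-level failure costs nothing on the network: peers never see a proposal that L1 would reject on header checks alone.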

Background pipeline, in waitForAttestationsAndEnqueueSubmissionAsync

  1. publisher.simulateInvalidateCheckpoint (conditional): runs only in the fallback path where attestation collection failed AND the pending chain turned out to be invalid. Triggered from CheckpointProposalJob.enqueueInvalidation. This is the second, late trigger for invalidation simulation, distinct from the pre-build trigger in Sequencer.doWork.

Send time, in sendRequestsAt(targetSlot)

  1. Bundle simulate (NEW) — single eth_simulateV1 of the assembled aggregate3 payload, with block.timestamp overridden to the start of targetSlot, and state overrides = [disableBlobCheck] iff propose is in the bundle and [] otherwise. Per-entry result decoded from the returned Result[]. This is the only post-pipeline-sleep re-validation; it replaces the per-request preCheck mechanism entirely.
  2. Bundle re-simulate (NEW, conditional): runs only when the first-pass bundle simulate in the previous step dropped at least one entry. Re-runs the bundle simulate on the reduced payload to get an honest gasUsed, and applies the same per-entry decode so additional drops are caught. If the re-simulate falls back (node doesn't support eth_simulateV1), the publisher sends only the first-pass survivors with MAX_L1_TX_LIMIT; the entries that the first pass already proved would revert stay dropped and are reported as failed actions.
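The two send-time steps can be sketched as a pure decision function over an injected simulator. Everything here is illustrative: `BundleRequest`, `SimOutcome`, `planBundle` and the stub are hypothetical names, the simulate callback is synchronous to keep the sketch short (the real call is an async `eth_simulateV1` RPC), and the handling of extra second-pass drops is an assumption rather than the policy of `SequencerBundleSimulator`.

```typescript
type BundleRequest = { action: string };

// Outcome of one simulate call: per-entry success flags and total gas, or
// "fallback" when the node cannot serve eth_simulateV1.
type SimOutcome =
  | { kind: "ok"; success: boolean[]; gasUsed: number }
  | { kind: "fallback" };

type Simulate = (bundle: BundleRequest[]) => SimOutcome;

type BundlePlan = {
  survivors: BundleRequest[];
  dropped: BundleRequest[];
  gasUsed: number | undefined; // undefined: caller falls back to the maximum gas limit
};

function partition(bundle: BundleRequest[], success: boolean[]) {
  const kept: BundleRequest[] = [];
  const dropped: BundleRequest[] = [];
  bundle.forEach((req, i) => {
    if (success[i]) kept.push(req);
    else dropped.push(req);
  });
  return { kept, dropped };
}

function planBundle(bundle: BundleRequest[], simulate: Simulate): BundlePlan {
  const first = simulate(bundle);
  if (first.kind === "fallback") {
    // No per-entry information: send everything with the maximum gas limit.
    return { survivors: bundle, dropped: [], gasUsed: undefined };
  }
  const pass1 = partition(bundle, first.success);
  if (pass1.dropped.length === 0) {
    return { survivors: bundle, dropped: [], gasUsed: first.gasUsed };
  }
  if (pass1.kept.length === 0) {
    return { survivors: [], dropped: pass1.dropped, gasUsed: undefined };
  }
  // Something was dropped: re-simulate the reduced bundle for an honest gasUsed.
  const second = simulate(pass1.kept);
  if (second.kind === "fallback") {
    // First-pass drops stay dropped; only the gas figure is unknown.
    return { survivors: pass1.kept, dropped: pass1.dropped, gasUsed: undefined };
  }
  const pass2 = partition(pass1.kept, second.success);
  return {
    survivors: pass2.kept,
    dropped: pass1.dropped.concat(pass2.dropped),
    // Assumption: if the second pass drops more, its gasUsed is not reused.
    gasUsed: pass2.dropped.length === 0 ? second.gasUsed : undefined,
  };
}

// Stub simulator: the "slash" entry reverts and every entry costs 100 gas.
const stubSimulate: Simulate = bundle => ({
  kind: "ok",
  success: bundle.map(r => r.action !== "slash"),
  gasUsed: 100 * bundle.length,
});

const plan = planBundle(
  [{ action: "propose" }, { action: "slash" }, { action: "vote" }],
  stubSimulate,
);
```

With the stub, the first pass drops `slash`, and the second pass reports gas for the two survivors only, which is the "honest gasUsed" the description refers to.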

Post-send

No diagnostic-only simulate paths remain. Multicall3.forward throws MulticallForwarderRevertedError on a reverted receipt and re-throws on a send error; per-request revert resimulation has been removed.

Known caveats

  • sendRequestsAt early lead: sleeps until startOfTargetSlot - ethereumSlotDuration to maximise inclusion in the first L1 block of the L2 slot. There is a known correctness risk: a tx mined in the L1 block immediately preceding the L2-slot boundary would revert via ProposeLib.validateHeader's slot == block.timestamp.slotFromTimestamp() check. In practice the prior L1 block is usually already committed before this send wakes; if this proves unreliable in production, reduce the lead, especially in tests.
  • validateBlockHeader pre-broadcast coverage: covers the validateHeader checks (archive, slot/timestamp, mana-min-fee, …) and the empty-attestation path of validateHeaderWithAttestations, but does NOT cover proposer-signature verification, inbox consumption (Rollup__InvalidInHash), or header.inHash match. Those still execute inside the full propose and are caught by the bundle simulate at send time. The cost of a rare miss is one wasted broadcast.
  • Top-level aggregate3 revert diagnostics removed: the previous Multicall3.forward code decoded receipt-reverted reasons via tryGetErrorFromRevertedTx and did a per-request resim on send-throw. Both paths are gone. With allowFailure: true and Multicall3.hasCode covering the no-bytecode case, a reverted forwarder receipt is genuinely unexpected (OOG, forwarder bug). The throw of MulticallForwarderRevertedError is the only diagnostic surface — operators will need the transaction hash from the log to investigate.
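The early-lead caveat comes down to simple timestamp arithmetic. The sketch below assumes a linear slot-to-timestamp mapping (`genesis + slot * slotDuration`); that mapping, the `SlotTiming` type, and the function names are assumptions for illustration, and the example values are arbitrary.

```typescript
// Hypothetical timing config; all values are in seconds.
type SlotTiming = {
  l1GenesisTime: number; // timestamp at which L2 slot 0 starts (assumed linear mapping)
  aztecSlotDuration: number; // L2 slot length
  ethereumSlotDuration: number; // L1 slot length
};

function startOfSlot(slot: number, timing: SlotTiming): number {
  return timing.l1GenesisTime + slot * timing.aztecSlotDuration;
}

// Wake one L1 slot before the L2 slot boundary, aiming for the first L1 block of the slot.
function wakeTimeForSlot(slot: number, timing: SlotTiming): number {
  return startOfSlot(slot, timing) - timing.ethereumSlotDuration;
}

// Seconds to sleep from `now`; zero if the wake time has already passed.
function sleepSeconds(slot: number, timing: SlotTiming, now: number): number {
  return Math.max(0, wakeTimeForSlot(slot, timing) - now);
}

const exampleTiming: SlotTiming = { l1GenesisTime: 1000, aztecSlotDuration: 72, ethereumSlotDuration: 12 };
const wakeAt = wakeTimeForSlot(5, exampleTiming); // 1000 + 5 * 72 - 12 = 1348
```

The risk described above is visible in the numbers: a tx sent at 1348 that lands in an L1 block with a timestamp below 1360 would be mined in the previous L2 slot.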

spalladino added 29 commits May 11, 2026 18:48
…oseAt

Part of the publisher simulation cleanup. Pure rename + comment; no
behavioural change.
…oWork

When there's a proposed parent checkpoint, the local `invalidateCheckpoint`
result is overwritten to undefined a few lines below, so the simulate is
wasted work. Gate the call behind !hasProposedCheckpoint.

Adds an opt-in `gasLimitRequired` flag on `Multicall3.forward` that throws
if no `gasLimit` is provided in `gasConfig`. The sequencer publisher sets
this flag to true so a missing gasLimit becomes a hard error rather than
falling back to a silent `estimateGas` inside `L1TxUtils.sendTransaction`.

The call is primarily a consistency check that locally-built blob
commitments match the blob data — under the bundle simulate at send
time `eth_simulateV1` cannot carry blob inputs, so the rollup's on-chain
blob check is forced off, making this estimateGas the only pre-flight
detector of a commitment/data mismatch. The gas value remains in use as
a contributor to the propose gasLimit, but the commitment check is the
main motivation.
…ontract

The cache lives on the contract wrapper next to the call it protects, in
line with `Governance.originalPayloadCache`. Removes session-lifetime
state from the publisher.
… attempt

forwardWithPublisherRotation previously trusted any rotated sender to be
solvent; now we query its balance and skip ahead if the worst-case spend
(gas + blob, 2× safety factor) exceeds what the account holds. Catches
the rotation hazard where the new sender can't afford the tx and we'd
only discover the error in the receipt.
…ess callbacks

The publisher's checkSuccess callbacks all differ only in (address, abi,
eventName) — extract a single helper for the status check + event lookup
and use it in addProposeTx, enqueueInvalidateCheckpoint,
enqueueCastSignalHelper, and simulateAndEnqueueRequest (which now takes
an eventOpts config object instead of a callable). Per-site logging stays
at the callsite.

`Hex | string` is overridden by `string` (Hex is a template literal type)
and ESLint flagged the redundancy.

Consolidate the three plan-building sites (Sequencer.doWork's inline
builder, buildPipelinedParentSimulationOverridesPlan,
buildSubmissionSimulationOverridesPlan) into a single
buildCheckpointSimulationOverridesPlan helper. The plan is built once
per slot in CheckpointProposalJob.proposeCheckpoint and threaded down
to the publisher; Sequencer.doWork uses the same helper for the
canProposeAt pre-check with the early-stage inputs it has.

The governance-signal simulate (with enqueue-on-failure) and the
slashing-action simulate (drop-on-failure) are both gone. The bundle
simulate at send time (added in a follow-up commit) is the single point
that decides whether each action makes it on chain — entries that revert
in the bundle simulate are dropped from the bundle and survivors are
sent.

simulateAndEnqueueRequest is deleted; both slashing callsites use a new
enqueueRequest helper that is just dedup + addRequest.
…roadcast header check

simulateProposeTx and validateCheckpointForSubmission are deleted. The
checkpoint header is validated via publisher.validateBlockHeader (with
empty attestations and ignoreSignatures: true) in proposeCheckpoint
*before* the proposal is broadcast to peers — if validation fails the
slot is aborted before any gossip work. State drift between this check
and the eventual L1 send is caught by the bundle simulate at send time
(added in a follow-up commit).

prepareProposeTx now only computes blobEvaluationGas (the
commitment-consistency check) and assembles the propose args; the per-
request gasLimit is left undefined since the bundle simulate sets the
only gasLimit that matters at send time. blobEvaluationGas is stashed
on RequestWithExpiry for the bundle path to read.

sendRequestsAt now takes the target L2 slot instead of a Date. It
computes start-of-slot internally and sleeps until one L1 slot before
that boundary, aiming for inclusion in the first L1 block of the L2
slot.

The per-request preCheck mechanism is removed entirely. The bundle
simulate at send time (added in a follow-up commit) is the single
re-validation point.
…moval

- Drop `proposerAddressForSimulation` field + `setProposerAddressForSimulation`
  setter; no consumer remains after `simulateProposeTx` was deleted. Removes
  the matching fisherman-mode callsite in `Sequencer.doWork`.
- Drop the now-unused `getSimulationTimestamp` private helper and the stale
  `getLastL1SlotTimestampForL2Slot` import.

Replaces per-request preCheck and per-action simulates with a single
eth_simulateV1 of the assembled aggregate3 payload, run right before
the L1 send. Per-entry results are read from the Result[] return; any
entry that reverts in sim is dropped from the bundle (propose failures
included — votes/slashing/invalidate continue regardless). If anything
was dropped the reduced bundle is re-simulated to get an honest gasUsed.

State overrides on the bundle sim are minimal: disableBlobCheck when
propose is in the bundle (eth_simulateV1 can't accept blob inputs),
nothing otherwise. The post-pipeline-sleep view of L1 is the authority
for everything else, by design.

If the node doesn't implement eth_simulateV1, per-entry filtering is
skipped and we send the whole bundle with MAX_L1_TX_LIMIT — same
fallback behaviour as the existing propose simulate path.

MULTICALL_OVERHEAD_GAS_GUESS and VOTE_GAS_GUESS are dropped in a
follow-up commit; the bundle's gasUsed (bumped 64/63 + blobEvaluationGas
if propose survived) is now the authoritative gasLimit.
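The gas-limit derivation described in this commit message can be sketched as below. The order of operations (bump first, then add `blobEvaluationGas`) is one reading of the sentence above and is an assumption, as is the placeholder value used for the maximum limit; gas is modelled as `number` for brevity. The 64/63 factor presumably compensates for the EIP-150 rule that a call forwards at most 63/64 of the remaining gas.

```typescript
// Placeholder for illustration only; the real MAX_L1_TX_LIMIT constant lives in the codebase.
const MAX_L1_TX_LIMIT_PLACEHOLDER = 30000000;

function bundleGasLimit(
  gasUsed: number | undefined, // undefined when the simulate fell back
  proposeSurvived: boolean,
  blobEvaluationGas: number,
): number {
  if (gasUsed === undefined) {
    return MAX_L1_TX_LIMIT_PLACEHOLDER;
  }
  // Bump by 64/63 so the inner calls still receive the gas they used in simulation.
  const bumped = Math.ceil((gasUsed * 64) / 63);
  // Blob evaluation is not exercised in the simulate (blob check disabled), so add it back.
  return bumped + (proposeSurvived ? blobEvaluationGas : 0);
}
```

For example, a simulated `gasUsed` of 630000 bumps to 640000, and a surviving propose with 50000 of blob evaluation gas brings the limit to 690000.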
Both constants are subsumed by the bundle simulate's gasUsed (which
includes Multicall3 overhead naturally) and the per-request gasLimit on
governance/slashing requests (which is now undefined — bundle simulate
sets the only gasLimit that matters).

The inner helper only returns a Promise, and ESLint flags `async`-without-`await`.

If sendAndMonitorTransaction threw the tx never landed, so re-simulating
each request only diagnoses on-chain reverts (not mempool/node-level
rejections). Flag for follow-up scoping.
…heck

The refactor introduced a balance check (getSenderBalance + client.getGasPrice)
in forwardWithPublisherRotation before every send attempt, and changed Multicall3.forward
to accept a 7th argument {gasLimitRequired: true}. Update all affected test mocks:

- Add getSenderBalance and client.getGasPrice to l1TxUtils and secondL1TxUtils mocks
- Add bumpGasLimit and simulate to secondL1TxUtils for the rotation beforeEach
- Add the 7th {gasLimitRequired: true} arg to forwardSpy assertions
- Rewrite 'does not send propose tx if rollup validation fails' for bundle-simulate path
- Update 'does not include gas config from expired requests' expectation to MAX_L1_TX_LIMIT
- Fix 'does not signal for payload with empty code' to mock governanceProposerContract.isPayloadEmpty

Two changes in tryVoteWhenSyncFails: it now calls sendRequestsAt(SlotNumber) instead
of sendRequests() (fire-and-forget, scheduled at target slot start). Update assertions
accordingly in both the 'should vote on slashing and governance' and 'should not attempt
to vote twice' tests.

The 'skips L1 check when proposed checkpoint exists' test checked
simulationOverridesPlan.pendingCheckpointState.archive, which was removed: the archive
root is now passed directly as the first arg to canProposeAt rather than embedded in the
plan. Remove that stale assertion.

ESLint flagged Fr and decodeFunctionResult after the bundle-simulate
test rewrites; they were stale.
…late

bundleSimulate previously derived its `time` block override via
getCurrentL2Slot(), but sendRequestsAt now wakes one L1 slot BEFORE the
target L2 slot — so getCurrentL2Slot() returns targetSlot - 1. The
simulate's block.timestamp was therefore for the prior slot, and the
on-chain validateHeader check (slot == slotFromTimestamp(block.timestamp))
would fail every pipelined propose, dropping it from the bundle every
slot.

Pass targetSlot explicitly through sendRequestsAt → sendRequests →
bundleSimulate. Non-pipelined callers of sendRequests() keep the
getCurrentL2Slot() fallback for backwards compatibility.

The centralised buildCheckpointSimulationOverridesPlan dropped the
prunePending.provenOverride handling that the old
buildPipelinedParentSimulationOverridesPlan used to apply via
builder.withChainTips({ proven }). At prune-due epoch-boundary slots
this caused earlySimulationPlan (mana-min-fee globals) and the
pre-broadcast validateBlockHeader to see on-chain proven < pending and
either compute wrong globals or spuriously revert.

Add `provenOverride?: CheckpointNumber` to the helper input and apply
it at all three plan-build sites (sequencer.doWork's canProposeAt
slice, proposeCheckpoint's earlySimulationPlan, and the post-build
checkpointSimulationOverridesPlan).

When the bundle simulate dropped the propose entry from the bundle, the
publisher previously still attached the original blobConfig to the
forwarded transaction — sending a blob-typed L1 tx for non-blob survivors
and wasting blob fees. Defer blobConfig computation to after the bundle
simulate so it operates on the survivor list.
…late

The re-simulate after dropping failed entries previously only read
gasUsed; if the reduced aggregate still contained reverting entries
(e.g. a propose that still fails after invalidate was dropped, or any
non-trivial inter-action dependency), they went through to the wire.

Factor the simulate-and-decode into a helper, run it twice when drops
happen, and abort to empty-bundle if the survivor list becomes empty
after the second pass.

Entries dropped by the bundle simulate previously vanished from the
SendRequestsResult — only a warn log was emitted. Surface them via
failedActions on the result, and call backupFailedTx with failureType:
'simulation' for each dropped entry so the existing failure pipeline
records them.

The original cache treated bytecode existence as immutable, but a
CREATE2 redeploy can take an address from empty to populated. Caching
`true` would then make us keep skipping a payload that later becomes
valid. Only cache the `false` (has-code) result of `isPayloadEmpty`.
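A one-sided cache of this kind can be sketched as follows. `PayloadEmptyCache` is a hypothetical class, not the code on `GovernanceProposerContract`, and the bytecode lookup is injected as a synchronous callback so the behaviour can be shown without an RPC client.

```typescript
class PayloadEmptyCache {
  // Addresses already seen with code. An "empty" answer is never stored.
  private knownPopulated: Record<string, boolean> = {};
  private hasCode: (address: string) => boolean;

  constructor(hasCode: (address: string) => boolean) {
    this.hasCode = hasCode;
  }

  isPayloadEmpty(address: string): boolean {
    const key = address.toLowerCase();
    if (this.knownPopulated[key]) {
      return false; // cached: this address had code on an earlier lookup
    }
    const empty = !this.hasCode(address);
    if (!empty) {
      this.knownPopulated[key] = true;
    }
    return empty;
  }
}
```

An address that is empty on the first lookup is queried again on the next call, so a later deployment to the same address is picked up; only the populated answer short-circuits.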
…Checkpoint

The per-request gasConfig.gasLimit is ignored at send time — the bundle
simulate sets the only gasLimit that matters. Remove the manual 64/63
bump and bumpGasLimit call so the surrounding code reflects reality.

The drop log previously emitted raw returnData hex. Try to decode against
the dropping request's ABI using formatViemError so the warn record
includes a readable revert name (e.g. Rollup__InvalidArchive). Falls
back to raw hex on decode failure.
…aths

Adds three tests exercising the bundleSimulate per-entry decode that
previously had no coverage:
- Both passes drop everything → send is aborted, forward not called.
- Second-pass survivors are sent (mixed-success case).
- Second-pass fallback returns '0x' → MAX_L1_TX_LIMIT used as gasLimit.
@spalladino spalladino added the ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure label May 11, 2026
@spalladino spalladino marked this pull request as ready for review May 12, 2026 05:37
spalladino added 16 commits May 12, 2026 09:58
Extract the abi encoding/decoding and bytecode-presence check that previously
lived in SequencerPublisher.bundleSimulate into reusable static methods on
Multicall3: simulateAggregate3 (with optional fakeSenderBalance state override
for simulating with low-balance senders) and hasCode. Adds anvil-backed unit
tests for both.

SequencerPublisher.bundleSimulate now only contains the bundle-specific bits
(timestamp choice, blob-check override, propose-vs-vote semantics, second-pass
re-simulate).
…eError

Adds a checkBalance flag to L1TxConfig. When true, sendTransaction computes a
worst-case cost from gasLimit, the live gas price, and (for blob txs) blob
gas, and throws InsufficientBalanceError if the sender's balance cannot cover
it. SequencerPublisher's rotation loop now consumes this typed error to
rotate to a new publisher, replacing the inline balance check that used to
live alongside Multicall3.forward.
…mulator

Moves the bundleSimulate logic out of SequencerPublisher into a dedicated
SequencerBundleSimulator class.

- Always uses targetSlot's start timestamp for the simulation block.timestamp
  (no longer minned with predictedNextL1Ts). Known limitation documented.
- New second-pass policy: if any entry reverts in pass 1, re-simulate the
  reduced bundle; if pass 2 surfaces any revert OR returns fallback, abort the
  entire send.
- On first-pass fallback (eth_simulateV1 unsupported), caller sends the bundle
  as-is with MAX_L1_TX_LIMIT.
- multicall3HasCode is now cached only when true, so a transient miss won't
  permanently lock out future sends.
- aborted result carries an explicit reason and only includes truly-reverted
  entries in droppedRequests, so backupFailedTx records aren't misleading.
…point callers

The bundle simulator now runs the propose simulation as part of the multicall,
so the per-checkpoint override plan is no longer threaded through to
enqueueProposeCheckpoint. Remove the now-invalid arg from both callers and
inline the simulation-plan build in CheckpointProposalJob.
…tion

- Revert the checkBalance / InsufficientBalanceError option on L1TxUtils. The
  worst-case calculation was brittle (depends on volatile maxFeePerGas and a
  2x safety multiplier) and only existed for the rotation case below.
- Pre-check txTimeoutAt at the top of L1TxUtils.sendTransaction (before any
  await) so a stale request fails fast without spending time on gas estimation
  or balance lookups.
- In SequencerPublisher.forwardWithPublisherRotation, check txTimeoutAt at the
  head of each loop iteration and bail out if the deadline has passed. This
  keeps the publisher from cycling through every available publisher after the
  L2 slot's submission window has already closed.
… validation failure

The proposer runs validateBlockHeader against L1 state right before broadcasting
a checkpoint proposal. When that simulation fails, we abort the slot — but the
abort was previously invisible to consumers monitoring sequencer events. Emit a
typed 'header-validation-failed' event so e2e tests and observability can react.
Add the new event to the failure-event lists in epochs_test.ts and
escape_hatch_vote_only.test.ts.
…intSimulationOverridesPlan

The decision to apply a proven-tip override (for when a prune would otherwise
fire at the target slot) used to live in sequencer.ts only: a separate
builder-mutation step after buildCheckpointSimulationOverridesPlan returned its
plan. CheckpointProposalJob re-built the plan from a snapshot of that decision,
and never re-queried isPruneDueAtSlot against current state.

Move the override logic inside buildCheckpointSimulationOverridesPlan itself
and have each caller pass:
  - isPruneDueAtSlot (their own freshly-queried answer)
  - checkpointedCheckpointNumber (real on-chain pending tip)
so both call sites honor the live decision against the same input shape and
the override logic exists in exactly one place.

Sequencer reads both from `syncedTo` (no extra archiver query). The job calls
`l2BlockSource.isPruneDueAtSlot(targetSlot)` itself and receives
`checkpointedCheckpointNumber` from the sequencer at construction time.
…op recomputing in sendRequests

Multicall3.forward already computes encodedForwarderData internally to send the
tx. Surface it on each non-thrown return path so the publisher's failure-backup
context does not have to re-encode the same aggregate3 payload from the
survivor list. forwardWithPublisherRotation propagates the field unchanged.

Drop the on-chain revert recovery and per-request re-simulation paths from
Multicall3.forward. The forwarder uses allowFailure: true so the top-level
call is not expected to revert, and a send failure means the tx never
landed -- re-simulating individual entries cannot diagnose mempool-level
rejections. Throw a typed MulticallForwarderRevertedError on reverted
receipts so callers can distinguish on-chain failures (don't rotate) from
send failures (rotate to next publisher).
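The rotate-or-rethrow distinction, combined with the per-iteration deadline check, can be sketched like this. The `Publisher` type and `forwardWithRotation` function are hypothetical and synchronous for brevity, and the error class here is a bare stand-in that only shares its name with the real `MulticallForwarderRevertedError`.

```typescript
// Stand-in for the typed error described above; the real class may carry more context.
class MulticallForwarderRevertedError extends Error {}

type Publisher = {
  name: string;
  forward: () => string; // returns a tx hash, throws on failure
};

function forwardWithRotation(
  publishers: Publisher[],
  txTimeoutAt: Date,
  now: () => Date,
): string | undefined {
  for (const publisher of publishers) {
    // Re-check the deadline at the head of each iteration.
    if (now().getTime() > txTimeoutAt.getTime()) {
      return undefined;
    }
    try {
      return publisher.forward();
    } catch (err) {
      if (err instanceof MulticallForwarderRevertedError) {
        // The tx landed and reverted on chain: another sender would not help.
        throw err;
      }
      // Send failure: the tx never landed, so try the next publisher.
    }
  }
  return undefined;
}
```

Keeping the two failure classes apart avoids paying for the same doomed multicall once per publisher.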
…lation override

Audits buildCheckpointSimulationOverridesPlan and closes several latent
correctness gaps:

- Always pin both `pending` and `proven` chain tips (drops `isPruneDueAtSlot`):
  either from the override or from the on-chain pending snapshot. This
  short-circuits `STFLib.canPruneAtTime` in simulation and avoids the live
  re-read inside `makeChainTipsOverride` racing against an L1 advance.
- Restore full `tempCheckpointLogs[parent]` overrides per PR #23073:
  headerHash, outHash, slotNumber, payloadDigest, plus archive and fee header,
  so the simulation is byte-faithful with what `propose()` writes.
- Throw on `computePipelinedParentFeeHeader` derivation/RPC failures instead
  of silently returning `undefined`. Only the genesis-grandparent case
  (`checkpointNumber < 2`) legitimately returns `undefined`.
- Enforce mutual exclusion between `proposedCheckpointData` and
  `invalidateToPendingCheckpointNumber`; helper throws if both are supplied.
- Validate `proposedCheckpointData.checkpointNumber === checkpointNumber - 1`
  before applying the pipelined override.
- Thread `signatureContext` through callers so the parent's `payloadDigest`
  can be recomputed.
- `SimulationOverridesBuilder.merge` now uses foundation's `merge`, which
  ignores `undefined` fields so partial plans don't erase prior values.
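Two of the checks listed above (mutual exclusion and the parent-number validation) plus the always-pin rule can be sketched in a few lines. This is a simplified illustration, not `buildCheckpointSimulationOverridesPlan` itself: the input shape and names are hypothetical, checkpoint numbers are plain `number`s, and pinning `proven` to the snapshot value is an assumption about what "always pin" resolves to.

```typescript
type ChainTips = { pending: number; proven: number };

// Hypothetical, simplified input shape.
type PlanInput = {
  checkpointNumber: number; // checkpoint being proposed
  snapshot: ChainTips; // current on-chain tips
  proposedParentCheckpointNumber?: number; // pipelined parent, if any
  invalidateToPendingCheckpointNumber?: number; // invalidation target, if any
};

function buildChainTipsOverride(input: PlanInput): ChainTips {
  const parent = input.proposedParentCheckpointNumber;
  const invalidateTo = input.invalidateToPendingCheckpointNumber;
  if (parent !== undefined && invalidateTo !== undefined) {
    throw new Error("pipelined parent and invalidation target are mutually exclusive");
  }
  if (parent !== undefined && parent !== input.checkpointNumber - 1) {
    throw new Error("pipelined parent must be checkpointNumber - 1");
  }
  // Always pin both tips so the simulation never re-reads them from live state.
  const pending = parent ?? invalidateTo ?? input.snapshot.pending;
  // Assumption: proven is pinned to the snapshot value.
  return { pending, proven: input.snapshot.proven };
}
```

Because the function always returns both tips, callers never depend on a live read during simulation, which is the race the audit item describes.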
…te falls back

When the bundle simulator's first pass drops an entry and the second pass
returns fallback (eth_simulateV1 unavailable), the publisher previously
re-sent the original validRequests, reintroducing the known-bad entry.
The bad entry would fail silently inside aggregate3 (allowFailure: true),
but L1 gas was wasted on a multicall that could not succeed for that
action.

Make BundleSimulateResult's fallback variant carry both `requests` and
`droppedRequests` so the simulator can hand back first-pass survivors
when the second pass cannot decode. The publisher's fallback branch now
sends only the survivors, and the first-pass drops are reported as failed
actions for the caller's downstream accounting.
…h them

Move the 'Bundle entry dropped: action reverted in sim' warn out of the
bundle simulator's child logger and onto the SequencerPublisher's own
logger. The simulator now reports drops as structured DroppedRequest
entries (request + revertReason + returnData) and the publisher emits
the warn while iterating them. Restores the call-site that
e2e_l1_publisher.test.ts spies on via jest.spyOn(publisher.log, 'warn').
…en prune is due

The always-pin refactor made the simulation plan pin both chain tips
unconditionally, but the `preparing-checkpoint` event was emitting
`plan.chainTipsOverride.proven` directly, so observers saw a proven
override even when no prune was actually due at the target slot.
Re-introduce the `isPruneDueAtSlot` guard for the event payload so the
field still means "we are building optimistically across a pruning
boundary" rather than "we pinned the snapshot defensively".

The plan always pins both pending/proven tips to short-circuit
`canPruneAtTime` in simulation, so the event now reports the pinned
proven tip unconditionally. The `isPruneDueAtSlot` guard is kept only
for the operational warn log, which still flags "we are building
optimistically across a pruning boundary". Adjusts the unit test and
the epoch proof-at-boundary e2e tests accordingly.
@AztecBot

Collaborator

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/0ce8a522c4da4fde): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "proposer invalidates multiple checkpoints" (451s) (code: 0) group:e2e-p2p-epoch-flakes

// Fail fast before doing any work (gas estimation, balance check) if the caller's deadline
// has already passed. The same check is repeated after gas estimation in case it took long
// enough to push us past the deadline.
if (gasConfigOverrides?.txTimeoutAt && new Date() > gasConfigOverrides.txTimeoutAt) {

Collaborator


Should we be using the available DateProvider here?

@PhilWindle PhilWindle merged commit a7a0d8c into merge-train/spartan May 13, 2026
14 checks passed
@PhilWindle PhilWindle deleted the sp/publisher-simulation-cleanup branch May 13, 2026 16:33
spalladino added a commit that referenced this pull request May 14, 2026
…23257)

Addresses [Phil's review
comment](#23165 (comment))
on #23165: uses the injected `DateProvider` instead of `new Date()` for
the pre-gas-estimation timeout check in `L1TxUtils.sendTransaction`, so
tests can drive the clock.
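The benefit of an injected clock for this check can be shown with a small sketch. `Clock`, `assertNotTimedOut` and `makeFakeClock` are hypothetical names; the real `DateProvider` interface may differ from the `now(): number` shape assumed here.

```typescript
// Minimal stand-in for an injectable clock; the real DateProvider interface may differ.
type Clock = { now: () => number }; // milliseconds since the Unix epoch

// Fail fast, before any gas estimation, if the caller's deadline has already passed.
function assertNotTimedOut(txTimeoutAt: Date | undefined, clock: Clock): void {
  if (txTimeoutAt !== undefined && clock.now() > txTimeoutAt.getTime()) {
    throw new Error("Transaction deadline already elapsed; not sending");
  }
}

// A fake clock that tests can advance without sleeping.
function makeFakeClock(startMs: number) {
  let current = startMs;
  return {
    now: () => current,
    advance: (ms: number) => {
      current += ms;
    },
  };
}
```

With `new Date()` hard-coded in the check, a test would have to wait in real time or mock the global; with the injected clock it can call `advance` and assert the throw immediately.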
PhilWindle pushed a commit that referenced this pull request May 16, 2026
…23303)

## Motivation

Under proposer pipelining, a sequencer builds slot N's checkpoint header
(and bakes `manaMinFee` into `gasFees.feePerL2Gas`) during slot N-1. If
governance executes `setProvingCostPerMana` or `updateManaTarget`
between that build and the L1 submission, L1 recomputes `manaMinFee`
from the post-mutation `FeeStore.config` and the submitted header
reverts with `Rollup__InvalidManaMinFee`. The `e2e_fees/fee_settings`
suite used `bumpProvingCostPerMana` — exactly that governance path — as
its fee-spike mechanism, which made it hostile to pipelining and didn't
reflect any organic mainnet fee channel. The publisher's
bundle-simulator drop log also stopped decoding revert payloads in PR
#23165, leaving operators staring at raw `0x...` data.

## Approach

Drive the fee spike via an L1 base-fee bump (`setNextBlockBaseFeePerGas`
+ `updateL1GasFeeOracle`) — the dominant `feePerL2Gas` channel and a
closer analogue to organic mainnet behaviour. Enable pipelining for the
suite via the `FeesTest` constructor (`enableProposerPipelining`,
`inboxLag`, `manaTarget`, `walletMinFeePadding`, etc.). Add an explicit
recovery test that bumps governance and asserts the chain advances past
the invalidated checkpoint and that a fresh tx still mines. Restore
decoded revert names in `logDroppedInSim` by merging the relevant ABIs
and routing through a shared `tryDecodeRevertReason` helper.

## Changes

- **end-to-end (tests)**: Rewrite `fee_settings.test.ts` to run under
pipelining, replace the governance fee spike with an organic L1-base-fee
bump, and add a recovery test for the governance-mutation race.
- **ethereum**: Add `tryDecodeRevertReason(data, abi)` in `utils.ts` and
route `Multicall3` through it (deduplicating the existing in-place
decoder).
- **sequencer-client**: In `logDroppedInSim`, decode unknown revert
payloads against a merged `[RollupAbi, SlashingProposerAbi,
EmpireBaseAbi, ErrorsAbi]` ABI and emit both the readable form and the
raw payload.
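
As a rough illustration of decoding against a merged set of error definitions, the sketch below looks up the 4-byte selector of a revert payload in several tables and logs both the readable name and the raw data. It is not the `tryDecodeRevertReason` implementation: the `ErrorTable` type, `lookupRevertName`, `describeRevert`, and the `0xdeadbeef` selector are hypothetical, and a real decoder would derive selectors and arguments from the contract ABIs.

```typescript
// Hypothetical simplification: each "ABI" is reduced to a selector -> error-name table.
type ErrorTable = Record<string, string>;

// Standard Solidity selectors for Error(string) and Panic(uint256).
const builtinErrors: ErrorTable = {
  '0x08c379a0': 'Error(string)',
  '0x4e487b71': 'Panic(uint256)',
};

// Made-up selector for illustration; NOT the real selector of this error.
const rollupErrors: ErrorTable = {
  '0xdeadbeef': 'Rollup__InvalidManaMinFee',
};

// Returns the error name if the payload's selector is known in any table, else undefined.
function lookupRevertName(data: string, tables: ErrorTable[]): string | undefined {
  // A decodable payload needs at least "0x" plus a 4-byte (8 hex character) selector.
  if (!/^0x[0-9a-fA-F]{8}/.test(data)) {
    return undefined;
  }
  const selector = data.slice(0, 10).toLowerCase();
  // Merge all tables; later tables win on selector collisions.
  const merged: ErrorTable = Object.assign({}, ...tables);
  return merged[selector];
}

// Emits both the readable form (when known) and the raw payload.
function describeRevert(data: string, tables: ErrorTable[]): string {
  const name = lookupRevertName(data, tables);
  return name ? `${name} (raw: ${data})` : `unknown revert (raw: ${data})`;
}
```

Keeping the raw payload in the message means operators can still decode an error by hand when it is missing from every table.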

Fixes A-1057
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE
Labels

ci-draft Run CI on draft PRs. ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure
