test(sequencer): de-flake "stops rotating once txTimeoutAt elapses" by AztecBot · Pull Request #23287 · AztecProtocol/aztec-packages

AztecBot · 2026-05-14T17:13:56Z

Summary

Fixes a flaky unit test that broke the merge-train/spartan CI run after #23284 merged (the failure is unrelated to that PR — it just happened to land on top of a timing-flaky test).

Test: SequencerPublisher > publisher rotation on send failure > stops rotating once txTimeoutAt elapses mid-rotation in yarn-project/sequencer-client/src/publisher/sequencer-publisher.test.ts.

Root cause

The test used a 100ms-in-the-future deadline and a 200ms-real-sleep inside the forward mock, expecting the rotation loop to make one forward call before bailing on the second iteration. Under CI load, JS scheduling between Date.now() + 100 and the first deadline check inside the loop can exceed 100ms — the loop then sees the deadline already elapsed and bails before any forward call, leaving forwardSpy with 0 calls instead of 1.

CI log excerpt confirming this (the triedAddresses:[] array means no forward attempt was made):

17:02:04.429 WARN sequencer:publisher Tx timeout (2026-05-14T17:02:03.349Z) elapsed;
  stopping publisher rotation triedAddresses:[]

Fix

Switch the test to jest.useFakeTimers({ doNotFake: [...] }), faking only Date (timers/microtasks left real so the async flow still works). Inside the first forward mock, advance the fake clock past the deadline with jest.setSystemTime, then reject. The deadline comparison in the rotation loop now reads a fully-controlled value and the test outcome no longer depends on CI scheduling jitter.

CI failure log: http://ci.aztec-labs.com/1778776975195828

Detailed analysis: https://gist.github.com/AztecBot/cc4fd52fc84d4347a931379e371620fe

Test plan

The test now uses a controlled fake clock; the deadline elapses exactly when the first forward mock advances setSystemTime past txTimeoutAt, regardless of CI load.
jest.useRealTimers() is restored in finally so the change is fully scoped to this one test.
Local ./bootstrap.sh ci could not be run in the container (Docker socket and ~/.parallel/cargo-binstall permission limitations — known sandbox constraints); merge-train CI will validate.

ClaudeBox log: https://claudebox.work/s/0114271655393104?run=1

…al pipeline (#23245) ## Summary - Removes `FastTxCollection` as a separate class and absorbs all its logic directly into `TxCollection` - Replaces the old parallel file-store delay with a single sequential pipeline: node RPC → reqresp → file store, where each phase blocks on the previous (cancellation-aware) - File store collection is now driven by `IRequestTracker` — the same synchronization primitive used by node and reqresp paths. The tracker is the single source of truth for "is this tx still missing?" and "is this request still alive?" - `FileStoreTxCollection` simplified: dropped `start()`/`stop()`/persistent worker pool/`wakeSignal`. `startCollecting(requestTracker, context)` returns `Promise<void>`, spins up its own per-call worker pool, and workers self-terminate when the tracker is cancelled (all-fetched / deadline / external) ## Collection flow inside `collectFast` 1. Start node RPC collection in the background 2. Wait `txCollectionFastNodesTimeoutBeforeReqRespMs` — interruptible by cancellation **or by node exhaustion** (so when no nodes are configured, reqresp starts immediately) 3. Start reqresp in the background (parallel with nodes) 4. Wait `txCollectionFileStoreFastDelayMs` — interruptible by cancellation or reqresp completion 5. Start file store collection in the background (its workers self-terminate) 6. `Promise.allSettled` on node + reqresp + file store `txCollectionFileStoreFastDelayMs` description updated to reflect it is now anchored to reqresp start, not collection start. ## File store / tracker integration - `FileStoreTxCollection.startCollecting` no longer takes `(txHashes, context, deadline)`; it takes `(requestTracker, context)` and reads the missing txs + deadline from the tracker - Workers check `requestTracker.isMissing(hash)` each scan — if the tx was found via another path (node/reqresp/gossipsub), the entry is dropped without an extra fetch - Workers race their backoff sleeps against `requestTracker.cancellationToken` — cancelling a request (deadline, `stopCollectingForBlocksUpTo/After`, or `stop()`) propagates to file store workers immediately - Removed `foundTxs`/`clearPending` plumbing on `FileStoreTxCollection` — the tracker handles both implicitly - `startCollecting` yields once after building its entry set, so a synchronous follow-up call (e.g. `markFetched` in tests, or the gossipsub-found path in production) lands before workers begin scanning ## Tests - `tx_collection.test.ts`: collapsed the `TestFastTxCollection` subclass; all accesses go directly through `TxCollection`. Added "starts reqresp immediately when no nodes are configured" covering the node-exhaustion shortcut - `file_store_tx_collection.test.ts`: rewritten for the new shape — no `start()`/`stop()`, lifecycle driven by the tracker (cancel to terminate workers). New "workers exit when tracker is cancelled" covers the per-call worker-pool teardown Closes https://linear.app/aztec-labs/issue/A-933/tx-collection-dont-retrieve-transactions-that-have-already-been via new synchronization with the request tracker.

…ims (#23165) ## Context `SequencerPublisher` simulates each enqueued L1 action individually at enqueue time, then sends them bundled through Multicall3. The `propose` checkpoint action is validated at enqueue and send time (the latter via a `preCheck` mechanism), but in isolation and relying on overrides. There is no simulation of the multicall payload before sending it, so a reverting tx is most likely not caught. This refactor: - Replaces the per-request `preCheck` mechanism with a **single bundle-level `eth_simulateV1`** of the assembled `aggregate3` payload, run right before send. If any entry reverts in sim it is dropped from the bundle, the reduced bundle is re-simulated to get an honest `gasUsed`, and the survivors are sent. Extracted to a `SequencerBundleSimulator`. - Drops the entire propose simulate at enqueue (`simulateProposeTx`, `validateCheckpointForSubmission`). The bundle simulate covers it. - Adds a new pre-broadcast `validateBlockHeader` call (calling `validateHeaderWithAttestations` with empty attestations + `ignoreSignatures: true`) that catches header-level bugs before we gossip the proposal to peers. Emits a new `header-validation-failed` event on failure. - Drops every per-action simulate at enqueue (governance signal **and** slashing votes/executes). Bundle simulate at send time is the single decision point for every per-action revert. `simulateAndEnqueueRequest` is deleted. We were enqueuing votes even if the simulation failed, after all. - Rewrites `sendRequestsAt` so it takes an L2 `SlotNumber`, derives the timestamp for the start of that slot, and sleeps until one L1 slot before that boundary, so we can land on the first L1 slot of the target L2 slot. - Centralises `SimulationOverridesPlan` construction into a single `buildCheckpointSimulationOverridesPlan` helper. The plan **always** pins both `pending` and `proven` chain tips (to the pipelined parent / invalidation target, or to the current snapshot when neither applies), so `STFLib.canPruneAtTime` cannot reintroduce a phantom prune during simulation. - Makes `SimulationOverridesBuilder.merge` undefined-safe: explicit `undefined` fields in an incoming plan no longer erase previously-set values. `withPendingTempCheckpointLogFields` now accepts a partial subset of fields. - Moves the payload-empty cache onto `GovernanceProposerContract` next to its concern. Only `isPayloadEmpty=false` is cached (a CREATE2 redeploy could go empty → populated). - Drops the old Multicall3 revert-recovery and per-request-resim machinery, since with `allowFailure: true` the top-level multicall is expected to land successfully. `Multicall3.forward` now throws `MulticallForwarderRevertedError` if the receipt reports a reverted status; the publisher does **not** rotate to a new publisher on that error (on-chain failure, not a send failure). Adds `Multicall3.hasCode` helper and a `simulateAggregate3` entrypoint used by the bundle simulator. - `L1TxUtils.sendTransaction` fails fast if `txTimeoutAt` has already elapsed when called. `SequencerPublisher.forwardWithPublisherRotation` re-checks the deadline at the head of each rotation iteration so it doesn't keep cycling through publishers after the L2 slot's submission window has closed. - Sequencer escape-hatch (`voteInSlotWithoutSyncing`) and full-escape-hatch (`voteOnSlotWithEscapeHatch`) vote-only paths now submit via `sendRequestsAt(slot)` rather than `sendRequests()`, so the bundle-simulate `block.timestamp` override matches the slot the EIP-712 vote signatures were generated for. The intended outcome is a publisher with one explicit re-validation point (the bundle simulate), measurable bundle gas (from the bundle simulate's `gasUsed`), and dead/duplicated state-override plumbing removed. ## Resulting simulations after this refactor The full list of simulation / gas-estimation steps that remain in a pipelined proposer slot, in execution order. ### Pre-build, in `Sequencer.doWork` 1. **`publisher.canProposeAt`** — rollup view call simulated with the centralised override plan. Cheap pre-check gate before any block-build work. 2. **`publisher.simulateInvalidateCheckpoint`** (conditional) — runs **only** if `syncedTo.pendingChainValidationStatus.valid === false` AND `!syncedTo.hasProposedCheckpoint`. Simulates the invalidate call against the rollup. Result becomes the `invalidateCheckpoint` package passed into `CheckpointProposalJob`. The previous code called this even when there's a proposed parent and discarded the result; this refactor adds the `!hasProposedCheckpoint` gate so we skip the wasted RPC. ### Per-slot, in `CheckpointProposalJob.proposeCheckpoint` 3. **CheckpointVoter votes** — `CheckpointVoter.enqueueVotes()` runs at the top of `execute()`, returning two promises that are awaited in parallel with block-build. It enqueues two kinds of votes via the publisher, **neither of which simulates at enqueue time** after this refactor: - **`enqueueGovernanceCastSignal`** — does an `isPayloadEmpty` pre-flight check (now on `GovernanceProposerContract`), then enqueues. No `eth_simulateV1`. - **`enqueueSlashingActions`** (one call per slashing action, type `vote-offenses` or `execute-slash`) — builds the request and enqueues. No `eth_simulateV1`. Real reverts on any of these are caught by the bundle simulate at send time, which drops the failing entry and proceeds with the survivors. 4. **`publisher.validateBlockHeader` (NEW: pre-broadcast)** — replaces the old `simulateProposeTx`-at-enqueue. Calls `validateHeaderWithAttestations` with empty attestations and `ignoreSignatures: true` so the rollup runs the header checks (archive match, slot match, timestamp, mana-min-fee, …) without needing real attestations. Runs **before** we gossip the proposal to peers. If it fails, abort the slot — log an error, emit `header-validation-failed`, don't broadcast, don't enqueue. 5. **`prepareProposeTx → validateBlobs estimateGas`** — kept as the blob-commitment **consistency check** (detects locally-built commitments not matching the blob sidecars). Returns `blobEvaluationGas`, which we stash on the propose `RequestWithExpiry` for use by the bundle gasLimit later. The simulate-step that previously paired with this (`simulateProposeTx`) is removed. ### Background pipeline, in `waitForAttestationsAndEnqueueSubmissionAsync` 6. **`publisher.simulateInvalidateCheckpoint` (conditional)** — runs **only** in the fallback path where attestation collection failed AND the pending chain turned out to be invalid. Triggered from `CheckpointProposalJob.enqueueInvalidation`. This is the second, late trigger for invalidation simulation — distinct from step 2's pre-build trigger. ### Send time, in `sendRequestsAt(targetSlot)` 7. **Bundle simulate (NEW)** — single `eth_simulateV1` of the assembled `aggregate3` payload, with `block.timestamp` overridden to the start of `targetSlot`, and state overrides = `[disableBlobCheck]` iff `propose` is in the bundle and `[]` otherwise. Per-entry result decoded from the returned `Result[]`. This is the **only** post-pipeline-sleep re-validation; it replaces the per-request `preCheck` mechanism entirely. 8. **Bundle re-simulate (NEW, conditional)** — runs **only** when step 7 dropped at least one entry. Re-runs the bundle simulate on the reduced payload to get an honest `gasUsed`, and applies the same per-entry decode so additional drops are caught. If the re-simulate falls back (node doesn't support `eth_simulateV1`), the publisher sends the **first-pass survivors only** with `MAX_L1_TX_LIMIT`; the entries that the first pass already proved would revert stay dropped and are reported as failed actions. ### Post-send No diagnostic-only simulate paths remain. `Multicall3.forward` throws `MulticallForwarderRevertedError` on a reverted receipt and re-throws on a send error; per-request revert resimulation has been removed. ## Known caveats - **`sendRequestsAt` early lead**: sleeps until `startOfTargetSlot - ethereumSlotDuration` to maximise inclusion in the first L1 block of the L2 slot. There is a known correctness risk: a tx mined in the L1 block immediately preceding the L2-slot boundary would revert via `ProposeLib.validateHeader`'s `slot == block.timestamp.slotFromTimestamp()` check. In practice the prior L1 block is usually already committed before this send wakes; if observed to be unreliable in production, tune the lead down, especially on tests. - **`validateBlockHeader` pre-broadcast coverage**: covers the `validateHeader` checks (archive, slot/timestamp, mana-min-fee, …) and the empty-attestation path of `validateHeaderWithAttestations`, but does NOT cover proposer-signature verification, inbox consumption (`Rollup__InvalidInHash`), or `header.inHash` match. Those still execute inside the full `propose` and are caught by the bundle simulate at send time. The cost of a rare miss is one wasted broadcast. - **Top-level `aggregate3` revert diagnostics removed**: the previous `Multicall3.forward` code decoded receipt-reverted reasons via `tryGetErrorFromRevertedTx` and did a per-request resim on send-throw. Both paths are gone. With `allowFailure: true` and `Multicall3.hasCode` covering the no-bytecode case, a reverted forwarder receipt is genuinely unexpected (OOG, forwarder bug). The throw of `MulticallForwarderRevertedError` is the only diagnostic surface — operators will need the transaction hash from the log to investigate.

…ses (#23249) ## Motivation The `RevertCode` and `TxExecutionResult` types each carried three deprecated aliases (`APP_LOGIC_REVERTED`, `TEARDOWN_REVERTED`, `BOTH_REVERTED`) that all collapse to the same `REVERTED` value. Keeping them around adds noise, requires `no-duplicate-enum-values` eslint suppressions, and lets new code keep reaching for the old names. ## Approach Removed the deprecated members from both enums and rewrote every call site to use `REVERTED` directly. Tests, fixtures, and a stale doc reference were updated to match. ## Changes - **stdlib**: Drop deprecated `APP_LOGIC_REVERTED`/`TEARDOWN_REVERTED`/`BOTH_REVERTED` from `RevertCode` and `TxExecutionResult`. - **simulator, pxe, aztec.js, end-to-end (tests)**: Replace remaining references with `REVERTED`. - **simulator/docs**: Update a stale `APP_LOGIC_REVERTED` reference in the public-tx-simulation doc.

…23259) Addresses a config-timing race in `epochs_invalidate_block.parallel.test.ts > "proposer invalidates multiple checkpoints"` that caused intermittent CI failures with `expect(validCount).toBeLessThan(quorum)` (e.g. 5/6 attestations when quorum=5). ## The race The test reads `currentSlot` via `monitor.run()` right after waiting for the first checkpoint to land — that read can land anywhere within the current L2 slot, including near its end. It then computes `badSlot1 = currentSlot + 2` and races to push malicious config (`skipCollectingAttestations: true`, …) to that slot's proposer via `await node.setConfig({...})`. `CheckpointProposalJob` is constructed with `this.config` passed by reference (`sequencer-client/src/sequencer/sequencer.ts:559`), and `Sequencer.updateConfig` reassigns `this.config = merge(...)` rather than mutating, so a job built before `setConfig` lands keeps the old config object. Under proposer pipelining (`PROPOSER_PIPELINING_SLOT_OFFSET = 1`, `epoch-cache/src/epoch_cache.ts:26`), the job for `badSlot1` is built during the last L1 slot of L2 slot `badSlot1 - 1`. With 32s L2 slots and 8s L1 slots, that's ~24s into the previous L2 slot — so if `currentSlot` was read late, badSlot1's proposer can snapshot the old config before our `setConfig` round-trip completes. ## Fix - Wait for an L2 slot boundary (`monitor.waitUntilNextL2Slot()`) before reading `currentSlot`, so we start from the beginning of a slot rather than wherever we happened to land. - Bump the gap from `+2/+3` to `+3/+4` for a second slot of margin. Cost is up to one additional L2 slot of test runtime in the worst case; the existing 8-slot wait window for both checkpoints still fits.

.

`sendBatchRequest` became unused after removing the slow tx flow and the old tx reqresp method. This PR removes sendBatchRequest and cleans up code that becomes unused. It does NOT remove subprotocol validator registration/etc from reqresp. This might be done in a follow-up depending on how https://linear.app/aztec-labs/issue/A-1014/block-txs-reqresp-validator-validaterequestedblocktxs-is-never-invoked becomes solved.

ProposalTxCollector doesn't exist anymore. Clean up unused files. Rename bench that is now only testing BatchTxRequester.

Fix A-520

Remove the single-checkpoint-proposal map in favour of the "by hash" variant.

AztecBot · 2026-05-22T07:20:39Z

Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed.

fcarreiro and others added 13 commits May 13, 2026 17:31

Merge branch 'next' into merge-train/spartan

4b52402

Merge branch 'next' into merge-train/spartan

ad97697

Merge branch 'next' into merge-train/spartan

1eaca4b

fix: clean up old jobs regardless of pending status (#23260)

437eacb

.

chore(p2p): remove proposal_tx_collector leftovers (#23276)

6eedaa5

ProposalTxCollector doesn't exist anymore. Clean up unused files. Rename bench that is now only testing BatchTxRequester.

feat: slash truncated checkpoint proposals (#23250)

816aef3

Fix A-520

refactor: remove unused map in attestation pool (#23284)

55df962

Remove the single-checkpoint-proposal map in favour of the "by hash" variant.

test(sequencer): de-flake "stops rotating once txTimeoutAt elapses"

4d7fc07

AztecBot added ci-draft Run CI on draft PRs. claudebox Owned by claudebox. it can push to this PR. labels May 14, 2026

ludamad force-pushed the merge-train/spartan branch from 4af2626 to db4ec58 Compare May 16, 2026 19:07

AztecBot closed this May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(sequencer): de-flake "stops rotating once txTimeoutAt elapses"#23287

test(sequencer): de-flake "stops rotating once txTimeoutAt elapses"#23287
AztecBot wants to merge 13 commits into
merge-train/spartanfrom
claudebox/fix-spartan-attestation-pool

AztecBot commented May 14, 2026

Uh oh!

AztecBot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

AztecBot commented May 14, 2026

Summary

Root cause

Fix

Test plan

Uh oh!

AztecBot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants