chore: deflake duplicate proposals and attestations by mrzeszutko · Pull Request #20990 · AztecProtocol/aztec-packages

mrzeszutko · 2026-03-02T09:56:00Z

Fix flakiness in duplicate_proposal_slash and duplicate_attestation_slash e2e tests.

Both tests flake at a 3-13% rate because the malicious proposer (1 of 4 validators) is never selected within the timeout window. Three root causes are addressed:

GossipSub mesh checked only for TX topic — The tests need block proposals and checkpoint proposals to propagate, but waitForP2PMeshConnectivity only verified the tx topic mesh. Added a topics parameter so callers can specify which topics to wait on. Both slash tests now wait on tx, block_proposal, and checkpoint_proposal.
Proposer selection is probabilistic — With 4 validators and 2 slots/epoch, the malicious proposer has ~25% chance per slot. Added awaitEpochWithProposer helper that advances epochs (via L1 time warp) until the target proposer is deterministically selected for at least one slot in the current epoch.
Race between node startup and first proposal — Nodes started sequencing immediately upon creation, potentially proposing before the P2P mesh was ready. Now all nodes are created with dontStartSequencer: true, and sequencers are started simultaneously only after mesh formation, committee existence, and epoch advancement are confirmed.

Fixes A-593
Fixes A-595

spalladino

Have you confirmed the call to waitForP2PMeshConnectivity is needed? Both tests run with mockGossipSubNetwork, so connectivity should be guaranteed. Did you stumble upon runs where the issue was due to missed p2p messages? If so, we should review if mockGossipSubNetwork is working properly. If not, then we should remove the call to waitForP2PMeshConnectivity.

spalladino · 2026-03-02T13:55:11Z

+    logger.info(`Checking epoch ${currentEpoch} (slots ${startSlot}-${endSlot - 1}) for proposer ${targetProposer}`);
+
+    for (let s = startSlot; s < endSlot; s++) {
+      const proposer = await epochCache.getProposerAttesterAddressInSlot(SlotNumber(s));


Nit: let's switch this in favor of getCurrentProposer (see ethereum/src/contracts/rollup.ts method getCurrentProposer) so we don't need an epoch cache as a dependency, and get the data directly from L1.

mrzeszutko · 2026-03-02T14:10:31Z

Have you confirmed the call to waitForP2PMeshConnectivity is needed? Both tests run with mockGossipSubNetwork, so connectivity should be guaranteed. Did you stumble upon runs where the issue was due to missed p2p messages? If so, we should review if mockGossipSubNetwork is working properly. If not, then we should remove the call to waitForP2PMeshConnectivity.

@spalladino I didn't - I actually assumed it is there for a reason (it was already there, I did not add it). I'll double check if that's needed

spalladino · 2026-03-02T14:15:36Z

Hah, fair! I didn't notice it was already there. I'm good to merge then, but still worth taking a look at that.

BEGIN_COMMIT_OVERRIDE fix: track last seen nonce in case of stale fallback L1 RPC node (#20855) feat: Validate num txs in block proposals (#20850) fix(archiver): enforce checkpoint boundary on rollbackTo (#20908) fix: tps zero metrics (#20656) fix: handle scientific notation in bigintConfigHelper (#20929) feat(aztec): node enters standby mode on genesis root mismatch (#20938) fix: logging of class instances (#20807) feat(slasher): make slash grace period relative to rollup upgrade time (#20942) chore: add script to find PRs to backport (#20956) chore: remove unused prover-node dep (#20955) fix: increase minFeePadding in e2e_bot bridge resume tests and harden GasFees.mul() (#20962) feat(sequencer): (A-526) rotate publishers when send fails (#20888) chore: (A-554) bump reth version 1.6.0 -> 1.11.1 for eth devnet (#20889) chore: metric on how many epochs validator has been on committee (#20967) fix: set wallet minFeePadding in BotFactory constructor (#20992) chore: deflake epoch invalidate block test (#21001) chore(sequencer): e2e tests for invalid signature recovery in checkpoint attestations (#20971) chore: deflake duplicate proposals and attestations (#20990) chore: deflake epochs mbps test (#21003) feat: reenable function selectors in txPublicSetupAllowList (#20909) fix: limit offenses when voting in tally slashing mode by slashMaxPayloadSize (#20683) fix(spartan): wire SEQ_L1_PUBLISHING_TIME_ALLOWANCE_IN_SLOT env var (#21017) END_COMMIT_OVERRIDE

Fix flakiness in duplicate_proposal_slash and duplicate_attestation_slash e2e tests. Both tests flake at a 3-13% rate because the malicious proposer (1 of 4 validators) is never selected within the timeout window. Three root causes are addressed: 1. **GossipSub mesh checked only for TX topic** — The tests need block proposals and checkpoint proposals to propagate, but `waitForP2PMeshConnectivity` only verified the `tx` topic mesh. Added a `topics` parameter so callers can specify which topics to wait on. Both slash tests now wait on `tx`, `block_proposal`, and `checkpoint_proposal`. 2. **Proposer selection is probabilistic** — With 4 validators and 2 slots/epoch, the malicious proposer has ~25% chance per slot. Added `awaitEpochWithProposer` helper that advances epochs (via L1 time warp) until the target proposer is deterministically selected for at least one slot in the current epoch. 3. **Race between node startup and first proposal** — Nodes started sequencing immediately upon creation, potentially proposing before the P2P mesh was ready. Now all nodes are created with `dontStartSequencer: true`, and sequencers are started simultaneously only after mesh formation, committee existence, and epoch advancement are confirmed. Fixes A-593 Fixes A-595

## Summary Fixes a race condition in the `duplicate_attestation_slash` and `duplicate_proposal_slash` e2e tests that caused intermittent timeouts waiting for slashing offenses to be detected. ## Root Cause Investigated CI failure [2abee794200ae6a7](http://ci.aztec-labs.com/2abee794200ae6a7) (commit `f71c235a670d`, merge queue for PR #20555). This run **did include** the fix from #20990, yet the `duplicate_attestation_slash` test still failed with a timeout after ~478 seconds. The failure sequence from the logs: 1. `awaitEpochWithProposer` found the malicious proposer at **slot 14** (epoch 7) 2. The function returned, having warped L1 time to the start of epoch 7 3. Sequencers were then started (`await Promise.all(nodes.map(n => n.getSequencer()!.start()))`) 4. By the time sequencers were ready to build blocks, L1 time had advanced past epoch 7 into **epoch 8** (first block built at slot 16) 5. The malicious proposer was never selected in epoch 8+, so no duplicate proposals/attestations were created 6. The slasher had nothing to detect, and the test timed out The core issue: `awaitEpochWithProposer` warped to the target epoch and returned, but starting sequencers takes real time. With only 2 slots per epoch (48s total), the epoch passed before sequencers could act. ## Fix Renamed `awaitEpochWithProposer` to `advanceToEpochBeforeProposer` and changed the approach to a two-phase pattern: 1. **Find phase**: The function now checks the **next** epoch's slots (N+1) instead of the current epoch's (N). When the target proposer is found, it returns `{ targetEpoch }` while staying at epoch N -- one full epoch before the target. 2. **Start + warp phase** (in the test): After the function returns, sequencers are started while still one epoch before the target. Only then does the test warp to the target epoch via `advanceToEpoch(targetEpoch)`. This eliminates the race because sequencers are already running when the target epoch begins. The function can safely query future epoch slots because `epochCache.getProposerAttesterAddressInSlot` works for any slot within the `lagInEpochsForValidatorSet` window (typically 2 epochs ahead), and we only look 1 epoch ahead. ## Changes - **`shared.ts`**: Renamed `awaitEpochWithProposer` -> `advanceToEpochBeforeProposer`. Now checks next epoch's slots and returns `{ targetEpoch: EpochNumber }` instead of `void`. - **`duplicate_attestation_slash.test.ts`**: Updated to start sequencers before warping to target epoch. - **`duplicate_proposal_slash.test.ts`**: Same change. Also filtered offense assertions to only check `DUPLICATE_PROPOSAL` offenses, since the two malicious nodes sharing the same key each self-attest to their own (different) checkpoint proposals, causing honest nodes to also detect a `DUPLICATE_ATTESTATION` offense. Fixes A-632

Fix flakiness in duplicate_proposal_slash and duplicate_attestation_slash e2e tests. Both tests flake at a 3-13% rate because the malicious proposer (1 of 4 validators) is never selected within the timeout window. Three root causes are addressed: 1. **GossipSub mesh checked only for TX topic** — The tests need block proposals and checkpoint proposals to propagate, but `waitForP2PMeshConnectivity` only verified the `tx` topic mesh. Added a `topics` parameter so callers can specify which topics to wait on. Both slash tests now wait on `tx`, `block_proposal`, and `checkpoint_proposal`. 2. **Proposer selection is probabilistic** — With 4 validators and 2 slots/epoch, the malicious proposer has ~25% chance per slot. Added `awaitEpochWithProposer` helper that advances epochs (via L1 time warp) until the target proposer is deterministically selected for at least one slot in the current epoch. 3. **Race between node startup and first proposal** — Nodes started sequencing immediately upon creation, potentially proposing before the P2P mesh was ready. Now all nodes are created with `dontStartSequencer: true`, and sequencers are started simultaneously only after mesh formation, committee existence, and epoch advancement are confirmed. Fixes A-593 Fixes A-595

## Summary Fixes a race condition in the `duplicate_attestation_slash` and `duplicate_proposal_slash` e2e tests that caused intermittent timeouts waiting for slashing offenses to be detected. ## Root Cause Investigated CI failure [2abee794200ae6a7](http://ci.aztec-labs.com/2abee794200ae6a7) (commit `f71c235a670d`, merge queue for PR #20555). This run **did include** the fix from #20990, yet the `duplicate_attestation_slash` test still failed with a timeout after ~478 seconds. The failure sequence from the logs: 1. `awaitEpochWithProposer` found the malicious proposer at **slot 14** (epoch 7) 2. The function returned, having warped L1 time to the start of epoch 7 3. Sequencers were then started (`await Promise.all(nodes.map(n => n.getSequencer()!.start()))`) 4. By the time sequencers were ready to build blocks, L1 time had advanced past epoch 7 into **epoch 8** (first block built at slot 16) 5. The malicious proposer was never selected in epoch 8+, so no duplicate proposals/attestations were created 6. The slasher had nothing to detect, and the test timed out The core issue: `awaitEpochWithProposer` warped to the target epoch and returned, but starting sequencers takes real time. With only 2 slots per epoch (48s total), the epoch passed before sequencers could act. ## Fix Renamed `awaitEpochWithProposer` to `advanceToEpochBeforeProposer` and changed the approach to a two-phase pattern: 1. **Find phase**: The function now checks the **next** epoch's slots (N+1) instead of the current epoch's (N). When the target proposer is found, it returns `{ targetEpoch }` while staying at epoch N -- one full epoch before the target. 2. **Start + warp phase** (in the test): After the function returns, sequencers are started while still one epoch before the target. Only then does the test warp to the target epoch via `advanceToEpoch(targetEpoch)`. This eliminates the race because sequencers are already running when the target epoch begins. The function can safely query future epoch slots because `epochCache.getProposerAttesterAddressInSlot` works for any slot within the `lagInEpochsForValidatorSet` window (typically 2 epochs ahead), and we only look 1 epoch ahead. ## Changes - **`shared.ts`**: Renamed `awaitEpochWithProposer` -> `advanceToEpochBeforeProposer`. Now checks next epoch's slots and returns `{ targetEpoch: EpochNumber }` instead of `void`. - **`duplicate_attestation_slash.test.ts`**: Updated to start sequencers before warping to target epoch. - **`duplicate_proposal_slash.test.ts`**: Same change. Also filtered offense assertions to only check `DUPLICATE_PROPOSAL` offenses, since the two malicious nodes sharing the same key each self-attest to their own (different) checkpoint proposals, causing honest nodes to also detect a `DUPLICATE_ATTESTATION` offense. Fixes A-632

Fix flakiness in duplicate_proposal_slash and duplicate_attestation_slash e2e tests. Both tests flake at a 3-13% rate because the malicious proposer (1 of 4 validators) is never selected within the timeout window. Three root causes are addressed: 1. **GossipSub mesh checked only for TX topic** — The tests need block proposals and checkpoint proposals to propagate, but `waitForP2PMeshConnectivity` only verified the `tx` topic mesh. Added a `topics` parameter so callers can specify which topics to wait on. Both slash tests now wait on `tx`, `block_proposal`, and `checkpoint_proposal`. 2. **Proposer selection is probabilistic** — With 4 validators and 2 slots/epoch, the malicious proposer has ~25% chance per slot. Added `awaitEpochWithProposer` helper that advances epochs (via L1 time warp) until the target proposer is deterministically selected for at least one slot in the current epoch. 3. **Race between node startup and first proposal** — Nodes started sequencing immediately upon creation, potentially proposing before the P2P mesh was ready. Now all nodes are created with `dontStartSequencer: true`, and sequencers are started simultaneously only after mesh formation, committee existence, and epoch advancement are confirmed. Fixes A-593 Fixes A-595

## Summary Ports the P2P mesh connectivity fix from `next` to `v4-next` for the `duplicate_attestation_slash` and `duplicate_proposal_slash` e2e tests. Cherry-picked commit: `8680abcca7` — chore: deflake duplicate proposals and attestations (#20990) ## Root Cause `waitForP2PMeshConnectivity` only waited for the `tx` GossipSub topic mesh to form. The slash tests also need `block_proposal` and `checkpoint_proposal` meshes ready before sequencers start proposing, otherwise proposals get dropped and offenses are never detected. ## Fix - Added `topics` parameter to `waitForP2PMeshConnectivity` (defaults to `[TopicType.tx]` for backward compat) - Slash tests now wait for all 3 relevant topics before proceeding - Also added `advanceToEpochBeforeProposer` helper so sequencers start before the target epoch arrives Co-authored-by: Michal Rzeszutko <michal.rzeszutko@gmail.com>

chore: deflake duplicate proposals and attestations

654e0c1

spalladino reviewed Mar 2, 2026

View reviewed changes

spalladino approved these changes Mar 2, 2026

View reviewed changes

spalladino merged commit 8680abc into merge-train/spartan Mar 2, 2026
10 checks passed

spalladino deleted the mr/deflake-duplicate-proposals-and-attestations branch March 2, 2026 14:28

AztecBot mentioned this pull request Mar 2, 2026

feat: merge-train/spartan #20899

Merged

mrzeszutko mentioned this pull request Mar 10, 2026

chore: deflake duplicate attestations and proposals slash tests #21294

Merged

AztecBot mentioned this pull request Mar 19, 2026

chore: port P2P mesh topic deflake fix to v4-next #21825

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: deflake duplicate proposals and attestations#20990

chore: deflake duplicate proposals and attestations#20990
spalladino merged 1 commit into
merge-train/spartanfrom
mr/deflake-duplicate-proposals-and-attestations

mrzeszutko commented Mar 2, 2026

Uh oh!

spalladino left a comment

Uh oh!

spalladino Mar 2, 2026

Uh oh!

mrzeszutko commented Mar 2, 2026

Uh oh!

spalladino commented Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mrzeszutko commented Mar 2, 2026

Uh oh!

spalladino left a comment

Choose a reason for hiding this comment

Uh oh!

spalladino Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

mrzeszutko commented Mar 2, 2026

Uh oh!

spalladino commented Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants