test: flag e2e_slashing_attested_invalid_proposal as flake under pipelining #23501

Merged
spalladino merged 2 commits into merge-train/spartan from spl/flake-e2e-slashing-attested-invalid-proposal
May 22, 2026
@spalladino
Contributor

Summary

Add .test_patterns.yml entry to flag e2e_slashing_attested_invalid_proposal > slashes a lazy attester for an invalid checkpoint and clears it on delayed equivocation as a flake. Hit on http://ci.aztec-labs.com/6db6103599cb22e6.

This is a real race under pipelining, not a one-off — same dynamics as the e2e_ha_full fix in spl/fix-ha-full-equivocation-check. Source-level fix is the proper resolution; flake flag is the quick unblock.

Investigation

Failure: TimeoutError: Timeout awaiting honest validator slash offenses for invalid proposal attestation at attested_invalid_proposal.test.ts:404. The honest validator never records an ATTESTED_TO_INVALID_CHECKPOINT_PROPOSAL offense against the lazy validator, so the retry-until times out.

Timeline from the CI log:

| t | event |
| --- | --- |
| 11:38:45.871 | validator-1 (bad proposer) broadcasts checkpoint proposal for slot 13 |
| 11:38:45.993 | bad proposer broadcasts its self-attestation |
| 11:38:46.019 | validator-2 (lazy) receives self-attestation (24 ms) |
| 11:38:46.021 | validator-2 receives a block proposal, but parent block not found |
| 11:39:09.804 | validator-2 finally receives the checkpoint proposal, 23 s after broadcast |
| 11:39:09.808 | validator-2 attests (lazy node has skipCheckpointProposalValidation: true, so it signs blindly) |
| 11:39:09.828 | validator-1 / validator-3 reject the attestation: Checkpoint attestation slot 13 is not current (14) or next (14) slot (HighToleranceError), peer scored down |

Root cause:

  1. The p2p CheckpointAttestationValidator (yarn-project/p2p/src/msg_validators/attestation_validator/attestation_validator.ts:56-69) accepts attestations only for targetSlot, nextSlot, or within attestationWindowIntoTargetSlot (~1.5 s) of the current wallclock slot. After that it returns reject with HighToleranceError.
  2. The slasher's ATTESTED_TO_INVALID_CHECKPOINT_PROPOSAL detector (yarn-project/validator-client/src/validator.ts:815 handleCheckpointAttestation) is wired only to the gossip path via registerCheckpointAttestationCallback (validator.ts:428). Gossip-rejected attestations never reach it.
  3. The lazy validator's attestation was 23 s late because validator-2 received the self-attestation from the bad proposer in 24 ms but didn't see the checkpoint proposal until 11:39:09. The most likely cause is block-3 arriving before block-2's processing finishes (parent_block_not_found warnings precede the proposal arrival), with the checkpoint dispatch held behind block validation in the libp2p service.
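
The interaction between points 1 and 2 can be sketched as follows. This is a minimal model, not the real code in attestation_validator.ts or validator.ts: every name here (`validateSlot`, `onGossipAttestation`, `SlotView`, `ATTESTATION_WINDOW_MS`) is hypothetical, the window rule is one assumed reading of the description above, and the timing values are illustrative rather than taken from the CI log.

```typescript
// Hypothetical sketch: none of these names are the real yarn-project APIs.
type Verdict = 'accept' | 'reject';

interface Attestation {
  slot: number;
  attester: string;
}

interface SlotView {
  targetSlot: number; // slot the node is currently working on
  nextSlot: number; // slot after the target
  msIntoTargetSlot: number; // wallclock time elapsed since targetSlot began
}

// "~1.5 s" in the description; the exact value is an assumption here.
const ATTESTATION_WINDOW_MS = 1_500;

// Assumed reading of the rule: accept the target or next slot, plus the
// previous slot while still inside the short window at the start of the target slot.
function validateSlot(att: Attestation, view: SlotView): Verdict {
  if (att.slot === view.targetSlot || att.slot === view.nextSlot) {
    return 'accept';
  }
  const isPreviousSlot = att.slot === view.targetSlot - 1;
  if (isPreviousSlot && view.msIntoTargetSlot <= ATTESTATION_WINDOW_MS) {
    return 'accept';
  }
  return 'reject';
}

// Gossip path: the slasher callback only runs for attestations that validated.
function onGossipAttestation(
  att: Attestation,
  view: SlotView,
  invalidSlots: Set<number>,
  offenses: Attestation[],
): Verdict {
  const verdict = validateSlot(att, view);
  if (verdict === 'reject') {
    return verdict; // callback never fires, so no offense is recorded
  }
  if (invalidSlots.has(att.slot)) {
    offenses.push(att);
  }
  return verdict;
}

const invalidSlots = new Set<number>([13]);
const lazyAttestation: Attestation = { slot: 13, attester: 'validator-2' };

// On time: slot 13 is still the target slot, so the offense is recorded.
const onTime: Attestation[] = [];
const onTimeVerdict = onGossipAttestation(
  lazyAttestation,
  { targetSlot: 13, nextSlot: 14, msIntoTargetSlot: 500 },
  invalidSlots,
  onTime,
);

// Late: the node has moved on to slot 14 and the window has passed.
const late: Attestation[] = [];
const lateVerdict = onGossipAttestation(
  lazyAttestation,
  { targetSlot: 14, nextSlot: 15, msIntoTargetSlot: 20_000 },
  invalidSlots,
  late,
);

console.log(onTimeVerdict, onTime.length, lateVerdict, late.length);
```

In the late case the attestation is rejected and the offense list stays empty, which is the state the test polls against until it times out.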

This is exactly the dynamic described by the commit message on spl/fix-ha-full-equivocation-check (#035ac6a8) — the p2p attestation pool is not a durable record of past-slot attestations under pipelining. The attested_invalid_proposal test sits one step downstream: instead of reading the pool, it relies on the slasher recording the gossip-validated attestation. Same race, same outcome.

Follow-up

Proper source fix: give the slasher's invalid-checkpoint-attestation detector a path that survives p2p gossip rejection — e.g. observe checkpoint attestations matching a slot already flagged for an invalid block proposal before the validator's reject return, scoped to that case. Equivalent in spirit to the HA test rewrite, but on the producing side.
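
One possible shape for that follow-up, as a rough sketch only. `InvalidSlotAttestationObserver`, `flagSlot`, `observe`, and `validateAttestation` are hypothetical names invented for illustration, and `isTimely` stands in for the existing slot-window check. A real change would presumably also need to verify the attestation signature before recording anything, which is omitted here.

```typescript
// Hypothetical sketch of the follow-up idea; not the actual slasher or p2p API.
type Verdict = 'accept' | 'reject';

interface Attestation {
  slot: number;
  attester: string;
}

class InvalidSlotAttestationObserver {
  private readonly flaggedSlots = new Set<number>();
  readonly offenses: Attestation[] = [];

  // Called once a proposal for this slot has been judged invalid.
  flagSlot(slot: number): void {
    this.flaggedSlots.add(slot);
  }

  // Records the attestation only if its slot was already flagged.
  observe(att: Attestation): void {
    if (this.flaggedSlots.has(att.slot)) {
      this.offenses.push(att);
    }
  }
}

function validateAttestation(
  att: Attestation,
  isTimely: (slot: number) => boolean,
  observer: InvalidSlotAttestationObserver,
): Verdict {
  if (isTimely(att.slot)) {
    return 'accept'; // the existing gossip callback covers this path
  }
  observer.observe(att); // give the slasher a look before the reject return
  return 'reject';
}

const observer = new InvalidSlotAttestationObserver();
observer.flagSlot(13);

// Illustrative timeliness rule: the node has already moved on to slot 14.
const isTimely = (slot: number): boolean => slot >= 14;

const flaggedVerdict = validateAttestation({ slot: 13, attester: 'validator-2' }, isTimely, observer);
const unflaggedVerdict = validateAttestation({ slot: 12, attester: 'validator-3' }, isTimely, observer);

// Both are rejected for gossip purposes; only the slot-13 attestation is recorded.
console.log(flaggedVerdict, unflaggedVerdict, observer.offenses.length);
```

Scoping the observer to already-flagged slots keeps the reject path cheap for ordinary late traffic, since unflagged slots fall straight through to the reject.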

Test plan

  • Flake regex matches the observed TimeoutError: Timeout awaiting honest validator slash offenses for invalid proposal attestation from the failure
  • CI runs e2e_slashing/attested_invalid_proposal per existing pattern; failures now alert palla on #aztec3-ci instead of failing the build
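
The first check in the test plan can be illustrated with a small matcher. This is a sketch under assumptions: the `FlakePattern` shape and `classifyFailure` are hypothetical and do not reflect the actual .test_patterns.yml schema or the CI tooling that reads it. Only the test path, the error text, and the owner come from this PR.

```typescript
// Hypothetical model of a flake pattern; field names are illustrative only.
interface FlakePattern {
  testRegex: RegExp; // which test the pattern applies to
  errorRegex: RegExp; // which failure output counts as the known flake
  owner: string; // who gets alerted instead of failing the build
}

type Classification = { kind: 'flake'; owner: string } | { kind: 'failure' };

const patterns: FlakePattern[] = [
  {
    testRegex: /e2e_slashing\/attested_invalid_proposal/,
    errorRegex:
      /TimeoutError: Timeout awaiting honest validator slash offenses for invalid proposal attestation/,
    owner: 'palla',
  },
];

// A failure is a flake only if both the test path and the error output match.
function classifyFailure(testPath: string, output: string, known: FlakePattern[]): Classification {
  for (const p of known) {
    if (p.testRegex.test(testPath) && p.errorRegex.test(output)) {
      return { kind: 'flake', owner: p.owner };
    }
  }
  return { kind: 'failure' };
}

const observed =
  'TimeoutError: Timeout awaiting honest validator slash offenses for invalid proposal attestation';

const matched = classifyFailure('e2e_slashing/attested_invalid_proposal', observed, patterns);
const otherError = classifyFailure('e2e_slashing/attested_invalid_proposal', 'AssertionError: boom', patterns);

console.log(matched.kind, otherError.kind);
```

Matching on the error text as well as the test path means an unrelated failure in the same test still fails the build.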

spalladino and others added 2 commits May 22, 2026 13:47
…lining

Slasher's ATTESTED_TO_INVALID_CHECKPOINT_PROPOSAL detection only fires from
the p2p checkpoint-attestation gossip callback. Under pipelining, late
attestations get rejected by the validator before the callback fires, so the
slasher never records the offense and the test times out.

Same race as the one fixed for e2e_ha_full in #035ac6a8 -- worth a proper
source-level fix later (give the slasher a path that survives gossip
rejection), but flagging as a flake for now.

Hit on http://ci.aztec-labs.com/6db6103599cb22e6.
@spalladino spalladino merged commit 522f6e8 into merge-train/spartan May 22, 2026
7 of 9 checks passed
@spalladino spalladino deleted the spl/flake-e2e-slashing-attested-invalid-proposal branch May 22, 2026 15:17
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE