test: flag e2e_ha_full as flake under HA pipelining#23541
Merged
Conversation
http://ci.aztec-labs.com/136431da99834194 dequeued PR #23344 again with the HA full suite failing under proposer pipelining. The visible failure signals in the dashboard log are recurring "Timed out waiting for block with archive matching checkpoint proposal" warnings across slots and an "Error building checkpoint at slot 127: already proposed block for slot 127 index 0" on HA-4, so multiple peers race on the same proposal under pipelining. The existing per-assertion / afterAll-hook flake entries for this suite did not match because jest's per-test banner is not reached within the dashboard log capture. Add a broad regex-only entry to flag any failure of yarn-project/end-to-end/scripts/run_test.sh ha src/composed/ha/e2e_ha_full.test.ts as a flake until the HA pipelining work stabilises.
PhilWindle
approved these changes
May 24, 2026
This was referenced May 24, 2026
PhilWindle
pushed a commit
that referenced
this pull request
May 24, 2026
## Summary PR #23344 was dequeued again at https://github.com/AztecProtocol/aztec-packages/actions/runs/26364977301. The flake-flag added in #23541 wasn't enough — ci3 retries once and the HA full suite fails both attempts. Multiple distinct failure modes have been hit recently: - `✕ should coordinate governance voting across HA nodes` - `✕ should distribute work across multiple HA nodes` - afterAll teardown hangs → `Exceeded timeout of [0-9]+ ms for a hook` - 5-peer races on checkpoint proposals (`Timed out waiting for block with archive matching checkpoint proposal`, `already proposed block for slot N index 0`) This change collapses the four overlapping `e2e_ha_full` entries in `.test_patterns.yml` into a single `skip: true` entry so the suite is outright skipped on `merge-train/spartan` until HA pipelining stabilises. ## Notes - Targets `merge-train/spartan` per yarn-project default base. - Adjacent `src/e2e_slashing/attested_invalid_proposal.test.ts` entry left untouched. - Owner kept as @spyros (current owner of the suite-level entry). Requested in Slack by @PhilWindle. --- *Created by [claudebox](https://claudebox.work/v2/sessions/a745321353cb58ff) · group: `slackbot`*
danielntmd
pushed a commit
to danielntmd/aztec-packages
that referenced
this pull request
Jun 4, 2026
BEGIN_COMMIT_OVERRIDE refactor(p2p): merge FastTxCollection into TxCollection with sequential pipeline (AztecProtocol#23245) refactor(publisher): bundle-level simulate; drop per-action enqueue sims (AztecProtocol#23165) refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases (AztecProtocol#23249) test(e2e): fix race in 'proposer invalidates multiple checkpoints' (AztecProtocol#23259) fix: clean up old jobs regardless of pending status (AztecProtocol#23260) refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273) chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276) feat: slash truncated checkpoint proposals (AztecProtocol#23250) refactor: remove unused map in attestation pool (AztecProtocol#23284) chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274) refactor(l1-tx-utils): use DateProvider for fail-fast timeout check (AztecProtocol#23257) feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277) test(e2e): fix race in broadcasted_invalid_block_proposal_slash under pipelining (AztecProtocol#23302) fix(archiver): atomic getter for L2 tips (AztecProtocol#23295) fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under pipelining (AztecProtocol#23296) fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298) test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275) chore: update dashboard (AztecProtocol#23312) chore: Revert "feat(sandbox): support proposer pipelining in local network" (AztecProtocol#23313) test: slash on bad attestation (AztecProtocol#23184) feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116) test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301) test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300) test(e2e): switch fee_settings to organic fee bumps under pipelining (AztecProtocol#23303) fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures (AztecProtocol#23333) test(e2e): wait for real oracle rotation in fee_settings inflate helper (AztecProtocol#23334) test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining (AztecProtocol#23336) fix(spartan-bench): tolerate older node images in SlasherConfig schema (AztecProtocol#23351) fix: interrupt prover jobs in stop (AztecProtocol#23358) test(e2e): enable pipelining on bot, fees, and avm simulator tests (AztecProtocol#23329) feat(sentinel): end-of-epoch evaluation with re-execution outcomes (AztecProtocol#23286) feat: slash for invalid checkpoint proposals (AztecProtocol#23270) fix: fork closure in epoch proving jobs (AztecProtocol#23390) fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394) fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396) test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in epochs tests (AztecProtocol#23424) chore: refactor weekly proving test wait (AztecProtocol#23395) refactor: add fifo set (AztecProtocol#23271) feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327) fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371) chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373) feat(sequencer): AutomineSequencer for single-sequencer e2e tests (AztecProtocol#23354) fix(prover): wait for previous epoch to be proven (AztecProtocol#23458) chore: collocate provers (AztecProtocol#23439) chore: rm staging-ignition (AztecProtocol#23440) chore: rm unused networks (AztecProtocol#23441) test(e2e): migrate block_building, multi_validator_node, publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414) fix(archiver): reconcile local blocks with L1 checkpoints by block number (AztecProtocol#23461) feat: Updated slash conditions on block proposals (AztecProtocol#23466) test(e2e): migrate HA full test to pipelining (AztecProtocol#23463) chore: update resource profiles (AztecProtocol#23442) chore: update debug log levels (AztecProtocol#23456) test: fix flaky sentinel_status_slash by asserting the fault on the checkpoint slot (AztecProtocol#23483) feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980) (AztecProtocol#23436) refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID -> PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468) fix: reject block proposals in poisoned slots (AztecProtocol#23411) fix: retry nargo dep + solc downloads to survive transient DNS drops (AztecProtocol#23490) fix: enrich json-rpc tracing (AztecProtocol#23412) feat: add trace export controls (AztecProtocol#23413) test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496) test: cover invalid checkpoint proposal slashing (AztecProtocol#23503) test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482) test: flag e2e_slashing_attested_invalid_proposal as flake under pipelining (AztecProtocol#23501) test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining (AztecProtocol#23515) test(e2e): require cross-observer agreement on sentinel fault slot (AztecProtocol#23513) test: flag e2e_ha_full afterAll hook timeout as flake under pipelining (AztecProtocol#23524) fix(e2e): propagate l1ContractsArgs into node config so archiver matches L1 (AztecProtocol#23514) test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as flake (AztecProtocol#23528) test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1 race (AztecProtocol#23533) test(e2e_ha_full): parallel HA peer node teardown with per-node deadline (AztecProtocol#23539) test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541) test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542) test(ci): skip e2e_multi_validator_node_key_store entirely on merge-train/spartan (AztecProtocol#23544) END_COMMIT_OVERRIDE
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Dequeued from merge-train/spartan again: http://ci.aztec-labs.com/136431da99834194.
The HA full suite keeps failing under proposer pipelining with shifting symptoms. In this run the dashboard log shows recurring
validator:proposal-handler Timed out waiting for block with archive matching checkpoint proposalwarnings (slot 98, 115, …) and anError building checkpoint at slot 127: already proposed block for slot 127 index 0on HA-4 — i.e. the 5 HA peers race on the same proposal. The bundled #23539 (parallel peer teardown) and #23524 (afterAll hook timeout) entries did not catch this run because jest's per-test summary was not reached within the dashboard log capture.This PR adds a broad regex-only entry under
.test_patterns.ymlto flag any failure ofyarn-project/end-to-end/scripts/run_test.sh ha src/composed/ha/e2e_ha_full.test.tsas a flake. Owner: @PaLLa, matching the existing pipelining-flavoured entries for this suite.The intent is to unblock the merge queue while the HA pipelining stabilisation work continues; narrow the regex (or add a real fix) once the failure modes settle down.
Created by claudebox · group:
slackbot