test: mark tx_stats_bench 10 TPS as flake-retryable on merge-train/spartan #23083
Merged
spalladino merged 1 commit on May 8, 2026
spalladino approved these changes on May 8, 2026
This was referenced May 8, 2026
ledwards2225 pushed a commit that referenced this pull request on May 11, 2026
Cherry-pick of #23092 onto merge-train/fairies. Same flake (`verifies transactions at 10 TPS` in `tx_stats_bench.test.ts:268`) is now blocking PRs based on this train; see #23092 for the full analysis (bb.js NativeUnixSocket backend under 8x parallel IVC verifications intermittently returning `valid:false`). Skipping at sub-test granularity keeps the other three serial sub-tests emitting their compression and single-tx verification metrics; only the IVC-verifier-under-concurrency metrics are dropped. The `.test_patterns.yml` flake-retry entry from #23083 stays in place.
rangozd pushed a commit to rangozd/aztec-packages that referenced this pull request on May 16, 2026
BEGIN_COMMIT_OVERRIDE
fix(test): warp L1 forward when proposer scan hits EpochNotStable (AztecProtocol#22967)
test(e2e): fail epochs tests on proposer-rollup-check-failed (AztecProtocol#22965)
fix: grafana switch to aztec_status="proposed" (AztecProtocol#22978)
chore: update benchmark scraper (AztecProtocol#22984)
test(e2e): migrate simple epoch tests to pipelining (AztecProtocol#22973)
chore: remove top-level yarn.lock (AztecProtocol#22987)
refactor(archiver)!: unify L2BlockSource checkpoint lookups via query objects (AztecProtocol#22933)
fix(sequencer): bounded sweep instead of event scan for governance proposal check (AztecProtocol#22989)
fix(docs): allow webapp-tutorial yarn install to populate empty lockfile in CI (AztecProtocol#23000)
test(e2e): enable pipelining in l1-reorgs and mbps redistribution tests (AztecProtocol#23009)
fix(archiver): restore pending block height metric under pipelining (AztecProtocol#22994)
chore(p2p): remove skipped validation result option (AztecProtocol#23034)
refactor(p2p)!: remove slow tx collection flow (AztecProtocol#22878)
chore(spartan): add next-net-clone environment config (AztecProtocol#22995)
chore(sequencer): add context to proposer-rollup-check-failed logs (AztecProtocol#23071)
test(e2e): wait for archiver sync before asserting pipelining (AztecProtocol#22997)
refactor(node-rpc)!: remove deprecated AztecNode methods and L2BlockSource tip helpers (AztecProtocol#22934)
feat(p2p): detect and track announce IP changes at runtime (AztecProtocol#22405)
test: mark tx_stats_bench 10 TPS as flake-retryable on merge-train/spartan (AztecProtocol#23083)
fix(sequencer): bind vote-only multicalls to target slot under pipelining (AztecProtocol#23090)
feat(sequencer): build optimistically across pruning epoch boundary (AztecProtocol#23056)
fix(sequencer): use chainTipsOverride.pending for log context (AztecProtocol#23098)
test(e2e): relax post-boundary slot assertion in epochs_proof_at_boundary (AztecProtocol#23108)
fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call (AztecProtocol#23093)
fix(sequencer): anchor fee asset price modifier to predicted parent (AztecProtocol#23113)
chore: error log when L1 head timestamp drifts (AztecProtocol#22947)
fix(sequencer): override full parent checkpoint cell in pipelined simulation (AztecProtocol#23073)
test(e2e): enable pipelining on missed l1 slot test (AztecProtocol#23068)
fix: more robust metrics reporting in IRM monitor (AztecProtocol#23038)
fix: preserve LMDB slashing protection (AztecProtocol#23145)
test(e2e): enable pipelining on p2p tests (AztecProtocol#23070)
fix(archiver): move L2 tips cache refresh out of write transactions (AztecProtocol#23110)
test(e2e): fix data_withholding_slash flake by freezing L1 across restart (AztecProtocol#23162)
fix(validator): include proposed checkpoint out-hashes when validating checkpoint proposals (AztecProtocol#23119)
refactor(config): drop nested config option, flatten l1Contracts (AztecProtocol#23143)
test(e2e): bump bash TIMEOUT for e2e_p2p/add_rollup to match jest 20m (AztecProtocol#23177)
fix(p2p): chunk archive of mined txs on block finalization (A-969) (AztecProtocol#23085)
fix(p2p): stream tx pool hydration to bound startup memory (A-968) (AztecProtocol#23086)
chore: remove orphan --archiver flag usages from start invocations (AztecProtocol#23186)
feat(ci): daily merge-train/spartan stale-PR notifier (AztecProtocol#23189)
fix: preserve contract artifact permissions (AztecProtocol#23174)
fix(ci3): accept slashes in /list/<path:key> for merge-train history (AztecProtocol#23160)
feat(ci): route merge-train/spartan flake notifications to #team-alpha-ci (AztecProtocol#23219)
fix(cheat-codes): wait for post-warp L2 block in warpL2TimeAtLeastTo (AztecProtocol#23213)
feat: slash attesters signing over bad checkpoints (AztecProtocol#23180)
refactor(prover-client): split orchestrator into sub-tree + top-tree pair (AztecProtocol#22996)
fix(srs): retry transient CRS HTTP downloads with exponential backoff (AztecProtocol#23244)
refactor(p2p): remove old reqresp mode (AztecProtocol#23158)
docs(sequencer-client): rewrite top-level and timing READMEs (AztecProtocol#23149)
fix(aztec-node): include upcoming checkpoint's L1 to L2 messages in simulatePublicCalls (AztecProtocol#23163)
END_COMMIT_OVERRIDE
Motivation
The `verifies transactions at 10 TPS` sub-test of `yarn-project/end-to-end/src/bench/tx_stats_bench.test.ts` is now flaking repeatedly on the `bench all` step of `merge-train/spartan`. It has fired on at least two different merge-train commits, hours apart, with no relation to either commit's diff:

Both runs hit the same assertion:

Sub-test failing log on the latest run: http://ci.aztec-labs.com/ca459ca73d02002c (`bench all` parent: http://ci.aztec-labs.com/90616bad7bf7ebaa).

The other three sub-tests in the suite (compression; single private verify x20 serial; single public verify x20 serial) pass cleanly against the same proven txs in both runs. The failure is in the stress sub-test, which fires 600 IVC verifications at 10/s with 8 concurrent IVC verifiers (`BB_NUM_IVC_VERIFIERS=8`, `BB_IVC_CONCURRENCY=1`). Under that load, at least one verification returns `valid: false`.

Cause
Neither triggering commit touches the IVC verifier path:
Two failures with the same signature on unrelated diffs are strong evidence that the flake is independent of the merge-train commits and stems from the bench infrastructure itself.
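Since the evidence points at the bench setup rather than the commits, it helps to quantify the load the 10 TPS sub-test places on the bench host. The sketch below is a back-of-envelope model using only figures quoted in this description (600 verifications at 10/s, 8 concurrent verifiers, 16 threads per verifier, 56 isolated cores). `StressLoad` and `summarizeLoad` are hypothetical names, not part of `tx_stats_bench`; the model also applies Little's law to bound the average verification time the 8 verifiers can sustain.

```typescript
// Back-of-envelope load model for the 10 TPS stress sub-test.
// Figures come from this PR description; the names are hypothetical
// and are not taken from the tx_stats_bench code.

interface StressLoad {
  totalVerifications: number; // IVC verifications fired by the sub-test
  ratePerSecond: number; // submission rate
  concurrentVerifiers: number; // BB_NUM_IVC_VERIFIERS
  threadsPerVerifier: number; // threads each bb verifier requests
  isolatedCores: number; // cores on the CPU-isolated bench host
}

interface LoadSummary {
  phaseSeconds: number;
  requestedThreads: number;
  oversubscription: number;
  maxSustainableVerifySeconds: number;
}

function summarizeLoad(load: StressLoad): LoadSummary {
  const phaseSeconds = load.totalVerifications / load.ratePerSecond;
  const requestedThreads = load.concurrentVerifiers * load.threadsPerVerifier;
  const oversubscription = requestedThreads / load.isolatedCores;
  // Little's law: with at most N verifications in flight, throughput is
  // N / avgDuration, so keeping up with `rate` needs avgDuration <= N / rate.
  const maxSustainableVerifySeconds = load.concurrentVerifiers / load.ratePerSecond;
  return { phaseSeconds, requestedThreads, oversubscription, maxSustainableVerifySeconds };
}

const summary = summarizeLoad({
  totalVerifications: 600,
  ratePerSecond: 10,
  concurrentVerifiers: 8,
  threadsPerVerifier: 16,
  isolatedCores: 56,
});

console.log(summary.phaseSeconds); // 60
console.log(summary.requestedThreads); // 128
console.log(summary.oversubscription.toFixed(2)); // 2.29
console.log(summary.maxSustainableVerifySeconds); // 0.8
```

Under these figures the host is asked for about 2.3x more threads than it has isolated cores for the whole 60 s phase, and if an average verification takes longer than 0.8 s the backlog would keep growing (assuming submissions queue rather than being dropped).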
The likely culprit is the recent bb-prover migration to the bb.js `NativeUnixSocket` backend (#21564), which spawns a fresh bb subprocess per Chonk verification via `withVerifierInstance`. Under 8x parallel verifications on the CPU-isolated bench host, each verifier requests 16 threads (8 × 16 = 128 threads on 56 isolated cores), and transient verifier failures appear. The bench-output log shows continuous `bb.js - Received signal 15, shutting down gracefully...` traffic (signal 15 is SIGTERM) during the 10 TPS phase: verifier instances are being torn down rapidly, and at least one verification slips through with a stale or incomplete response. Because the serial sub-tests (`numIterations = 20`, run sequentially) pass cleanly in both runs, this points to a stress-only interaction rather than a correctness regression.

Approach
Add `tx_stats_bench` to `.test_patterns.yml` with an `error_regex` anchored to the test file's stack-trace line (`tx_stats_bench.test.ts:<line>:<col>`), and assign `*charlie` as owner (author of the bb.js migration). With this entry, `ci3/run_test_cmd` retries the test once on failure and treats a pass on that single retry as a flake instead of a hard fail. This unblocks the merge train for unrelated commits while Charlie investigates the underlying concurrency interaction with the bb.js backend.

The `error_regex` is intentionally narrow (file + line + column from the stack trace), so other ways `tx_stats_bench` can fail (timeout, OOM, infra) still surface as hard fails.

Changes
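Before the file-level change itself, here is a minimal TypeScript sketch of how the narrow `error_regex` and the single retry described in the Approach section combine. It assumes the pattern is checked against the first run's output before any retry is attempted. `classifyRun`, `Outcome`, and `ERROR_REGEX` are hypothetical names, line 268 is taken from the cherry-pick note on this PR (the column is left open), and the sample failure strings are made up, so this is not the `ci3/run_test_cmd` implementation or the actual `.test_patterns.yml` entry.

```typescript
// Hypothetical sketch of the retry-and-classify rule; not ci3 code.

type Outcome = "pass" | "flake" | "hard-fail";

// Anchored to the failing assertion's stack-trace frame (file + line + column).
// Line 268 comes from the cherry-pick note; any column is accepted here.
const ERROR_REGEX = /tx_stats_bench\.test\.ts:268:\d+/;

function classifyRun(
  firstRunPassed: boolean,
  firstRunOutput: string,
  runRetry: () => boolean, // re-runs the test once; true if it passes
): Outcome {
  if (firstRunPassed) return "pass";
  // Failures that do not match the narrow pattern (timeout, OOM, infra)
  // are never retried, so they still surface as hard fails.
  if (!ERROR_REGEX.test(firstRunOutput)) return "hard-fail";
  // A matching failure gets exactly one retry; passing it marks a flake.
  return runRetry() ? "flake" : "hard-fail";
}

// Made-up sample outputs for illustration only.
const assertionFailure =
  "expect(received).toBe(expected)\n    at Object.<anonymous> (src/bench/tx_stats_bench.test.ts:268:41)";
const timeoutFailure = "thrown: Exceeded timeout of 600000 ms for a test.";

console.log(classifyRun(false, assertionFailure, () => true)); // flake
console.log(classifyRun(false, assertionFailure, () => false)); // hard-fail
console.log(classifyRun(false, timeoutFailure, () => true)); // hard-fail
```

Because the retry callback is only invoked after the pattern matches, a timeout or OOM never gets a second chance, which is what keeps those failure modes visible as hard fails.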
`.test_patterns.yml`: add a `tx_stats_bench` entry with an `error_regex` anchored to the test file's stack-trace line and `*charlie` as owner.

ClaudeBox logs: