chore(ci): flag ProvingBroker "does not retry if job is stale" as flake #23047
Merged
alexghr
approved these changes
May 7, 2026
Flagging ProvingBroker > Retries > "does not retry if job is stale" as a flake in .test_patterns.yml. Failure surfaced on an unrelated wallet PR (dbanks12's wallet refactor) at http://ci.aztec-labs.com/64a972aafaa40dd0.
Failure
The broker tries to commit the final reportProvingJobError write after the per-epoch LMDB store has already been closed (the test advances the epoch from 1 → 3, which causes the epoch-1 store to be torn down). The race is between the epoch advance / cleanup path and the final error write: a timing flake, not a logic bug.
Owner
Test was authored by @alexghr in #9400 (feat: new proving broker implementation) and most recently edited by @alexghr in #22508 (fix(prover-client): don't mark in-progress epoch N jobs as stale when epoch N+1 starts). @spypsy has also recently fixed retries-related races in this file (#21842, #22355). Pinging Alex as primary owner; tag Spyros if it's actually a retry-counter race rather than a store-lifecycle race.
Other branches
Spot-checked the most recent failed runs on merge-train/fairies and merge-train/spartan; none of them hit this same proving_broker/Store is closed failure in the data window I sampled. The flake has only been observed on the one wallet PR run linked above so far.
Pattern entry
The new entry uses both regex (test file path) and error_regex (does not retry if job is stale|Store is closed) so unrelated failures in proving_broker.test.ts still fail CI; only this specific timing race gets quarantined to a Slack ping.
Created by claudebox · group: aztec
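For reviewers unfamiliar with how the two fields in the pattern entry interact, here is a minimal sketch of the matching behaviour described under "Pattern entry". It is not the CI's actual matcher. The FlakePattern shape, the isKnownFlake helper, and the src/proving_broker/ directory in the sample path are all hypothetical; only the regex and error_regex field names and the error_regex value come from this PR. The sketch assumes a failure is quarantined only when both patterns match.

```typescript
// Hypothetical shape of one pattern entry. Only `regex` and `error_regex`
// are named in the PR description; everything else here is illustrative.
interface FlakePattern {
  regex: string; // matched against the test file path
  error_regex?: string; // matched against the failure output, when present
}

// Assumed semantics: a failure is a known flake only if the path matches
// `regex` AND (when given) the output matches `error_regex`.
function isKnownFlake(pattern: FlakePattern, testPath: string, output: string): boolean {
  if (!new RegExp(pattern.regex).test(testPath)) {
    return false;
  }
  if (pattern.error_regex === undefined) {
    return true;
  }
  return new RegExp(pattern.error_regex).test(output);
}

const entry: FlakePattern = {
  // Assumption: the real entry's path regex may differ; the PR only names the file.
  regex: "proving_broker\\.test\\.ts",
  error_regex: "does not retry if job is stale|Store is closed",
};

// Hypothetical path used purely for illustration.
const testPath = "src/proving_broker/proving_broker.test.ts";

// The timing race from this PR: quarantined.
const flaky = isKnownFlake(entry, testPath, "Error: Store is closed");

// Any other failure in the same file: still fails CI.
const unrelated = isKnownFlake(entry, testPath, "expected 2 to equal 3");
```

Under these assumptions flaky is true and unrelated is false, which is the property the entry relies on: the file-level regex alone would have masked every failure in proving_broker.test.ts.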