
chore(ci): flag ProvingBroker "does not retry if job is stale" as flake #23047

Merged
dbanks12 merged 1 commit into next from cb/b4b6eb63ff78 on May 7, 2026


@AztecBot AztecBot commented May 7, 2026

Collaborator

Flagging ProvingBroker > Retries > does not retry if job is stale as a flake in .test_patterns.yml. Failure surfaced on an unrelated wallet PR — dbanks12's wallet refactor — at http://ci.aztec-labs.com/64a972aafaa40dd0.

Failure

● ProvingBroker › Retries › does not retry if job is stale

  Store is closed

  > 99 |             throw new Error('Store is closed');
        |                   ^

  at AztecLMDBStoreV2.transactionAsync (yarn-project/kv-store/dest/lmdb-v2/store.js:99:19)
  at SingleEpochDatabase.transactionAsync [as batchWrite]
    (yarn-project/prover-client/src/proving_broker/proving_broker_database/persisted.ts:45:22)
  at KVBrokerDatabase.batchWrite [as commitWrites]
    (yarn-project/prover-client/src/proving_broker/proving_broker_database/persisted.ts:120:14)

The broker tries to commit the final reportProvingJobError write after the per-epoch LMDB store has already been closed (the test advances the epoch from 1 → 3, which causes the epoch-1 store to be torn down). The race is between the epoch advance / cleanup path and the final error write — a timing flake, not a logic bug.
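The ordering problem described above can be sketched in isolation. This is a minimal, deterministic illustration and not the broker's actual code: `EpochStore` and `simulate` are hypothetical names, and the only detail taken from the failure is that a write against a closed store throws `Store is closed`.

```typescript
// Hypothetical stand-in for a per-epoch store. It is NOT the real
// AztecLMDBStoreV2 API; it only mimics "writes throw once the store is closed".
class EpochStore {
  private closed = false;
  readonly writes: string[] = [];

  write(entry: string): void {
    if (this.closed) {
      throw new Error('Store is closed');
    }
    this.writes.push(entry);
  }

  close(): void {
    this.closed = true;
  }
}

type Step = 'write' | 'close';

// Replays the two competing operations in a fixed order and reports the outcome.
// 'write' models the final reportProvingJobError commit,
// 'close' models the epoch-advance cleanup tearing down the epoch-1 store.
function simulate(order: Step[]): string {
  const store = new EpochStore();
  try {
    for (const step of order) {
      if (step === 'write') {
        store.write('reportProvingJobError');
      } else {
        store.close();
      }
    }
    return 'ok';
  } catch (err) {
    return (err as Error).message;
  }
}

// The test passes when the write lands before cleanup...
const writeFirst = simulate(['write', 'close']);
// ...and fails when cleanup wins the race.
const closeFirst = simulate(['close', 'write']);
```

In the real test the order of the two steps depends on scheduling, which would explain why the failure appears only intermittently.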

Owner

Test was authored by @alexghr in #9400 (feat: new proving broker implementation) and most recently edited by @alexghr in #22508 (fix(prover-client): don't mark in-progress epoch N jobs as stale when epoch N+1 starts). @spypsy has also recently fixed retries-related races in this file (#21842, #22355). Pinging Alex as primary owner; tag Spyros if it's actually a retry-counter race rather than a store-lifecycle race.

Other branches

Spot-checked the most recent failed runs on merge-train/fairies and merge-train/spartan — none of them hit this same proving_broker / Store is closed failure in the data window I sampled. The flake has only been observed on the one wallet PR run linked above so far.

Pattern entry

The new entry uses both regex (test file path) and error_regex (does not retry if job is stale|Store is closed) so unrelated failures in proving_broker.test.ts still fail CI — only this specific timing race gets quarantined to a Slack ping.
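To illustrate why requiring both fields keeps the quarantine narrow, here is a rough sketch of the matching logic. It assumes an entry is treated as a flake only when both patterns match; the `FlakePattern` type, the `isKnownFlake` helper, and the path regex are hypothetical, and the CI tooling that actually reads .test_patterns.yml may behave differently.

```typescript
// Hypothetical shape of one pattern entry. Only the field names `regex` and
// `error_regex` come from the PR description; everything else is assumed.
interface FlakePattern {
  regex: string; // matched against the test file path
  error_regex: string; // matched against the failure output
}

// Assumed semantics: a failure counts as a known flake only if BOTH patterns match.
function isKnownFlake(pattern: FlakePattern, testPath: string, output: string): boolean {
  return new RegExp(pattern.regex).test(testPath) && new RegExp(pattern.error_regex).test(output);
}

const entry: FlakePattern = {
  // Assumed path regex for illustration; the exact value in the PR is not shown.
  regex: 'proving_broker\\.test\\.ts',
  error_regex: 'does not retry if job is stale|Store is closed',
};

// Same file, the quarantined error: treated as a flake.
const sameFileKnownError = isKnownFlake(entry, 'proving_broker.test.ts', 'Error: Store is closed');
// Same file, a different failure: still fails CI.
const sameFileOtherError = isKnownFlake(entry, 'proving_broker.test.ts', 'expect(received).toBe(expected)');
// Different file with the same error text: still fails CI.
const otherFileKnownError = isKnownFlake(entry, 'some_other.test.ts', 'Error: Store is closed');
```

Under these assumptions, only the first case would be downgraded to a Slack ping.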


Created by claudebox · group: aztec

@dbanks12 dbanks12 marked this pull request as ready for review May 7, 2026 16:25
@dbanks12 dbanks12 requested review from alexghr and spalladino May 7, 2026 16:25
@dbanks12 dbanks12 added this pull request to the merge queue May 7, 2026
Merged via the queue into next with commit 8a62f05 May 7, 2026
30 checks passed
@dbanks12 dbanks12 deleted the cb/b4b6eb63ff78 branch May 7, 2026 18:38
