fix: indefinite retry for prover node and agent broker communication #22202
Merged
PhilWindle merged 3 commits on Apr 1, 2026
… retry The broker-side max retry cap was not requested to be changed. Only the HTTP-level retry for prover node and agent communication with the broker should be indefinite.
…xponential Replace the generic backoffGenerator (caps at 64s) with a broker-specific backoff sequence: 1, 1, 1, 2, 4 then continuously 4s. Also refactor makeTracedFetch to accept a backoff factory function for custom generators.
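The broker-specific sequence described in this commit can be sketched as a generator. This is a minimal illustration, not the code from the PR: the names `proverBrokerBackoffSketch` and `takeDelays` are hypothetical, and the delays are assumed to be in seconds.

```typescript
// Hypothetical sketch; the actual proverBrokerBackoff in the PR may differ.
// Yields delays (assumed seconds): 1, 1, 1, 2, 4, then 4 indefinitely.
function* proverBrokerBackoffSketch(): Generator<number> {
  const rampUp = [1, 1, 1, 2, 4];
  for (const delay of rampUp) {
    yield delay;
  }
  // Unlike a generic exponential backoff, the delay never grows past 4.
  while (true) {
    yield 4;
  }
}

// Helper (hypothetical): read the first `count` delays from a generator.
function takeDelays(gen: Generator<number>, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    const next = gen.next();
    if (next.done) {
      break;
    }
    out.push(next.value);
  }
  return out;
}

const firstEight = takeDelays(proverBrokerBackoffSketch(), 8);
console.log(firstEight.join(", ")); // prints "1, 1, 1, 2, 4, 4, 4, 4"
```

Capping at 4 keeps the reconnect latency short after a long broker outage, where a generic exponential backoff would wait up to its 64s cap between attempts.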
alexghr approved these changes Mar 31, 2026
PhilWindle approved these changes Apr 1, 2026
AztecBot added a commit that referenced this pull request Apr 1, 2026
…22202)

## Summary

Changes the HTTP-level retry mechanism for prover node and agent communication with the prover broker from limited retries to indefinite backoff:

- **Prover node → broker** (`start_node.ts`): Changed from finite retry array `[1, 2, 3, 3, ...]` (~30s) to indefinite backoff `[1, 1, 1, 2, 4, 4, 4, ...]`
- **Prover agent → broker** (`start_prover_agent.ts`): Same change
- **Default broker RPC clients** (`rpc.ts`): Updated defaults to use the new `proverBrokerBackoff` generator
- **`makeTracedFetch`** (`fetch.ts`): Now accepts either a `number[]` for finite backoff or a `() => Generator<number>` factory for indefinite backoff

The rationale is that the epoch proving has its own timeout: when it expires, the chain reorgs and jobs can be safely cancelled. There's no reason for the HTTP communication layer to give up before that happens.

## Test plan

- [x] All 98 proving broker tests pass
- [x] Build succeeds
- [ ] Verify in spartan that prover node and agent reconnect to broker after transient failures
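The two backoff forms that the summary says `makeTracedFetch` accepts can be handled by normalizing them into a single source of delays. The sketch below is an assumption about how such a helper might look, not the implementation in `fetch.ts`: the names `Backoff`, `makeDelaySource`, and `retryWithBackoff` are hypothetical, and delays are assumed to be in seconds.

```typescript
// Hypothetical type mirroring the two forms named in the summary.
type Backoff = number[] | (() => Generator<number>);

// Normalize both forms into a "next delay" function.
// Returns undefined once a finite source is exhausted.
function makeDelaySource(backoff: Backoff): () => number | undefined {
  if (Array.isArray(backoff)) {
    const delays: number[] = backoff;
    let index = 0;
    return () => (index < delays.length ? delays[index++] : undefined);
  }
  // A factory gives every caller a fresh generator, so state is not shared.
  const gen = backoff();
  return () => {
    const next = gen.next();
    return next.done ? undefined : next.value;
  };
}

// Retry `attempt` until it succeeds or the delay source runs dry.
// With an infinite generator this never gives up on its own.
async function retryWithBackoff<T>(
  attempt: () => Promise<T>,
  backoff: Backoff,
  sleep: (seconds: number) => Promise<void> = seconds =>
    new Promise<void>(resolve => setTimeout(resolve, seconds * 1000)),
): Promise<T> {
  const nextDelay = makeDelaySource(backoff);
  while (true) {
    try {
      return await attempt();
    } catch (err) {
      const delay = nextDelay();
      if (delay === undefined) {
        throw err; // finite backoff exhausted: surface the last error
      }
      await sleep(delay);
    }
  }
}
```

Taking a factory rather than a generator instance matters here: a generator is stateful, so sharing one across requests would make a later request start partway through the sequence.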
✅ Successfully backported to backport-to-v4-next-staging #22205.
github-merge-queue Bot pushed a commit that referenced this pull request Apr 1, 2026

BEGIN_COMMIT_OVERRIDE
chore: (A-771) remove dead code, verify keypair (#22167)
fix(aes128): validate PKCS#7 padding in decryptBufferCBC (#22179)
chore: (A-815) fix l1 tx utils fallback id logic (#22187)
fix(archiver): always advance L1-to-L2 messages syncpoint to current L1 block (#22154)
chore: (A-832) fix defaultFetch double consuming response on JSON parse failure (#22194)
fix: indefinite retry for prover node and agent broker communication (#22202)
fix: remove unused createDispatchFn with no method allowlist (#22219)
chore: fix wallet setup to use NO_FROM instead of ZERO address (#22222)
fix: update aes128 bad-key test for PKCS#7 padding validation (#22190)
END_COMMIT_OVERRIDE
AztecBot added a commit that referenced this pull request Apr 1, 2026

BEGIN_COMMIT_OVERRIDE
cherry-pick: feat: move event size check from declaration to private emission (#22168)
fix: prevent oracle failure on tag computation for invalid recipient (#22163)
feat: move event size check from declaration to private emission (#22168) [v4-next backport] (#22182)
fix(cli-wallet): peek claim stack instead of popping for estimate-gas-only (#22196)
fix: use Fr.fromString for CLI wallet claim params to handle decimal values (#22197)
fix: indefinite retry for prover node and agent broker communication (#22202)
END_COMMIT_OVERRIDE