
fix: indefinite retry for prover node and agent broker communication #22202

Merged
PhilWindle merged 3 commits into merge-train/spartan from claudebox/indefinite-prover-retry on Apr 1, 2026


AztecBot commented Mar 31, 2026

Summary

Changes the HTTP-level retry mechanism for prover node and agent communication with the prover broker from limited retries to indefinite backoff:

  • Prover node → broker (start_node.ts): Changed from finite retry array [1, 2, 3, 3, ...] (~30s) to indefinite backoff [1, 1, 1, 2, 4, 4, 4, ...]
  • Prover agent → broker (start_prover_agent.ts): Same change
  • Default broker RPC clients (rpc.ts): Updated defaults to use the new proverBrokerBackoff generator
  • makeTracedFetch (fetch.ts): Now accepts either a number[] for finite backoff or a () => Generator<number> factory for indefinite backoff
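The backoff shapes in the bullets above can be sketched as follows. This is a minimal TypeScript illustration, not the code from this PR: the body of `proverBrokerBackoff` is reconstructed only from the sequence described here (1, 1, 1, 2, 4, then 4 forever, in seconds), and the `Backoff` type plus the `toBackoffFactory` and `take` helpers are hypothetical names showing how a `makeTracedFetch`-style API could accept either a `number[]` or a `() => Generator<number>` factory.

```typescript
// Sketch only: mirrors the sequence described in this PR
// (1, 1, 1, 2, 4 seconds, then 4 seconds forever). Not the actual source.
function* proverBrokerBackoff(): Generator<number> {
  yield* [1, 1, 1, 2, 4];
  while (true) {
    yield 4; // never finishes, so the retry never gives up on its own
  }
}

// Hypothetical union type: a finite list of waits, or a factory
// that returns a fresh (possibly infinite) generator per request.
type Backoff = number[] | (() => Generator<number>);

// Hypothetical helper: turn both forms into a factory so a retry
// loop only has to consume generators.
function toBackoffFactory(backoff: Backoff): () => Generator<number> {
  if (Array.isArray(backoff)) {
    const waits = [...backoff]; // copy so later mutation by the caller has no effect
    return function* () {
      yield* waits; // ends after the last entry, which stops retrying
    };
  }
  return backoff;
}

// Hypothetical helper: read at most the first n waits of a generator.
function take(gen: Generator<number>, n: number): number[] {
  const out: number[] = [];
  while (out.length < n) {
    const next = gen.next();
    if (next.done) break;
    out.push(next.value);
  }
  return out;
}

// Usage:
const brokerWaits = take(toBackoffFactory(proverBrokerBackoff)(), 8); // [1, 1, 1, 2, 4, 4, 4, 4]
const finiteWaits = take(toBackoffFactory([1, 2, 3])(), 8); // [1, 2, 3]
```

Normalizing the array form into a factory means a retry loop only ever pulls from a generator: a finite array simply runs out, which is what ends the retries, while the broker generator never runs out.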

The rationale is that epoch proving has its own timeout: when it expires, the chain reorgs and the jobs can be safely cancelled. There is no reason for the HTTP communication layer to give up before that happens.
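The rationale above relies on something other than the HTTP layer ending the retries. A minimal sketch, assuming the retrying caller is handed an `AbortSignal` that fires when the epoch-proving timeout expires (an assumption for illustration; the PR does not say how cancellation is wired). `retryUntilAborted` and `sleep` are hypothetical helpers, and waits are multiplied by `msPerUnit` so backoff values can be read as seconds.

```typescript
// Hypothetical sketch: the AbortSignal stands in for the epoch-proving
// timeout, which is what ultimately cancels the work.

// Resolve after ms milliseconds, or reject early if the signal aborts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Retry fn with waits taken from a fresh backoff generator. With an
// infinite generator this only stops on success or abort; with a finite
// one it rethrows the last error once the waits run out.
async function retryUntilAborted<T>(
  fn: () => Promise<T>,
  backoff: () => Generator<number>,
  signal: AbortSignal,
  msPerUnit = 1000, // backoff values are treated as seconds by default
): Promise<T> {
  const waits = backoff();
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (signal.aborted) throw signal.reason;
      const next = waits.next();
      if (next.done) throw err;
      await sleep(next.value * msPerUnit, signal);
    }
  }
}
```

Each call asks the factory for a new generator, so one request's retries do not consume another request's backoff state; that is one plausible reason to pass a factory rather than a shared generator instance.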

Test plan

  • [x] All 98 proving broker tests pass
  • [x] Build succeeds
  • [ ] Verify in spartan that prover node and agent reconnect to broker after transient failures

AztecBot added the ci-draft and claudebox labels Mar 31, 2026
… retry

The broker-side max retry cap was not requested to be changed.
Only the HTTP-level retry for prover node and agent communication
with the broker should be indefinite.
…xponential

Replace the generic backoffGenerator (caps at 64s) with a broker-specific
backoff sequence: 1, 1, 1, 2, 4, then 4s indefinitely. Also refactor
makeTracedFetch to accept a backoff factory function for custom generators.
alexghr marked this pull request as ready for review March 31, 2026 22:29
PhilWindle merged commit 063f9ef into merge-train/spartan Apr 1, 2026
23 of 31 checks passed
PhilWindle deleted the claudebox/indefinite-prover-retry branch April 1, 2026 09:19
AztecBot added a commit that referenced this pull request Apr 1, 2026
AztecBot commented Apr 1, 2026

✅ Successfully backported to backport-to-v4-next-staging #22205.

github-merge-queue bot pushed a commit that referenced this pull request Apr 1, 2026
BEGIN_COMMIT_OVERRIDE
chore: (A-771) remove dead code, verify keypair (#22167)
fix(aes128): validate PKCS#7 padding in decryptBufferCBC (#22179)
chore: (A-815) fix l1 tx utils fallback id logic (#22187)
fix(archiver): always advance L1-to-L2 messages syncpoint to current L1 block (#22154)
chore: (A-832) fix defaultFetch double consuming response on JSON parse failure (#22194)
fix: indefinite retry for prover node and agent broker communication (#22202)
fix: remove unused createDispatchFn with no method allowlist (#22219)
chore: fix wallet setup to use NO_FROM instead of ZERO address (#22222)
fix: update aes128 bad-key test for PKCS#7 padding validation (#22190)
END_COMMIT_OVERRIDE
AztecBot added a commit that referenced this pull request Apr 1, 2026
BEGIN_COMMIT_OVERRIDE
cherry-pick: feat: move event size check from declaration to private emission (#22168)
fix: prevent oracle failure on tag computation for invalid recipient (#22163)
feat: move event size check from declaration to private emission (#22168) [v4-next backport] (#22182)
fix(cli-wallet): peek claim stack instead of popping for estimate-gas-only (#22196)
fix: use Fr.fromString for CLI wallet claim params to handle decimal values (#22197)
fix: indefinite retry for prover node and agent broker communication (#22202)
END_COMMIT_OVERRIDE

Labels

backport-to-v4-next, ci-draft, claudebox


3 participants