refactor(prover-node): address review feedback on checkpoint-store redesign#23972
Merged
PhilWindle merged 1 commit intoJun 9, 2026
Merged
Conversation
…design
- prover-node: advance the local tips store only after block-stream handling
succeeds, and propagate checkpoint registration failures so failed events are
retried by the L2BlockStream
- prover-node: skip checkpoints for epochs past their proof-submission window
- prover-node: bound the publishing-service shutdown with a timeout
- prover-node: expireEpoch fetches blocks via getBlocks({ epoch })
- session-manager: compare archiver coverage by content-addressed id (not
number), reuse a live partial session whose content matches, and refuse to
re-prove an epoch the proven chain already encompasses
- session-manager: collapse the redundant proving/finalization delays into one
- startProof: return the job id without awaiting completion (callers poll
getJobs)
- proof-publishing-service: wire the candidate deadline into l1-tx-utils
- prover-node-publisher: drop the now-redundant wait-for-proven loop
- orchestrator: abort the sub-tree when the chonk verifier proof fails, and
cancel the proving state on sub-tree cancel for parity with the top tree
- orchestrator: name BaseRollupHintsWithoutProofAndVK and correct the
proving-scheduler throttle docs
- e2e: anchor the mid-epoch reorg test on a fresh epoch and assert it exercises
in-epoch checkpoint removal (N checkpoints, remove the last, prove N-1)
PhilWindle
added a commit
that referenced
this pull request
Jun 9, 2026
…design (#23972) Addresses the review feedback on the checkpoint-store redesign (PR #23552). Branches off `pw/checkpoint-store-redesign` so it can be folded back in. ### prover-node - Advance the local tips store only after block-stream handling succeeds, and propagate checkpoint registration failures instead of swallowing them, so a transient handler failure leaves the tips unadvanced and the `L2BlockStream` re-emits the event (A-1041). - Skip checkpoints for epochs already past their proof-submission window (avoids archiver catch-up noise). - Bound the publishing-service shutdown with a timeout so an in-flight L1 tx can't hang `stop()`. - `expireEpoch` fetches blocks via `getBlocks({ epoch })` instead of computing the range from `getCheckpointsData`. ### session-manager - `archiverFullyCovered` compares by content-addressed id `(number, slot, archive root)` rather than checkpoint number, so a reorg that keeps the number but changes content is detected. - `openPartialSession` reuses a live partial session whose checkpoint content already matches the canonical set instead of reconstructing one. - `startProof` refuses to re-prove an epoch the L1 proven chain already encompasses. - Collapsed the redundant proving/finalization delays into a single delay (they gated identical work). - `startProof` returns the job id without awaiting completion; callers poll `getJobs()` (proving can outlast an HTTP request). RPC output changes `void` → `string`. ### proof publishing - Wire the candidate's submission-window deadline into `l1-tx-utils` (`txTimeoutAt`) so the L1 tx stops retrying past the deadline. - Drop the now-redundant `waitUntilStartBuildsOnProven` loop from `ProverNodePublisher` — the publishing service already gates publication on the predecessor being proven. ### orchestrator - Abort the sub-tree when the chonk verifier proof fails (otherwise the checkpoint/epoch orchestrators hang forever). - Cancel the proving state on sub-tree cancel, matching `TopTreeOrchestrator`. - Name `BaseRollupHintsWithoutProofAndVK` at the `prepareBaseRollupInputs` boundary and correct the stale "mock proof and VK" comment. - Clarify the proving-scheduler throttle docs: the shared `SerialQueue` paces job initiation, it does not bound in-flight broker concurrency. ### e2e - Anchor the mid-epoch reorg test on a fresh epoch and add count-based assertions so it provably exercises in-epoch checkpoint removal: N checkpoints in the epoch, remove the last leaving N-1, prove up to and including the (N-1)th. Prover-node unit tests (162) and prover-client orchestrator tests pass; build and lint clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Addresses the review feedback on the checkpoint-store redesign (PR #23552). Branches off
pw/checkpoint-store-redesignso it can be folded back in.prover-node
L2BlockStreamre-emits the event (A-1041).stop().expireEpochfetches blocks viagetBlocks({ epoch })instead of computing the range fromgetCheckpointsData.session-manager
archiverFullyCoveredcompares by content-addressed id(number, slot, archive root)rather than checkpoint number, so a reorg that keeps the number but changes content is detected.openPartialSessionreuses a live partial session whose checkpoint content already matches the canonical set instead of reconstructing one.startProofrefuses to re-prove an epoch the L1 proven chain already encompasses.startProofreturns the job id without awaiting completion; callers pollgetJobs()(proving can outlast an HTTP request). RPC output changesvoid→string.proof publishing
l1-tx-utils(txTimeoutAt) so the L1 tx stops retrying past the deadline.waitUntilStartBuildsOnProvenloop fromProverNodePublisher— the publishing service already gates publication on the predecessor being proven.orchestrator
TopTreeOrchestrator.BaseRollupHintsWithoutProofAndVKat theprepareBaseRollupInputsboundary and correct the stale "mock proof and VK" comment.SerialQueuepaces job initiation, it does not bound in-flight broker concurrency.e2e
Prover-node unit tests (162) and prover-client orchestrator tests pass; build and lint clean.