Skip to content

refactor(prover-node): address review feedback on checkpoint-store redesign#23972

Merged
PhilWindle merged 1 commit into
pw/checkpoint-store-redesignfrom
pw/checkpoint-store-review-followups
Jun 9, 2026
Merged

refactor(prover-node): address review feedback on checkpoint-store redesign#23972
PhilWindle merged 1 commit into
pw/checkpoint-store-redesignfrom
pw/checkpoint-store-review-followups

Conversation

@PhilWindle

Copy link
Copy Markdown
Collaborator

Addresses the review feedback on the checkpoint-store redesign (PR #23552). Branches off pw/checkpoint-store-redesign so it can be folded back in.

prover-node

  • Advance the local tips store only after block-stream handling succeeds, and propagate checkpoint registration failures instead of swallowing them, so a transient handler failure leaves the tips unadvanced and the L2BlockStream re-emits the event (A-1041).
  • Skip checkpoints for epochs already past their proof-submission window (avoids archiver catch-up noise).
  • Bound the publishing-service shutdown with a timeout so an in-flight L1 tx can't hang stop().
  • expireEpoch fetches blocks via getBlocks({ epoch }) instead of computing the range from getCheckpointsData.

session-manager

  • archiverFullyCovered compares by content-addressed id (number, slot, archive root) rather than checkpoint number, so a reorg that keeps the number but changes content is detected.
  • openPartialSession reuses a live partial session whose checkpoint content already matches the canonical set instead of reconstructing one.
  • startProof refuses to re-prove an epoch the L1 proven chain already encompasses.
  • Collapsed the redundant proving/finalization delays into a single delay (they gated identical work).
  • startProof returns the job id without awaiting completion; callers poll getJobs() (proving can outlast an HTTP request). RPC output changes voidstring.

proof publishing

  • Wire the candidate's submission-window deadline into l1-tx-utils (txTimeoutAt) so the L1 tx stops retrying past the deadline.
  • Drop the now-redundant waitUntilStartBuildsOnProven loop from ProverNodePublisher — the publishing service already gates publication on the predecessor being proven.

orchestrator

  • Abort the sub-tree when the chonk verifier proof fails (otherwise the checkpoint/epoch orchestrators hang forever).
  • Cancel the proving state on sub-tree cancel, matching TopTreeOrchestrator.
  • Name BaseRollupHintsWithoutProofAndVK at the prepareBaseRollupInputs boundary and correct the stale "mock proof and VK" comment.
  • Clarify the proving-scheduler throttle docs: the shared SerialQueue paces job initiation, it does not bound in-flight broker concurrency.

e2e

  • Anchor the mid-epoch reorg test on a fresh epoch and add count-based assertions so it provably exercises in-epoch checkpoint removal: N checkpoints in the epoch, remove the last leaving N-1, prove up to and including the (N-1)th.

Prover-node unit tests (162) and prover-client orchestrator tests pass; build and lint clean.

…design

- prover-node: advance the local tips store only after block-stream handling
  succeeds, and propagate checkpoint registration failures so failed events are
  retried by the L2BlockStream
- prover-node: skip checkpoints for epochs past their proof-submission window
- prover-node: bound the publishing-service shutdown with a timeout
- prover-node: expireEpoch fetches blocks via getBlocks({ epoch })
- session-manager: compare archiver coverage by content-addressed id (not
  number), reuse a live partial session whose content matches, and refuse to
  re-prove an epoch the proven chain already encompasses
- session-manager: collapse the redundant proving/finalization delays into one
- startProof: return the job id without awaiting completion (callers poll
  getJobs)
- proof-publishing-service: wire the candidate deadline into l1-tx-utils
- prover-node-publisher: drop the now-redundant wait-for-proven loop
- orchestrator: abort the sub-tree when the chonk verifier proof fails, and
  cancel the proving state on sub-tree cancel for parity with the top tree
- orchestrator: name BaseRollupHintsWithoutProofAndVK and correct the
  proving-scheduler throttle docs
- e2e: anchor the mid-epoch reorg test on a fresh epoch and assert it exercises
  in-epoch checkpoint removal (N checkpoints, remove the last, prove N-1)
@PhilWindle PhilWindle merged commit 6a70e4b into pw/checkpoint-store-redesign Jun 9, 2026
12 checks passed
@PhilWindle PhilWindle deleted the pw/checkpoint-store-review-followups branch June 9, 2026 17:32
PhilWindle added a commit that referenced this pull request Jun 9, 2026
…design (#23972)

Addresses the review feedback on the checkpoint-store redesign (PR
#23552). Branches off `pw/checkpoint-store-redesign` so it can be folded
back in.

### prover-node
- Advance the local tips store only after block-stream handling
succeeds, and propagate checkpoint registration failures instead of
swallowing them, so a transient handler failure leaves the tips
unadvanced and the `L2BlockStream` re-emits the event (A-1041).
- Skip checkpoints for epochs already past their proof-submission window
(avoids archiver catch-up noise).
- Bound the publishing-service shutdown with a timeout so an in-flight
L1 tx can't hang `stop()`.
- `expireEpoch` fetches blocks via `getBlocks({ epoch })` instead of
computing the range from `getCheckpointsData`.

### session-manager
- `archiverFullyCovered` compares by content-addressed id `(number,
slot, archive root)` rather than checkpoint number, so a reorg that
keeps the number but changes content is detected.
- `openPartialSession` reuses a live partial session whose checkpoint
content already matches the canonical set instead of reconstructing one.
- `startProof` refuses to re-prove an epoch the L1 proven chain already
encompasses.
- Collapsed the redundant proving/finalization delays into a single
delay (they gated identical work).
- `startProof` returns the job id without awaiting completion; callers
poll `getJobs()` (proving can outlast an HTTP request). RPC output
changes `void` → `string`.

### proof publishing
- Wire the candidate's submission-window deadline into `l1-tx-utils`
(`txTimeoutAt`) so the L1 tx stops retrying past the deadline.
- Drop the now-redundant `waitUntilStartBuildsOnProven` loop from
`ProverNodePublisher` — the publishing service already gates publication
on the predecessor being proven.

### orchestrator
- Abort the sub-tree when the chonk verifier proof fails (otherwise the
checkpoint/epoch orchestrators hang forever).
- Cancel the proving state on sub-tree cancel, matching
`TopTreeOrchestrator`.
- Name `BaseRollupHintsWithoutProofAndVK` at the
`prepareBaseRollupInputs` boundary and correct the stale "mock proof and
VK" comment.
- Clarify the proving-scheduler throttle docs: the shared `SerialQueue`
paces job initiation, it does not bound in-flight broker concurrency.

### e2e
- Anchor the mid-epoch reorg test on a fresh epoch and add count-based
assertions so it provably exercises in-epoch checkpoint removal: N
checkpoints in the epoch, remove the last leaving N-1, prove up to and
including the (N-1)th.

Prover-node unit tests (162) and prover-client orchestrator tests pass;
build and lint clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant