feat: merge-train/spartan-v5 #23965
Merged
…#23952) Update comment on max checkpoint capacity
Always print the local-network service logs at the end of normal compose e2e tests, regardless of whether the test passed or failed.

Changes:

- Keep local-network output in /logs/local-network.log via the compose service tee.
- Print those logs after the end-to-end test process exits for every local-network compose run.
- Keep /logs in a per-project Docker named volume (local-network-logs) and remove it on teardown via REMOVE_COMPOSE_VOLUMES=1, so logs don't persist across runs. No host bind mount.
- Remove the private_transfer.sh-specific LOCAL_NETWORK_LOG_LEVEL override and run the compose flows at LOG_LEVEL=verbose so the captured node logs are detailed enough for diagnostics.
- Add a short flush window in run_compose_test after the container exits, so the trailing local-network logs aren't truncated by docker compose down.
- Leave the proposed-chain mint behavior unchanged; there is no checkpoint wait in the mint helper.

Example run: http://ci.aztec-labs.com/0245ece3810f8624
Fixes A-819 (Audit #164).

## Problem

`RollupContract.getAttesters` (`yarn-project/ethereum/src/contracts/rollup.ts`) made several sequential RPC reads with no pinned block:

- `getActiveAttesterCount()` (read at `latest`)
- N chunked `getAttestersFromIndicesAtTime(...)` reads, one per 1000 indices (each read at `latest`)

The `ts` timestamp argument was already captured once and reused consistently across chunks, so the literal "stale timestamp across chunks" framing of the title doesn't occur. The real defect is that the **reads are not pinned to a single L1 block**: across a block boundary or reorg, the count and the individual chunk reads can observe different attester sets, yielding an inconsistent or truncated result. This only bites for attester sets larger than the 1000-entry chunk size, read precisely across a set-changing block, hence low impact, but real.

## Fix

Fetch the current block once in `getAttesters`, then thread its `number` as a `blockNumber` option through `getActiveAttesterCount` and every chunked `getAttestersFromIndicesAtTime` read so they all evaluate against the same L1 block. This follows the existing `checkBlockTag(options?.blockNumber, ...)` pattern already used by many reads in `rollup.ts` (e.g. `getCheckpointNumber`, `status`, `canPruneAtTime`).

- `getActiveAttesterCount` and `GSEContract.getAttestersFromIndicesAtTime` now accept an optional `{ blockNumber }`.

## Testing

Verified the full TypeScript build passes. No automated test added: reproducing the block-drift race deterministically would require anvil plus a hook to advance an L1 block between the count read and the chunk reads (or deep viem-client mocking), which isn't justified for this low-impact, pattern-following change. The block-pinning behavior mirrors other pinned reads in the same file.
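As a rough illustration of the block-pinning idea described under "Fix", the sketch below reads the block number once and passes it to the count read and to every chunked read. The `AttesterReader` interface, `getAttestersPinned`, and `getAttestersFromIndices` are hypothetical stand-ins invented for this example; they are not the actual `RollupContract` or `GSEContract` signatures, and the `ts` timestamp argument is omitted for brevity.

```typescript
// Hypothetical minimal reader; NOT the real RollupContract/GSEContract API.
interface AttesterReader {
  getBlockNumber(): Promise<bigint>;
  getActiveAttesterCount(opts: { blockNumber: bigint }): Promise<number>;
  getAttestersFromIndices(indices: number[], opts: { blockNumber: bigint }): Promise<string[]>;
}

// Chunk size taken from the description above (one read per 1000 indices).
const CHUNK_SIZE = 1000;

async function getAttestersPinned(reader: AttesterReader, chunkSize: number = CHUNK_SIZE): Promise<string[]> {
  // Capture the block once so that every later read sees the same L1 state.
  const blockNumber = await reader.getBlockNumber();
  const count = await reader.getActiveAttesterCount({ blockNumber });

  const attesters: string[] = [];
  for (let start = 0; start < count; start += chunkSize) {
    const end = Math.min(start + chunkSize, count);
    const indices = Array.from({ length: end - start }, (_, i) => start + i);
    // Same pinned block for every chunk, even if the chain head has moved on.
    const chunk = await reader.getAttestersFromIndices(indices, { blockNumber });
    attesters.push(...chunk);
  }
  return attesters;
}
```

With unpinned reads, a head that advances between the count read and a chunk read could produce a list that mixes two attester sets; with the pinned block, the result is consistent with the single block captured at the start.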
## Summary

Replaces the monolithic `EpochProvingJob` with a content-keyed `CheckpointStore`, a long-running `SessionManager` that owns ephemeral `EpochSession`s, and a new `ProofPublishingService` that centralises L1 submission. `ProverNode` becomes a thin event translator: each L2BlockStream event is applied to the store / chonk cache, then dispatched to the session manager and publishing service via single method calls.

The redesign closes a class of optimistic-proving bugs that the old sticky `epochComplete` flag and per-session publish path made structurally hard to fix, and lays the groundwork for re-using sub-tree work across epochs.

See `yarn-project/prover-node/README.md` for architecture diagrams, state machines and event-flow sequences.

### Architectural changes

- **`CheckpointProver`** is content-addressed by `(number, slot, archiveRoot)`. A prune followed by a re-add of the same content (e.g. brief L1 reorg) reuses the in-flight sub-tree work, with no replay. The `CheckpointProver` starts its own tx gather + sub-tree pipeline in its constructor; there's no `provideTxs` API.
- **`CheckpointStore`** owns the registry, the `SlotWatcher` (a `RunningPromise` reaping pruned-past-slot `CheckpointProver`s), and `reapExpired` (drops canonical `CheckpointProver`s once their epoch's proof-submission window has closed, so the proof can no longer be accepted on L1).
- **`EpochSession`** spec is slot-based: `[firstSlotOfEpoch(N), toSlot]`. Every session, full or partial, starts at the epoch's first slot because the L1 rollup requires every proof to extend from the previous proven tip. The session does three things: run a `TopTreeJob`, hand the proof to `ProofPublishingService` as a `PublishCandidate`, and translate the outcome into a terminal state. Predecessor gating, same-epoch dedup, deadline enforcement, and the L1 tx are all the service's concern.
- **`ProofPublishingService`** (new) is the single owner of L1 submission. It serialises one publish at a time against a freshly-created publisher. A per-candidate `deadline` arms a `setTimeout` (resolves `'expired'` if it fires before publishing starts), and persistent `publisherFactory.create()` failures are retried on a 1s backoff (capped by the deadline). Once an L1 publish starts it runs to completion; `withdraw` is queue-only.
- **`SessionManager`** owns the `fullSessions` / `partialSessions` maps, the reconcile loop, **and** the periodic tick. Reconcile is uniform across kinds: any session whose canonical content shifts is cancelled and recreated with the same spec but new content. The tick high-water mark advances only after a session actually exists for the epoch, so transient blockers (max-pending-jobs reached, archiver still indexing) leave the mark in place and the next tick retries.
- **`ChonkCache`** moved from per-epoch to a single prover-node-wide cache in `prover-client/orchestrator`, keyed by tx hash. Entries are released by the per-event expiry sweep (`releaseForBlocks`) once an epoch's proof-submission window has closed, since there's no longer any proof to produce for those txs.
- Reconcile and publishing-service drain each run on their own `SerialQueue` from `@aztec/foundation/queue` so concurrent events can't interleave on an `await` and race.

### Removed

- `EpochProvingJob` and its sticky `epochComplete` flag, the `finalizationScheduled` flag, the in-class restart loop, the `'reorg'` state.
- `ProvingOrchestrator` and `EpochProvingState` (test-only legacy); `CheckpointSubTreeOrchestrator` now extends `ProvingScheduler` directly.
- `TopTreeProvingScheduler` collapsed into `TopTreeOrchestrator` (single concrete subclass).
- `EpochProvingContext` (thin facade over `ChonkCache`); the sub-tree takes `ChonkCache` + `EpochNumber` directly.
- `CheckpointParent` interface (vestige of `EpochProvingState` as a real parent); the per-checkpoint state takes three discrete `epochNumber`/`isAlive`/`onReject` deps from its owner.
- `ProverNodePublisher.interrupt()` / `.restart()` and the entire mid-publish interrupt code path. The bug where `l1TxUtils.interrupted` leaked between publishes (the publisher is created fresh per publish but wraps a pooled `L1TxUtils`) is gone by construction.
- `CheckpointStore.resume()` (dead code) and `implements Service` on `CheckpointStore`.
- `ReconcileTrigger` variant `'finalised'` and `SessionManager.onChainFinalised` (redundant nudge; every reconcile already runs the `recreateInvalidSessions` sweep).
- `ProofPublishingService.onPrune` (redundant with the session-manager path which already calls `withdraw(uuid)` for every cancelled session).

### Smaller fixes folded in

- `tipsStore.handleBlockStreamEvent` moved to the `finally` block so a throwing handler doesn't claim progress that didn't happen.
- Failure-upload snapshots every `CheckpointProver` regardless of sub-tree completion.
- `ProverClient.stop()` uses `tryStop(facade)` to swallow already-stopped errors.
- `lastExpiredEpoch` seeded from the last fully-proven epoch in `start()` so a restart never re-sweeps epochs that already reached L1.
- `DateProvider` plumbed through `EpochSession`, `SessionManager`, `ProofPublishingService`; no direct `Date.now()` anywhere.
- Branded types throughout: `Map<EpochNumber, ...>` on session-manager and publishing-service Maps; `TopTreeJob.getRange()` returns `CheckpointNumber`.

## Test plan

- [x] `yarn workspace @aztec/prover-node test`: 161 unit tests pass (includes new `proof-publishing-service.test.ts`, `checkpoint-store.test.ts`).
- [x] `yarn workspace @aztec/prover-client test src/orchestrator/ src/test/bb_prover_full_rollup.test.ts` with `FAKE_PROOFS=1`: 24 tests pass.
- [x] `yarn workspace @aztec/stdlib test src/interfaces/prover-node.test.ts`: passes (state enum still includes legacy values for API compatibility).
- [x] `yarn build`: full monorepo TypeScript build clean.
- [x] `yarn format` / `yarn lint` clean across prover-node, prover-client, and end-to-end.
- [ ] `yarn workspace @aztec/end-to-end test:e2e e2e_optimistic_proving` and `e2e_multi_proof`: exercised by CI.
- [ ] Kind-mode network run with a synthetic L1 prune to confirm the cancel-and-recreate path lands a valid proof on L1.
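To make the content-addressed keying from "Architectural changes" more concrete, here is a minimal sketch of a registry keyed by `(number, slot, archiveRoot)` in which a prune followed by a re-add of identical content returns the same in-flight work object. Everything in it (`CheckpointId`, `keyOf`, `ContentKeyedRegistry`, `reapPrunedBefore`) is a hypothetical name made up for this example and is not the prover-node API; the reaping rule in particular is a simplification of the `SlotWatcher` behaviour described above.

```typescript
// Hypothetical types for illustration only; not the real CheckpointStore.
interface CheckpointId {
  number: number;
  slot: bigint;
  archiveRoot: string; // hex string
}

// Derive a stable string key from the content triple.
function keyOf(id: CheckpointId): string {
  return `${id.number}:${id.slot}:${id.archiveRoot.toLowerCase()}`;
}

interface Entry<W> {
  id: CheckpointId;
  work: W;
  canonical: boolean;
}

class ContentKeyedRegistry<W> {
  private readonly entries = new Map<string, Entry<W>>();
  private readonly startWork: (id: CheckpointId) => W;

  constructor(startWork: (id: CheckpointId) => W) {
    this.startWork = startWork;
  }

  get size(): number {
    return this.entries.size;
  }

  // Add a checkpoint; identical content reuses the existing work instead of restarting it.
  add(id: CheckpointId): W {
    const key = keyOf(id);
    const existing = this.entries.get(key);
    if (existing) {
      existing.canonical = true;
      return existing.work;
    }
    const work = this.startWork(id);
    this.entries.set(key, { id, work, canonical: true });
    return work;
  }

  // Mark a checkpoint as pruned but keep its work around in case it is re-added.
  prune(id: CheckpointId): void {
    const entry = this.entries.get(keyOf(id));
    if (entry) {
      entry.canonical = false;
    }
  }

  // Drop pruned entries whose slot is strictly before `slot`; returns how many were dropped.
  reapPrunedBefore(slot: bigint): number {
    let reaped = 0;
    this.entries.forEach((entry, key) => {
      if (!entry.canonical && entry.id.slot < slot) {
        this.entries.delete(key);
        reaped++;
      }
    });
    return reaped;
  }
}
```

Under these assumptions, a brief reorg that removes and then restores the same checkpoint never triggers a second `startWork` call, while a checkpoint at the same number and slot with a different `archiveRoot` gets its own entry.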
PhilWindle
approved these changes
Jun 9, 2026
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a `.test_patterns.yml` entry.
BEGIN_COMMIT_OVERRIDE
docs(stdlib): clarify checkpoint capacity ceiling is the provable max (#23952)
test: always capture local network logs for compose tests (#23912)
fix: pin getAttesters reads to a single L1 block (A-819) (#23920)
refactor(prover-node): CheckpointStore + SessionManager redesign (#23552)
END_COMMIT_OVERRIDE