feat: merge-train/spartan#22980
Merged
## Summary

- Keep `getPublicIp()` at startup so the ENR always has a valid IP from the start.
- Enable discv5 `enrUpdate` with `addrVotesToUpdateEnr: 1` and faster pings (10s) when `queryForIp` is enabled, so PONG votes can correct the IP at runtime if it changes (e.g. residential ISP, Cloud NAT rotation).
- Bridge discv5 IP changes to libp2p's AddressManager so peers see updated addresses.
- Have the bootnode explicitly `addEnr()` on discovery to fix routing table gaps where nodes were never inserted.
- Improve P2P observability: log KAD table state in peer manager heartbeats, log ENR additions with multiaddrs, and log config at startup.
- Make a small change to the deploy scripts so that a full Aztec image can be specified for a network deployment, rather than only `aztecprotocol/aztec:<tag>`.

Fixes [A-310](https://linear.app/aztec-labs/issue/A-310/p2p-query-for-ip-should-detect-ip-changes)

Co-authored-by: Alex Gherghisan <alexghr@users.noreply.github.com>
Co-authored-by: danielntmd <162406516+danielntmd@users.noreply.github.com>
…2967)

## Motivation

The `e2e_epochs/epochs_missed_l1_publish` test fails intermittently when its proposer-discovery scan looks too far into the future. The L1 rollup contract reverts with `ValidatorSelection__EpochNotStable` for any epoch whose randao sample timestamp is still ahead of `block.timestamp`, and the test was scanning up to 60 slots (~15 epochs at the test's epoch duration) ahead, well past the queryable horizon.

## Approach

Wrap the proposer scan in a retry loop that catches `EpochNotStable`, warps L1 forward by one epoch, and re-queries the same candidate. After each warp the scan also re-anchors the candidate to keep the +4 slot margin from the new "now", so subsequent steps (the warp to `slotZero` and sequencer start-up) still have headroom.

## Changes

- **end-to-end (tests)**: Replace the bounded `for` loop in `epochs_missed_l1_publish.test.ts` with a try/catch retry that warps L1 on `EpochNotStable`.
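The retry-and-re-anchor loop described above can be sketched in TypeScript as follows. This is a minimal illustration, not the test's actual code: `ProposerScanDeps`, `findOwnedSlot`, the `maxWarps` cap, and the candidate cap are hypothetical, and the sketch assumes the decoded revert name appears in the error message. It shows the two key moves: on `EpochNotStable`, warp one epoch and retry the same candidate, then re-anchor so the candidate stays at least 4 slots ahead of the new "now".

```typescript
// Hypothetical dependency surface; the real test talks to the rollup contract and L1 cheat codes.
interface ProposerScanDeps {
  getCurrentSlot(): Promise<bigint>;
  getProposerAt(slot: bigint): Promise<string>; // may reject with EpochNotStable
  warpOneEpoch(): Promise<void>;
  isOurs(proposer: string): boolean;
}

const SLOT_MARGIN = 4n; // keep the candidate at least 4 slots ahead of "now"
const MAX_CANDIDATES = 60; // hypothetical cap on how many candidate slots are checked

// Assumption: the decoded revert name appears in the error message.
function isEpochNotStable(err: unknown): boolean {
  return err instanceof Error && err.message.includes('EpochNotStable');
}

async function findOwnedSlot(deps: ProposerScanDeps, maxWarps = 10): Promise<bigint> {
  let candidate = (await deps.getCurrentSlot()) + SLOT_MARGIN;
  let warps = 0;
  let scanned = 0;
  while (scanned < MAX_CANDIDATES) {
    try {
      const proposer = await deps.getProposerAt(candidate);
      if (deps.isOurs(proposer)) return candidate;
      candidate += 1n;
      scanned++;
    } catch (err) {
      if (!isEpochNotStable(err) || warps >= maxWarps) throw err;
      // Move L1 forward so the candidate's epoch becomes queryable, then retry the same candidate.
      await deps.warpOneEpoch();
      warps++;
      // Re-anchor: never let the candidate fall within SLOT_MARGIN of the new "now".
      const floor = (await deps.getCurrentSlot()) + SLOT_MARGIN;
      if (candidate < floor) candidate = floor;
    }
  }
  throw new Error(`no slot owned by us within ${MAX_CANDIDATES} candidates`);
}
```

Retrying the same candidate rather than skipping it matters: skipping would silently drop slots that simply were not queryable yet.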
These sequencer errors were being ignored in some tests. This change stops ignoring them: the error should never happen, and if it does, it warrants investigation.
Enable pipelining on `epochs_first_slot` and `simple_block_building`
Had been accidentally introduced in #22759
… objects (#22933)

## Motivation

Clean up the checkpoint side of `L2BlockSource`. PR #22809 already collapsed the block-side API into 4 query-shaped methods over 2 return types; the checkpoint surface was left with the pre-refactor sprawl (9 narrow methods over 4 return shapes, parallel by-number / by-range / by-epoch entrypoints, and a wire-level alias that conflated proposed and confirmed checkpoints). This change applies the same simplification.

Fixes A-979

## Approach

`L2BlockSource` checkpoint methods reduce to 4 query-shaped readers (`getCheckpoint`, `getCheckpoints`, `getCheckpointData`, `getCheckpointsData`) over 2 return shapes (`PublishedCheckpoint`, `CheckpointData`), plus a polymorphic `getProposedCheckpointData(query?)` for the proposed-only path. Three new query types live next to `BlockQuery`/`BlocksQuery`. On-disk format and `BlockStore` primitives are unchanged — the simplification is at the API boundary. The public RPC's `getCheckpoint` keeps the same wire signature but gains a confirmed→proposed fallback (for `{number}`/`{slot}`/`'proposed'` lookups) and `BadRequestError` guards for incompatible `include*` flags.

## API surface change

### Methods removed from `L2BlockSource`

`getCheckpoints(from, limit)`, `getCheckpointData(n)`, `getCheckpointDataRange(from, limit)`, `getCheckpointsForEpoch(epoch)`, `getCheckpointsDataForEpoch(epoch)`, `getCheckpointNumberBySlot(slot)`, `getLastCheckpoint()`, `getLastProposedCheckpoint()`.

Dead methods on `data_source_base` also removed: `getCheckpointHeader`, `getLastBlockNumberInCheckpoint`, `getSynchedCheckpointNumber`.
### Methods added to `L2BlockSource`

```ts
getCheckpoint(query: CheckpointQuery): Promise<PublishedCheckpoint | undefined>
getCheckpoints(query: CheckpointsQuery): Promise<PublishedCheckpoint[]>
getCheckpointData(query: CheckpointQuery): Promise<CheckpointData | undefined>
getCheckpointsData(query: CheckpointsQuery): Promise<CheckpointData[]>
getProposedCheckpointData(query?: ProposedCheckpointQuery): Promise<ProposedCheckpointData | undefined>

type CheckpointQuery = { number } | { slot } | { tag: 'checkpointed' | 'proven' | 'finalized' }
type CheckpointsQuery = { from, limit } | { epoch }
type ProposedCheckpointQuery = { number } | { slot } | { tag: 'proposed' }
```

### Public RPC (`AztecNode`) wire-level changes

- `getCheckpointsDataForEpoch(epoch)` removed; `getCheckpointsData(query: CheckpointsQuery)` added (range or epoch).
- `'latest'` removed from `CheckpointParameter`.
- `'proposed'` semantics changed: previously aliased to "latest L1-confirmed checkpoint" (a documented foot-gun); now `getCheckpoint('proposed')` strictly targets the proposed-checkpoint store, and `getCheckpointNumber('proposed')` returns the proposed-tip number with confirmed fallback.
- `getCheckpoint({ number }) / ({ slot })` now check confirmed first then fall back to proposed; tag-based lookups (`'checkpointed'` / `'proven'` / `'finalized'`) do not fall back.
- `getCheckpoint('proposed', { includeL1PublishInfo: true | includeAttestations: true })` and the same flags on a by-number/by-slot lookup that resolves to a proposed entry now throw `BadRequestError` (proposed checkpoints have no L1 publish info or attestations).

### Types kept

`CheckpointData`, `CommonCheckpointData` (structural base of `CheckpointData` / `ProposedCheckpointInput`), `ProposedCheckpointData`, `ProposedCheckpointInput`, `PublishedCheckpoint`, `Checkpoint`. No structural-type deletions.

Migration guidance for wallet/SDK consumers is in `docs/docs-developers/docs/resources/migration_notes.md`.
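To make the discriminated-union shape of these queries concrete, here is a self-contained TypeScript sketch of how a resolver might dispatch on them. The `bigint` field types, the `CheckpointRecord` and `Store` shapes, and the resolver bodies are assumptions for illustration only; the real `resolveCheckpointQuery` / `resolveCheckpointsQuery` helpers in `data_source_base` read from `BlockStore` and return `PublishedCheckpoint` / `CheckpointData`.

```typescript
// Query shapes follow the PR description; the bigint field types are an assumption.
type CheckpointTag = 'checkpointed' | 'proven' | 'finalized';
type CheckpointQuery = { number: bigint } | { slot: bigint } | { tag: CheckpointTag };
type CheckpointsQuery = { from: bigint; limit: number } | { epoch: bigint };

// Hypothetical in-memory stand-ins for the archiver's store.
interface CheckpointRecord {
  number: bigint;
  slot: bigint;
  epoch: bigint;
}
interface Store {
  checkpoints: CheckpointRecord[]; // ordered by checkpoint number
  tips: Record<CheckpointTag, bigint>; // checkpoint number at each chain tip
}

function resolveCheckpointQuery(store: Store, q: CheckpointQuery): CheckpointRecord | undefined {
  // The `in` checks narrow the union to exactly one variant per branch.
  if ('number' in q) return store.checkpoints.find(c => c.number === q.number);
  if ('slot' in q) return store.checkpoints.find(c => c.slot === q.slot);
  const tipNumber = store.tips[q.tag];
  return store.checkpoints.find(c => c.number === tipNumber);
}

function resolveCheckpointsQuery(store: Store, q: CheckpointsQuery): CheckpointRecord[] {
  if ('epoch' in q) return store.checkpoints.filter(c => c.epoch === q.epoch);
  // Range query: up to `limit` checkpoints starting at `from`.
  return store.checkpoints.filter(c => c.number >= q.from).slice(0, q.limit);
}
```

One reader per return shape, each taking a union, keeps the call sites uniform: adding a new lookup key means adding a variant rather than a new method.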
## Changes

- **stdlib**: New query types (`CheckpointQuery`, `CheckpointsQuery`, `ProposedCheckpointQuery`) + Zod schemas in `block/l2_block_source.ts`. `'latest'` literal removed from `interfaces/checkpoint_parameter.ts`. `NormalizedCheckpointDispatch` type for the server's parameter normalizer. `ArchiverApiSchema` and `AztecNode` schema updated. `computeL2ToL1MembershipWitness` switched to the new query shape.
- **archiver**: `data_source_base` adds `resolveCheckpointQuery` / `resolveCheckpointsQuery` mirroring the block-side helpers, implements the 4 confirmed methods plus the polymorphic proposed lookup. `BlockStore` adds `getProposedCheckpointBySlot(slot)`. `MockArchiver` and `mock_l2_block_source` updated to match the new interface.
- **aztec-node**: `server.ts` adds the confirmed→proposed fallback flow with the two `BadRequestError` guards in `getCheckpoint`, sources all tips from a single `getL2Tips()` call in `getCheckpointNumber`, and routes the public RPC through the new internal methods. New pure-projection helper `projectProposedToCheckpointResponse` in `block_response_helpers.ts`.
- **consumer migrations**: prover-node (collapses two checkpoint fetches into one `getCheckpoints({ epoch })`), world-state, slasher, sequencer (`checkpoint_proposal_job`, `sequencer`), validator (`proposal_handler`), `L2BlockStream`, pxe `block_stream_source`, telemetry wrapper, and 10 e2e files updated to the new query shapes.
- **tests**: 48 new `it()` blocks covering each query discriminant, the throw guards, the confirmed→proposed fallback, the polymorphic `getProposedCheckpointData` dispatch, and `BlockStore.getProposedCheckpointBySlot`.
- **docs**: `migration_notes.md` updated with the breaking changes for downstream wallet/SDK consumers.
…oposal check (#22989)

## Motivation

`hasPayloadBeenProposed` (now `hasActiveProposalWithPayload`) used `eth_getLogs` over the rollup's full L1 deployment range to find prior `PayloadSubmitted` events. On long-lived rollups that range exceeds typical RPC provider block-range caps and the call times out, silently breaking the sequencer's "stop signaling for an already-proposed payload" logic.

The previous in-memory cache also permanently blacklisted any payload it saw as proposed once, which is wrong: each round on `EmpireBase` is independent and the same payload can legitimately be re-signaled and re-submitted after a prior proposal becomes Dropped/Rejected/Expired/Executed.

## Approach

Replace the log scan with a bounded view-call sweep over `Governance.proposals`. The sweep walks newest -> oldest using `proposalCount`, unwraps each proposal's `GSEPayload` via `getOriginalPayload()`, and treats only `Pending`/`Active`/`Queued`/`Executable` as "in an active proposal" -- terminal states allow re-signaling.

The descent has a hard early-stop on the protocol-wide proposal lifetime cap (`4 * ConfigurationLib.TIME_UPPER = 360 days`), which is safe regardless of per-proposal frozen configs because every config field is bounded by `TIME_UPPER` on-chain.

Two in-memory caches absorb the per-call cost over time: terminal proposals (provably immutable on-chain) and wrapper -> original payload unwraps (immutable bytecode).

## Changes

- **ethereum/contracts/governance**: New `hasActiveProposalWithPayload(payload)` and `getProposalCount()` on `ReadOnlyGovernanceContract`. Inlines a minimal `IProposerPayload` ABI (just `getOriginalPayload`) to avoid generating a full artifact. Handles `proposeWithLock`-style proposals (no GSEPayload wrapper) by catching the unwrap revert and skipping.
- **ethereum/contracts/governance (types)**: Adds explicit types (`Proposal`, `ProposalConfiguration`, `GovernanceConfiguration`, `ProposeWithLockConfiguration`, `Ballot`) and maps the viem return shapes of `getProposal` / `getConfiguration` onto them. `Proposal` now carries both `cachedState` (raw stored) and `state` (live, time-derived from `getProposalState`); `getProposal` issues both reads in parallel so callers don't need a separate state RPC.
- **ethereum/contracts/governance (caching)**: Adds two memoization layers on `ReadOnlyGovernanceContract`. Proposals are cached when `state` is in any of the four terminal phases (Executed/Rejected/Dropped/Expired) -- once terminal the entire struct is provably immutable on-chain. Wrapper unwraps are keyed by wrapper address and cached forever (deployed bytecode is immutable). `GovernanceProposerContract` already memoizes its `getGovernance()`, so the same `ReadOnlyGovernanceContract` instance (and its caches) is reused across slots in the sequencer publisher.
- **ethereum/contracts/governance_proposer**: Drops the event-based `hasPayloadBeenProposed`. Adds a memoized `getGovernance()` accessor and a thin `hasActiveProposalWithPayload` delegate that resolves the Governance address via the on-chain registry lookup.
- **ethereum/contracts/empire_base**: Removes `hasPayloadBeenProposed` from `IEmpireBase` -- it's a Governance concern, not a generic empire concern (slasher doesn't need it).
- **sequencer-client/publisher**: Removes the permanent `payloadProposedCache` so the publisher re-checks every slot, allowing re-signaling once a prior proposal is terminal. Switches the failure mode from fail-closed to fail-open (a flaky L1 endpoint should not silence governance participation; a duplicate signal is harmless). Narrows the helper's `base` param from `IEmpireBase` to `GovernanceProposerContract` since this code path is governance-only.
- **ethereum/contracts (tests)**: New `hasActiveProposalWithPayload` describe block hitting a real anvil-deployed Governance. Impersonates the `governanceProposer`, calls `Governance.propose` directly, and etches hand-rolled mock wrapper bytecode at chosen addresses to drive (wrapper, original) pairs. Covers: empty governance, live match, no match, terminal state via warp, reverting wrapper (proposeWithLock-style), descent past unrelated proposals, case-insensitive match, and the 360-day hard cutoff via warp. Also adds a sync-guard describe block that probes `Governance.updateConfiguration` via impersonated `eth_call` to assert each of `votingDelay`/`votingDuration`/`executionDelay`/`gracePeriod` accepts `TIME_UPPER` and rejects `TIME_UPPER + 1` -- if those caps change on-chain, this trips and `MAX_PROPOSAL_LIFETIME_SECONDS` must be revisited.
- **sequencer-client/publisher (tests)**: Replaces the cache test with a "re-checks each call so re-signaling resumes after terminal" test. Updates the RPC-failure semantics test from fail-closed to fail-open.
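The bounded newest-to-oldest sweep can be sketched as below. This is a simplified TypeScript illustration under stated assumptions, not the code in `ReadOnlyGovernanceContract`: the `GovernanceView` interface, the `creation` timestamp field, and the `unwrap` helper (standing in for calling `getOriginalPayload()` and mapping a revert to `undefined`) are hypothetical, and both memoization layers are omitted. The early `break` relies on proposals being created in increasing time order, so once one is older than the 360-day lifetime cap, every older proposal is too.

```typescript
// State names from the PR description; this string union is not the contract's enum encoding.
type ProposalState =
  | 'Pending' | 'Active' | 'Queued' | 'Executable'
  | 'Rejected' | 'Executed' | 'Dropped' | 'Expired';

const ACTIVE_STATES: ReadonlySet<ProposalState> = new Set<ProposalState>([
  'Pending', 'Active', 'Queued', 'Executable',
]);
const MAX_PROPOSAL_LIFETIME_SECONDS = 360n * 24n * 60n * 60n; // 4 * TIME_UPPER per the PR

// Hypothetical read-only view; `creation` is an assumed creation timestamp in seconds.
interface ProposalView {
  payload: string;
  creation: bigint;
  state: ProposalState;
}
interface GovernanceView {
  proposalCount(): Promise<bigint>;
  getProposal(id: bigint): Promise<ProposalView>;
  unwrap(wrapper: string): Promise<string | undefined>; // undefined when the unwrap reverts
}

async function hasActiveProposalWithPayload(
  gov: GovernanceView,
  payload: string,
  nowSeconds: bigint,
): Promise<boolean> {
  const target = payload.toLowerCase(); // addresses compare case-insensitively
  const count = await gov.proposalCount();
  for (let id = count - 1n; id >= 0n; id--) {
    const p = await gov.getProposal(id);
    // Past the lifetime cap nothing can still be live, and older proposals are older still.
    if (nowSeconds - p.creation > MAX_PROPOSAL_LIFETIME_SECONDS) break;
    // Terminal proposals never block re-signaling, so skip them before paying for an unwrap.
    if (!ACTIVE_STATES.has(p.state)) continue;
    const original = await gov.unwrap(p.payload);
    if (original !== undefined && original.toLowerCase() === target) return true;
  }
  return false;
}
```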
…ile in CI (#23000)

## Summary

Fixes the `docs` build failure on `merge-train/spartan` (CI run [25449092262](https://github.com/AztecProtocol/aztec-packages/actions/runs/25449092262), log [27a4351a1e5e3568](http://ci.aztec-labs.com/27a4351a1e5e3568)).

## Problem

`validate-webapp-tutorial` in `docs/examples/bootstrap.sh` intentionally starts each run with an empty `yarn.lock`, then runs `yarn install` to populate it from the `link:` paths it just wrote into `package.json`. In CI, Yarn 4 auto-enables `--immutable` when it detects `CI=1`, so the install fails with `YN0028 (frozen lockfile exception)` because populating an empty lockfile counts as modifying it.

```
➤ YN0028: │ The lockfile would have been modified by this install, which is explicitly forbidden.
➤ YN0000: · Failed with errors in 6s 829ms
ERROR: Contract artifact not found at /home/aztec-dev/aztec-packages/docs/target/pod_racing_contract-PodRacing.json
```

(The "Contract artifact not found" line is a downstream symptom — the script doesn't run with `set -e`, so after `yarn install` fails it continues into the artifact check and reports a misleading error.)

## Fix

Set `YARN_ENABLE_IMMUTABLE_INSTALLS=false` for that one `yarn install` call, since populating the lockfile is the intended behaviour.

## Verification

Reproduced locally: `CI=true yarn install` against the webapp-tutorial fails with `YN0028`; with `YARN_ENABLE_IMMUTABLE_INSTALLS=false` it succeeds.

ClaudeBox log: https://claudebox.work/s/a1863de35053b544?run=1
🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.
…ts (#23009) No major changes needed
…22994)

## Motivation

The `aztec.archiver.block_height` series with no status attribute (rendered as the "Pending chain" line on the network, prover, and fisherman Grafana dashboards) stopped being published a couple of weeks ago. With pipelining enabled every checkpoint arriving from L1 already has its blocks in the proposed store, so the L1 synchronizer always took the new promotion fast path introduced in #22716, leaving `checkpointsToAdd` empty and skipping the metric call.

## Approach

Record the checkpointed block-height metrics across all valid checkpoints in the batch instead of only the ones routed through `addCheckpoints`, so the promoted checkpoint contributes too. The duration is averaged over the full batch since `addCheckpoints` performs the work for both paths in a single transaction.

## Changes

- **archiver (`l1_synchronizer.ts`)**: Move the `processNewCheckpointedBlocks` call to use `validCheckpoints` rather than `checkpointsToAdd`, restoring the empty-status `block_height`, `checkpoint_height`, `sync_block_count`, and `sync_per_checkpoint` series under pipelining.

---------

Co-authored-by: Alex Gherghisan <alexghr@users.noreply.github.com>
…23186)

## Motivation

The top-level `--archiver` flag was removed from `aztec start`, but several scripts, Helm/Terraform values, and docs still pass it. Leaving these in place would break node and prover startup once they pick up the new CLI.

## Approach

Grepped the repo for bare `--archiver` (excluding nested `--archiver.<option>` flags, which are still valid) and removed every occurrence from start commands, docs, and the bot CLI handler. Also dropped the now-stale check for an `archiver` option in `start_bot.ts` and a stray comment in `aztec_start_options.ts`.

## Changes

- **docker-compose.yml**: drop `--archiver` from the node entrypoint
- **spartan (helm + terraform values)**: remove `--archiver` from `aztec-node`, `aztec-validator`, `aztec-prover-stack`, and the `full-node`, `rpc`, `archive`, `blob-sink` terraform values; update `aztec-node/README.md` examples and options table
- **yarn-project/aztec**: drop `archiver` from the unsupported-flags check in `start_bot.ts`; remove stale comment in `aztec_start_options.ts`
- **docs/docs-operate**: drop `--archiver` from the node/prover/sequencer setup, troubleshooting, and CLI reference pages; reword the reference prose to use `--archiver.blobSinkUrl` as the example

Versioned snapshots under `docs/network_versioned_docs/version-v4.2.0/` are intentionally left untouched.
When the open merge-train/spartan PR has been open >24h, post a one-line alert to #team-alpha. The cron fires once per day, so the channel sees at most one notification per stuck day. Silent on healthy days.

## Files

- `ci3/merge_train_stale_check` — bash script: queries the open PR for a merge-train branch, computes age from `created_at`, and posts a `:warning:` Slack message via `ci3/slack_notify` if age >= `$STALE_HOURS` (default 24).
- `.github-new/workflows/merge-train-stale-check.yml` — daily schedule (`7 9 * * *`, 09:07 UTC) + `workflow_dispatch`. Calls the script for `merge-train/spartan` → `#team-alpha`.

## ⚠ Move workflow into `.github/workflows/` before merging

The workflow is under `.github-new/` because this session was not started with the `ci-allow` prefix (the prefix needs to be the first token of the prompt; mid-message `ci-allow` was not picked up by the session parser, so `.github/` was still blocked). Before merging, move the file:

`git mv .github-new/workflows/merge-train-stale-check.yml .github/workflows/merge-train-stale-check.yml`

Scheduled workflows only execute from the default branch, so the notifier only starts firing once it has landed on `next`.

## Behaviour

| State of the open `merge-train/spartan` PR | Action |
|---|---|
| No open PR (just merged, awaiting auto-recreate) | Silent — no Slack post. |
| Open < 24 h (`STALE_HOURS`) | Silent — within expected merge window. |
| Open ≥ 24 h | One `:warning:` line to `#team-alpha` with PR link + `mergeable_state`. |

## Reuse

Other teams can wire in their own merge-train by adding a job that calls `./ci3/merge_train_stale_check <branch> <channel>`. Threshold and base branch are overridable via `STALE_HOURS` / `BASE_BRANCH` env vars.

## Motivation

Driven by a Slack request: merge-train/spartan PR #22980 has been stuck on conflicts for ~6 days with no automated notification.
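The staleness rule described above reduces to a small age check. The TypeScript sketch below mirrors that behaviour for illustration; the real check is the bash script `ci3/merge_train_stale_check`, and the `OpenPr` shape, the `staleMessage` function, and the message text are hypothetical (the real alert also carries the PR link and `mergeable_state`).

```typescript
// Hypothetical shape of the open merge-train PR, as returned by a GitHub API query.
interface OpenPr {
  number: number;
  createdAt: string; // ISO-8601 timestamp, like GitHub's `created_at`
}

const MS_PER_HOUR = 3_600_000;

// Returns the alert text, or undefined when the check should stay silent.
function staleMessage(pr: OpenPr | undefined, nowMs: number, staleHours = 24): string | undefined {
  if (pr === undefined) return undefined; // no open PR: just merged, awaiting auto-recreate
  const ageHours = (nowMs - Date.parse(pr.createdAt)) / MS_PER_HOUR;
  if (ageHours < staleHours) return undefined; // within the expected merge window
  return `:warning: merge-train PR #${pr.number} has been open for ${Math.floor(ageHours)}h`;
}
```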
ClaudeBox log: https://claudebox.work/s/e4b1d8ae8d5c867b?run=2

---------

Co-authored-by: Santiago Palladino <santiago@aztec-labs.com>
- Preserve existing JSON file mode when stamping Aztec versions into Noir contract artifacts. - Prevent release images from containing root-only-readable account artifacts used by Spartan deploy jobs.
…story (#23160)

## Summary

Test logs print a **History** link like `…/list/history_<hash>_<TARGET_BRANCH>`. For `TARGET_BRANCH=merge-train/spartan` the URL contains a `/`, and Flask's default `/list/<key>` converter only matches a single path segment, so the link 404s.

Percent-encoding (`%2F`) doesn't help: WSGI (gunicorn) URL-decodes `PATH_INFO` per PEP 3333, so by the time Werkzeug routes the request, the `%2F` is already a `/`.

Fix: change the route to `/list/<path:key>`, which matches slashes. This makes the existing history links work — and recovers all data already written to Redis under keys like `history_<hash>_merge-train/spartan` (history tracking for `merge-train/*` is already enabled by the existing condition in `ci3/run_test_cmd`).

This commit reverts the earlier producer-side sanitization in `ci3/run_test_cmd` / `ci3/exec_test`. Doing it in the producer would leave existing entries orphaned under the old slash keys; the dashboard-side fix avoids the split.

Note: dashboard changes ship via `ci3/dashboard/deploy.sh` (manual rsync + `systemctl restart rkapp`).

Reproducer: http://ci.aztec-labs.com/54e749c45512a629 → click **History**.

Background: https://gist.github.com/AztecBot/33fcdd84eba7b273d3f67dfd2ad6be8f

## Test plan

- [ ] After `ci3/dashboard/deploy.sh`, the History link on a test run on a `merge-train/*` PR resolves to the existing list.
- [ ] Existing `/list/<key>` URLs without slashes (e.g. `…_next`, `…_v4`) continue to work.
…a-ci (#23219)

Santiago caught that channel routing alone wasn't enough — today no flake notifications fire for `merge-train/spartan` at all. The `merge-train/spartan` PR's full test suite runs on `pull_request` events (label `ci-full-no-test-cache`), where `REF_NAME=merge-train/spartan` and `is_merge_queue=0`, so the existing `slack_notify_flake=1` trigger never matches. Only the rare `merge_group` runs would have qualified.

Two changes in `ci3/run_test_cmd`:

1. Extend the trigger so `slack_notify_flake=1` also fires when `REF_NAME == merge-train/spartan` (mirroring the existing `backport-to-v2-staging` case).
2. Add a `flake_slack_channel` resolver that maps the PR head branch to a Slack channel: `merge-train/spartan` → `#team-alpha-ci` (`C0B3EFDPT7B`); everything else falls through to `slack_notify`'s default `#aztec3-ci`.

The resolver uses `REF_NAME` directly for `pull_request` runs and falls back to `gh pr view <num>` (parsed from `gh-readonly-queue/<base>/pr-<num>-<sha>`) for `merge_group` runs. The result is cached in `/tmp` so parallel tests on the same EC2 instance share a single resolution.

Design notes and rationale: https://gist.github.com/AztecBot/2d706371f8dcb7386880859d69a90435
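The `merge_group` fallback in the resolver hinges on extracting the PR number from a ref of the form `gh-readonly-queue/<base>/pr-<num>-<sha>`. A TypeScript sketch of that parsing and the channel mapping follows; `parseMergeQueueRef`, `flakeSlackChannel`, and the `headBranchOf` callback (standing in for `gh pr view <num>`) are hypothetical names, and the `/tmp` caching is omitted. The base segment is matched greedily, so base branches that themselves contain `/` still parse.

```typescript
// gh-readonly-queue/<base>/pr-<num>-<sha>; greedy `.+` lets <base> contain slashes.
const MERGE_QUEUE_REF = /^gh-readonly-queue\/(.+)\/pr-(\d+)-([0-9a-f]+)$/;

function parseMergeQueueRef(ref: string): { base: string; pr: number } | undefined {
  const m = MERGE_QUEUE_REF.exec(ref);
  return m ? { base: m[1], pr: Number(m[2]) } : undefined;
}

// Returns a channel override, or undefined to fall through to slack_notify's default.
function flakeSlackChannel(refName: string, headBranchOf: (pr: number) => string): string | undefined {
  const parsed = parseMergeQueueRef(refName);
  // pull_request runs: REF_NAME is already the head branch; merge_group runs: look it up.
  const head = parsed ? headBranchOf(parsed.pr) : refName;
  return head === 'merge-train/spartan' ? '#team-alpha-ci' : undefined;
}
```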
…23213)

## Summary

Fixes the merge-queue failure in `e2e_blacklist_token_contract/shielding` ([CI run](http://ci.aztec-labs.com/d5485e6652b3f32a)) where every test fails in `applyMint` with `Invalid tx: Invalid expiration timestamp`.

## Root cause

`warpL2TimeAtLeastTo` (introduced in #22084) calls `eth.warp` followed by `node.mineBlock()`. The sequencer's polling loop captures `nowSeconds`/`slot` at the top of each `work()` cycle. An in-flight cycle that started just before the warp will mine an L2 block at the *pre-warp* slot — L1 sync prunes that block from the canonical chain, but it lingers in local world state and the PXE anchors subsequent txs against it. With `MAX_TX_LIFETIME == CHANGE_ROLES_DELAY == 86400s`, the resulting `expiration_timestamp` lands exactly on the post-warp slot boundary and the validator rejects the tx as soon as the wall-clock crosses to the next slot.

## Fix

After `eth.warp`, retry `mineBlock` until the latest L2 block's slot is at or past the slot corresponding to the warped timestamp. The first `mineBlock` may return a stale block produced by an in-flight cycle; the next triggers a fresh sequencer cycle that reads the post-warp time and builds a block at the post-warp slot. Subsequent txs then anchor against a fresh block whose `expiration_timestamp` is well in the future.

The signature of `warpL2TimeAtLeastTo`/`warpL2TimeAtLeastBy` widens from `AztecNodeDebug` to `AztecNode & AztecNodeDebug` so we can read the latest block via `getBlockData('latest')`. All current callers already type their node as the intersection.

This re-applies the diagnosis from the prior #22796 (which never merged), adapted to the current `getBlockData('latest')` API.
Full analysis: https://gist.github.com/AztecBot/67815cbe3c3f853d97ec3345dfb0c985

## Test plan

- `e2e_blacklist_token_contract/shielding` (originally failing)
- `e2e_blacklist_token_contract/{access_control,burn,minting,transfer_*,unshielding}` — share `applyBaseSetup` → `crossTimestampOfChange`
- `e2e_contract_updates`
- `composed/e2e_cheat_codes` (verifies the type-widening change still resolves the methods correctly)

ClaudeBox log: https://claudebox.work/s/28594b4dc64f1cd0?run=1
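The core of the fix, mining until the latest block's slot catches up with the warped timestamp, can be sketched in TypeScript as below. The `WarpDeps` interface, the `warpAndMineFreshBlock` name, and the `maxAttempts` bound are hypothetical simplifications; the real helper is `warpL2TimeAtLeastTo`, which takes an `AztecNode & AztecNodeDebug` and reads the latest block via `getBlockData('latest')`.

```typescript
// Hypothetical minimal surface; not the real helper's signature.
interface WarpDeps {
  warpL1To(timestamp: bigint): Promise<void>;
  mineBlock(): Promise<void>;
  latestBlockSlot(): Promise<bigint>; // e.g. derived from getBlockData('latest')
  slotAtTimestamp(timestamp: bigint): bigint;
}

async function warpAndMineFreshBlock(deps: WarpDeps, timestamp: bigint, maxAttempts = 5): Promise<void> {
  await deps.warpL1To(timestamp);
  const targetSlot = deps.slotAtTimestamp(timestamp);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A block mined here may come from a sequencer cycle that captured its slot before the warp.
    await deps.mineBlock();
    if ((await deps.latestBlockSlot()) >= targetSlot) return;
  }
  throw new Error(`no L2 block at or past slot ${targetSlot} after ${maxAttempts} attempts`);
}
```

Checking the slot of the latest block, rather than trusting that one `mineBlock` call suffices, is what makes the helper robust to the in-flight cycle race.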
…pair (#22996)

Introduces a sub-tree + top-tree orchestrator pair that decomposes the existing single-class proving orchestrator along the natural state-coupling boundary — per-checkpoint block-level work vs. epoch-level top-tree work — while leaving every existing API on the legacy `EpochProver` / `ProvingOrchestrator` / `EpochProvingState` path untouched. The prover-node and e2e tests build unchanged; this PR is purely additive in surface area, with structural refactors on `ProvingOrchestrator` to share scheduling and top-tree drivers with the new `TopTreeOrchestrator`.

## What's new

- **`CheckpointSubTreeOrchestrator`** (`checkpoint-sub-tree-orchestrator.ts`): extends `ProvingOrchestrator`, single-checkpoint by construction. Drives chonk-verifier / base / merge / block-root / block-merge for one checkpoint and resolves a `SubTreeResult` instead of escalating to the checkpoint root — the parent's `checkAndEnqueueCheckpointRootRollup` is overridden to short-circuit. The constructor calls `super.startNewEpoch(epoch, 1, empty challenges)` to set up a single-checkpoint mini-epoch; the count and challenges are never read because the override prevents the parent's finalize / root path from running.
- **`TopTreeOrchestrator`** + **`TopTreeProvingState`**: self-contained driver from checkpoint-root through epoch-root rollup. Takes per-checkpoint block-proof promises and pipelines its hint chain against them. Cancellation surfaces as `TopTreeCancelledError` so callers can distinguish reorg-driven cancel from a genuine proving failure.
- **`EpochProvingContext`** (`epoch-proving-context.ts`): per-epoch shared cache for chonk-verifier proofs. Survives sub-tree cancellation so a tx that gets reorged out and re-appears in a replacement checkpoint reuses the cached proof.
- **`ProvingScheduler`** (`proving-scheduler.ts`): abstract base owning the `SerialQueue` deferred-job lifecycle, the `pendingProvingJobs` controller list, and a unified `deferredProving<S, T>(state, request, callback, isCancelled?)` submit envelope. The minimal `ProvingStateLike` contract is just `verifyState()` + `reject(reason)`.
- **`TopTreeProvingScheduler`** (`top-tree-proving-scheduler.ts`): extends `ProvingScheduler` and holds the checkpoint-merge, padding, and root-rollup drivers (plus tree-walking helpers) shared by both orchestrators. Wraps circuit calls via a `wrapCircuitCall` hook (orchestrator overrides for spans; top-tree leaves identity) and resolves via an `onRootRollupComplete` hook to bridge the two states' differing `resolve` signatures. The per-checkpoint root driver stays subclass-specific because input-building flows differ.
- **`EpochProverFactory` interface on `ProverClient`**: new factory methods `createEpochProvingContext(epochNumber)`, `createCheckpointSubTreeOrchestrator(...)`, and `createTopTreeOrchestrator()`. A single shared `BrokerCircuitProverFacade` is owned by `ProverClient` and shared across every orchestrator.

## What changes in existing code

- `ProvingOrchestrator` extends `TopTreeProvingScheduler`; the inline broker-job submit envelope, queue lifecycle, and the top-tree-section drivers are inherited. `cancel()` delegates the queue-recreate + abort-jobs logic to `resetSchedulerState(this.cancelJobsOnStop)`. Three internal methods (`getOrEnqueueChonkVerifier`, `checkAndEnqueueBaseRollup`, `checkAndEnqueueCheckpointRootRollup`) become `protected` so the sub-tree can override them; `provingState` and `provingPromise` likewise become `protected` so the sub-tree can hook the parent's failure stream onto `subTreeResult`. No public API change on `ProvingOrchestrator`.
- `CheckpointProvingState`: gains two read-only accessors used by the sub-tree's checkpoint-root override — `getSubTreeOutputProofs()` and `getLastArchiveSiblingPath()`. No state changes.
- `ProverClient` keeps `createEpochProver()` exactly as before (each call spawns its own `BrokerCircuitProverFacade`); the new factory methods share a `getFacade()` set up in `start()` and torn down in `stop()`.

`EpochProver`, `EpochProverManager`, `ServerEpochProver`, `EpochProvingState`, the integration tests in `orchestrator_*.test.ts`, `bb_prover_full_rollup.test.ts`, and `stdlib/interfaces/*` are all unchanged from `merge-train/spartan` — the prover-node and e2e tests continue to build against the existing `EpochProver` API. Migrating the prover-node onto the new factories (and the deferred-finalize flow that goes with optimistic proving) is the follow-up PR.

## Test plan

- 261 prover-client tests pass (full `yarn workspace @aztec/prover-client test`).
- `yarn build` clean against current merge-train/spartan (modulo the pre-existing `@aztec/sqlite3mc-wasm` issue inherited from baseline).
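The property that makes `EpochProvingContext` useful, a per-epoch cache that outlives any single sub-tree, can be illustrated with promise memoization keyed by tx hash. The TypeScript sketch below is hypothetical: `EpochProofCache`, `getOrEnqueue`, and the choice to evict failed entries so they can be retried are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical per-epoch cache; lives outside any sub-tree so cancelling one does not drop entries.
class EpochProofCache<P> {
  readonly epochNumber: bigint;
  private readonly proofs = new Map<string, Promise<P>>();

  constructor(epochNumber: bigint) {
    this.epochNumber = epochNumber;
  }

  // Returns the in-flight or finished proof for a tx, starting a new job only on a cache miss.
  getOrEnqueue(txHash: string, prove: () => Promise<P>): Promise<P> {
    let proof = this.proofs.get(txHash);
    if (proof === undefined) {
      proof = prove();
      // Assumption: evict failures so a later request (e.g. after a reorg) can retry.
      proof.catch(() => this.proofs.delete(txHash));
      this.proofs.set(txHash, proof);
    }
    return proof;
  }
}
```

Caching the promise rather than the resolved proof also deduplicates concurrent requests for the same tx while its proof is still in flight.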
…#23244)

## Why

`bb::srs::http_download` made a single HTTP request and threw on any error — including transient ones like `Could not establish connection`. With 10 parallel grind shards in `merge-queue-heavy` and `parallel --halt now,fail=1` in `ci.sh`, a single CDN blip on one shard kills the whole merge-train run.

That's what failed the most recent merge-train/spartan MQ attempt: `CrsFactory.Bn254CompressedChunkHashFirstChunk` on shard `x5-full` of run [25795648831](https://github.com/AztecProtocol/aztec-packages/actions/runs/25795648831), with `HTTP request failed for http://crs.aztec-cdn.foundation/g1_compressed.dat: Could not establish connection`. No code regression — pure network flake.

## What

Add a bounded retry loop inside `http_download`:

- Up to 3 attempts total
- Retry on connection-class errors (`!res`) and on transient HTTP status (5xx, 429); don't retry on other 4xx
- After the first failure, tighten the per-attempt connect/read timeouts to 5s (down from 30s/60s) so retries don't burn the original timeout budget twice
- Exponential backoff between retries (1s, 2s)
- `vinfo` per retry; throw with attempt count on terminal failure

## Latency budget

Retry-induced extra latency (beyond the first attempt) is bounded:

```
backoff 1s + retry 5s + backoff 2s + retry 5s = 13s (< 15s)
```

Well within the 600s test timeout, and small enough that a fully-down CDN fails fast rather than dragging out the grind shard.

WASM path is untouched — it still throws immediately, same as before.
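The retry policy listed above can be expressed as a short TypeScript sketch. The real change is C++ inside `bb::srs::http_download`; here the HTTP client and the sleep function are injected so the policy is testable, and `downloadWithRetry`, `HttpResult`, and `Timeouts` are hypothetical names. A connection-class failure is modelled as an `undefined` result, mirroring the `!res` check.

```typescript
// undefined models a connection-class failure (the C++ `!res` case).
type HttpResult = { status: number; body: Uint8Array } | undefined;

interface Timeouts {
  connectMs: number;
  readMs: number;
}

const INITIAL_TIMEOUTS: Timeouts = { connectMs: 30_000, readMs: 60_000 };
const RETRY_TIMEOUTS: Timeouts = { connectMs: 5_000, readMs: 5_000 };
const MAX_ATTEMPTS = 3;

// Connection failures, 5xx, and 429 are transient; any other 4xx is final.
function isRetryable(res: HttpResult): boolean {
  return res === undefined || res.status >= 500 || res.status === 429;
}

async function downloadWithRetry(
  url: string,
  request: (url: string, timeouts: Timeouts) => Promise<HttpResult>,
  sleep: (ms: number) => Promise<void>,
): Promise<Uint8Array> {
  let last: HttpResult = undefined;
  let attempts = 0;
  while (attempts < MAX_ATTEMPTS) {
    if (attempts > 0) await sleep(1000 * 2 ** (attempts - 1)); // 1s before attempt 2, 2s before attempt 3
    attempts++;
    // Only the first attempt gets the generous timeouts.
    last = await request(url, attempts === 1 ? INITIAL_TIMEOUTS : RETRY_TIMEOUTS);
    if (last !== undefined && last.status >= 200 && last.status < 300) return last.body;
    if (!isRetryable(last)) break;
  }
  const reason = last === undefined ? 'connection failed' : `HTTP ${last.status}`;
  throw new Error(`download of ${url} failed after ${attempts} attempt(s): ${reason}`);
}
```

Injecting `sleep` keeps the backoff schedule observable in tests without real waiting.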
- Removes use of the old reqresp method `sendBatchRequest`.
- Lifts code from `proposal_tx_collector.ts` into FastTxCollection.

Testing

- Tests for TxCollection were using the old mechanism, so they had to be migrated.
- I had a good fight with the TxCollection tests because I wanted to keep things clean without going too deep into the p2p network, but still test something. I ended up making some internals of FastTxCollection protected and using them in the test. This is ugly, but it was already partially being done. Hopefully we can improve on it with a bigger refactor.
## Motivation

The top-level `sequencer-client/README.md` was years out of date — it still referred to single-block-per-slot building and made no mention of proposer pipelining or the multi-block checkpoint model. The timing-model README still documented both pipelined and non-pipelined scheduling even though the non-pipelined mode is about to be removed. New contributors (human or AI) lacked the context they need to make changes to block building.

## Approach

Rewrote the top-level README from scratch following the package's `readme-writer` guidelines: slots / blocks / checkpoints, proposed vs checkpointed chain, an architecture diagram, the `Sequencer` work loop, `CheckpointProposalJob` lifecycle, per-block loop pseudocode, the `SequencerPublisher` Multicall3 bundling and `sendRequestsAt` semantics, events, configuration reference, and failure modes. Trimmed `src/sequencer/README.md` to cover only the pipelined timing model with formulas grounded in `PipelinedCheckpointTimingModel` and a corrected 72 s / 8 s walkthrough.

Ran `/codex` for a critical review and fixed all flagged issues (last-sub-slot-is-not-cooldown, event-emit timing, config env-var names, attestation-deadline nuance, `insufficient-valid-txs` handling, publisher `preCheck` semantics).

## Changes

- **sequencer-client**: Replaced `README.md` with an architecture-first rewrite covering pipelining (build slot vs target slot, depth bound of 2, parent-invalidation discard), the per-slot job lifecycle, the publisher's Multicall3 flow, and the full config reference.
- **sequencer-client (sequencer)**: Replaced `src/sequencer/README.md` with a pipelining-only timing model. Documents `timeReservedAtEnd`, `maxNumberOfBlocks`, per-state deadlines, proposer-vs-committee parallel timeline, and timing-variation handling.
…imulatePublicCalls (#23163)

## Motivation

`simulatePublicCalls` forked the world state at the latest synced block but never inserted the L1 to L2 messages that are added at the start of the next checkpoint. Simulations were therefore missing those messages whenever the next block fell in a new checkpoint.

## Approach

If the last proposed block matches the last block in the last proposed checkpoint (read it carefully, I promise it makes sense), then the last proposed block is the last block in its checkpoint. The next block will therefore land in a new checkpoint, so we add that checkpoint's L1 to L2 messages to the world-state fork before simulating.
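The boundary check above can be sketched as follows. `CheckpointInfo`, `WorldStateForkLike`, `appendL1ToL2Messages`, and the `getL1ToL2Messages` lookup (assumed to be keyed by the checkpoint that will consume the messages) are hypothetical stand-ins, not the aztec-node or world-state API. The example uses checkpoint 3 ending at block 12: a tip of 12 means the next block opens checkpoint 4, while a tip of 13 means checkpoint 4 is already in progress.

```typescript
// Hypothetical shapes for illustration; not the aztec-node or world-state API.
interface CheckpointInfo {
  checkpointNumber: number;
  lastBlockNumber: number; // last L2 block included in this checkpoint
}

interface WorldStateForkLike {
  appendL1ToL2Messages(messages: string[]): Promise<void>;
}

// The tip block closes its checkpoint exactly when it equals the last block of the
// last proposed checkpoint; in that case the next block opens a new checkpoint.
function nextBlockStartsNewCheckpoint(lastProposedBlock: number, lastProposedCheckpoint: CheckpointInfo): boolean {
  return lastProposedBlock === lastProposedCheckpoint.lastBlockNumber;
}

// Seeds the fork with the upcoming checkpoint's L1 to L2 messages before simulating.
// Returns true if messages were inserted.
async function prepareForkForSimulation(
  fork: WorldStateForkLike,
  lastProposedBlock: number,
  lastProposedCheckpoint: CheckpointInfo,
  // Assumed lookup: messages keyed by the checkpoint that will consume them.
  getL1ToL2Messages: (checkpointNumber: number) => Promise<string[]>,
): Promise<boolean> {
  if (!nextBlockStartsNewCheckpoint(lastProposedBlock, lastProposedCheckpoint)) {
    // The next block extends an in-progress checkpoint; its messages are assumed
    // to have been inserted when that checkpoint's first block was built.
    return false;
  }
  const messages = await getL1ToL2Messages(lastProposedCheckpoint.checkpointNumber + 1);
  await fork.appendL1ToL2Messages(messages);
  return true;
}

const cp: CheckpointInfo = { checkpointNumber: 3, lastBlockNumber: 12 };
console.log(nextBlockStartsNewCheckpoint(12, cp)); // true
console.log(nextBlockStartsNewCheckpoint(13, cp)); // false
```

Comparing block numbers this way needs no slot or timing arithmetic: whether the tip closes a checkpoint is read directly from the last proposed checkpoint.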
BEGIN_COMMIT_OVERRIDE
fix(test): warp L1 forward when proposer scan hits EpochNotStable (#22967)
test(e2e): fail epochs tests on proposer-rollup-check-failed (#22965)
fix: grafana switch to aztec_status="proposed" (#22978)
chore: update benchmark scraper (#22984)
test(e2e): migrate simple epoch tests to pipelining (#22973)
chore: remove top-level yarn.lock (#22987)
refactor(archiver)!: unify L2BlockSource checkpoint lookups via query objects (#22933)
fix(sequencer): bounded sweep instead of event scan for governance proposal check (#22989)
fix(docs): allow webapp-tutorial yarn install to populate empty lockfile in CI (#23000)
test(e2e): enable pipelining in l1-reorgs and mbps redistribution tests (#23009)
fix(archiver): restore pending block height metric under pipelining (#22994)
chore(p2p): remove skipped validation result option (#23034)
refactor(p2p)!: remove slow tx collection flow (#22878)
chore(spartan): add next-net-clone environment config (#22995)
chore(sequencer): add context to proposer-rollup-check-failed logs (#23071)
test(e2e): wait for archiver sync before asserting pipelining (#22997)
refactor(node-rpc)!: remove deprecated AztecNode methods and L2BlockSource tip helpers (#22934)
feat(p2p): detect and track announce IP changes at runtime (#22405)
test: mark tx_stats_bench 10 TPS as flake-retryable on merge-train/spartan (#23083)
fix(sequencer): bind vote-only multicalls to target slot under pipelining (#23090)
feat(sequencer): build optimistically across pruning epoch boundary (#23056)
fix(sequencer): use chainTipsOverride.pending for log context (#23098)
test(e2e): relax post-boundary slot assertion in epochs_proof_at_boundary (#23108)
fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call (#23093)
fix(sequencer): anchor fee asset price modifier to predicted parent (#23113)
chore: error log when L1 head timestamp drifts (#22947)
fix(sequencer): override full parent checkpoint cell in pipelined simulation (#23073)
test(e2e): enable pipelining on missed l1 slot test (#23068)
fix: more robust metrics reporting in IRM monitor (#23038)
fix: preserve LMDB slashing protection (#23145)
test(e2e): enable pipelining on p2p tests (#23070)
fix(archiver): move L2 tips cache refresh out of write transactions (#23110)
test(e2e): fix data_withholding_slash flake by freezing L1 across restart (#23162)
fix(validator): include proposed checkpoint out-hashes when validating checkpoint proposals (#23119)
refactor(config): drop nested config option, flatten l1Contracts (#23143)
test(e2e): bump bash TIMEOUT for e2e_p2p/add_rollup to match jest 20m (#23177)
fix(p2p): chunk archive of mined txs on block finalization (A-969) (#23085)
fix(p2p): stream tx pool hydration to bound startup memory (A-968) (#23086)
chore: remove orphan --archiver flag usages from start invocations (#23186)
feat(ci): daily merge-train/spartan stale-PR notifier (#23189)
fix: preserve contract artifact permissions (#23174)
fix(ci3): accept slashes in /list/<path:key> for merge-train history (#23160)
feat(ci): route merge-train/spartan flake notifications to #team-alpha-ci (#23219)
fix(cheat-codes): wait for post-warp L2 block in warpL2TimeAtLeastTo (#23213)
feat: slash attesters signing over bad checkpoints (#23180)
refactor(prover-client): split orchestrator into sub-tree + top-tree pair (#22996)
fix(srs): retry transient CRS HTTP downloads with exponential backoff (#23244)
refactor(p2p): remove old reqresp mode (#23158)
docs(sequencer-client): rewrite top-level and timing READMEs (#23149)
fix(aztec-node): include upcoming checkpoint's L1 to L2 messages in simulatePublicCalls (#23163)
END_COMMIT_OVERRIDE