refactor(sequencer): split timing into consensus and proposer timetables #23809
PhilWindle
left a comment
Generally agree with the changes. They weren't as extensive as I had feared. My only concern is around attestation_deadline = last_ethereum_block_in_target_slot - ethereum_slot_duration.
Does this make it too easy for validators to grief proposers by withholding attestations until deep into the target slot, potentially increasing proposer costs or causing missed checkpoints?
 * proposer pipelining. Also the slot of the parent checkpoint this job builds on top of.
 */
private getBuildSlot(): SlotNumber {
  return SlotNumber(this.targetSlot - 1);
Shall we use PROPOSER_PIPELINING_SLOT_OFFSET?
  slot: this.targetSlot,
});
await this.waitUntilTimeInSlot(nextSubslotStart);
await this.waitUntilTimeInSlot(nextSubslotStart - this.getSlotStartBuildTimestamp());
Extremely minor. waitUntilTimeInSlot() is only called here and does sleepUntil(arg + this.getSlotStartBuildTimestamp()) :-)
if (this.blockDuration === undefined) {
  // Single-block enforced mode: execution and re-execution run sequentially, so split the time
  // remaining until the attestation deadline in half.
  const maxAllowed = this.getAttestationDeadline(slot);
Perhaps it's not an issue if only used for tests etc. But I think this means single block only mode won't take p2p propagation into consideration?
Single-block mode is hopefully going away in #23821, to be replaced by the automine sequencer that doesn't rely on a timetable, so I didn't bother too much with it.
…ublish windows
Under proposer pipelining the checkpoint proposal job passed `targetSlot` to `setState` for build-frame states, which made `Sequencer.setState` measure the state deadlines a full Aztec slot too late (frame B), so they never fired. Measure build-frame deadlines against `slotNow` (the build slot).
Also align the timing windows around L1 geometry:
- Base the checkpoint attestation/publish deadlines and the p2p attestation acceptance window on `ethereumSlotDuration` (12s before the last L1 block of the target slot) rather than the configurable `l1PublishingTime`. This unifies the deadline with the publisher's send lead, which already targets one Ethereum slot before the target slot start.
- Validators keep validating/attesting checkpoint proposals until the L1 publish deadline instead of the target-slot start.
Adds regression coverage for the frame bug (build-frame states must be measured against the build slot) and for the realigned attestation/publish windows.
…rTimetable Replace the state-gated CheckpointTimingModel with two explicit, slot-keyed timetables: ConsensusTimetable (protocol-constant deadlines + p2p acceptance bounds) and ProposerTimetable (proposer ideal schedule + block sub-slots). Collapse the publish/attestation/validation deadlines into a single consensus-driven attestation_deadline and drop l1_publish_prepare_time. Make setState pure and query deadlines explicitly (remove getMaxAllowedTime/assertTimeLeft/SequencerTooSlowError). Enforce explicit p2p receive windows. Update the spec README and example timeline SVG.
…te under pipelining
Four CI fixes for the consensus/proposer timetable split:
- p2p libp2p_service test: the mocked L1 constants omitted l1GenesisTime and ethereumSlotDuration, which the new PipeliningWindow/ConsensusTimetable path needs to derive slot timestamps. Add them and pin getEpochAndSlotNow so test proposals fall inside slot 100's receive window.
- checkpoint_proposal_job: the propose tx timeout was set to attestation_deadline (target_slot_start + S - 2E), which bounds when validators must sign, not when the proposer must send. Attestations are collected up to (and, when not enforcing, past) that point, so the propose tx was enqueued already expired and timed out before mining. Use last_ethereum_block_in_target_slot (target_slot_start + S - E), the latest L1 block the propose can still land in.
- sequencer: the sync-failure vote fallback gated on now > getStartDeadline(targetSlot), but targetSlot is derived from the next L1 slot (a one-ethereumSlotDuration look-ahead), so now never reaches that deadline for the current target slot when E > 2*minBlockDuration, and governance/slashing votes never fired. Apply the same look-ahead to the comparison.
- epochs_missed_l1_slot e2e: build-frame state events now carry the build slot (slotNow), not the target slot, so wait for INITIALIZING_CHECKPOINT at the build slot.
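The difference between the two bounds in the checkpoint_proposal_job fix can be sketched as follows. This is only an illustration under stated assumptions: the `SlotConstants` shape and both function names are hypothetical, and only the two formulas (S - 2E and S - E) come from the commit message.

```typescript
// Hypothetical shape; S and E follow the commit message's notation (seconds).
interface SlotConstants {
  aztecSlotDuration: number; // S
  ethereumSlotDuration: number; // E
}

// Latest instant validators may sign: target_slot_start + S - 2E.
function attestationDeadline(targetSlotStart: number, c: SlotConstants): number {
  return targetSlotStart + c.aztecSlotDuration - 2 * c.ethereumSlotDuration;
}

// Latest L1 block the propose tx can still land in: target_slot_start + S - E.
function lastEthereumBlockInTargetSlot(targetSlotStart: number, c: SlotConstants): number {
  return targetSlotStart + c.aztecSlotDuration - c.ethereumSlotDuration;
}

// Production-like values, with the target slot starting at t = 0.
const constants: SlotConstants = { aztecSlotDuration: 72, ethereumSlotDuration: 12 };
const signingCutoff = attestationDeadline(0, constants); // 48
const proposeTxTimeout = lastEthereumBlockInTargetSlot(0, constants); // 60
```

Using the later bound as the tx timeout leaves one Ethereum slot between the end of attestation collection and the expiry of the propose tx.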
The timetable refactor anchored block sub-slots directly at build_frame_start (block_build_deadline(k) = build_frame_start + (k+1)*D), assuming the proposer begins building at the moment the build frame opens. In practice the proposer only enters the build loop after sync, the proposer check, and checkpoint initialization, so it starts some time into the frame. When that prologue plus min_block_duration exceeds D (notably when min_block_duration is clamped to D in tight fast profiles), the first sub-slot's deadline is already too close to be startable, so it is skipped. This under-packs checkpoints (fewer than minBlocksForCheckpoint) and starves the first block of a checkpoint of build time, breaking enforced-timetable multi-block-per-checkpoint e2e tests. Reintroduce a checkpoint_proposal_init_time budget (default 1s, matching the pre-refactor checkpointInitializationTime) and anchor the sub-slot grid at build_frame_start + init. max_blocks_per_checkpoint subtracts the same init, so the derived counts for the production and local profiles are unchanged. This is a proposer operational budget only; no consensus-acceptance deadline changes. Update the timing spec and timetable tests accordingly, and add a regression test for the tight minD == D fast profile with a realistic late proposer start.
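A minimal sketch of the sub-slot grid described above, showing why anchoring at build_frame_start alone skips the first sub-slot when the proposer starts late. The names `SubslotGrid`, `blockBuildDeadline`, and `firstStartableSubslot` are hypothetical, and the startability rule (at least min_block_duration must remain before the sub-slot deadline) is an assumption inferred from the commit message, not the repository's actual selection logic.

```typescript
// Hypothetical grid parameters, all in seconds.
interface SubslotGrid {
  buildFrameStart: number;
  initTime: number; // checkpoint_proposal_init_time
  blockDuration: number; // D
  minBlockDuration: number;
  maxBlocks: number;
}

// block_build_deadline(k) = build_frame_start + init + (k + 1) * D
function blockBuildDeadline(k: number, g: SubslotGrid): number {
  return g.buildFrameStart + g.initTime + (k + 1) * g.blockDuration;
}

// Assumed rule: a sub-slot is startable if at least minBlockDuration remains
// before its deadline. Returns the first such index, or undefined if none.
function firstStartableSubslot(now: number, g: SubslotGrid): number | undefined {
  for (let k = 0; k < g.maxBlocks; k++) {
    if (blockBuildDeadline(k, g) - now >= g.minBlockDuration) {
      return k;
    }
  }
  return undefined;
}

// Tight fast profile (minD == D == 2) with a proposer that enters the build loop 0.8s late.
const tight = { buildFrameStart: 0, blockDuration: 2, minBlockDuration: 2, maxBlocks: 3 };
const withoutInit = firstStartableSubslot(0.8, { ...tight, initTime: 0 }); // 1: first sub-slot skipped
const withInit = firstStartableSubslot(0.8, { ...tight, initTime: 1 }); // 0: first sub-slot kept
```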
…t durations
The timetable refactor dropped the pre-refactor fast-profile override that
forced shorter operational budgets when ethereum_slot_duration < 8. Without it,
fast local/e2e networks inherit the conservative production budgets
(p2p_propagation_time=2s, checkpoint_proposal_prepare_time=1s, min_block_duration=2s)
because the e2e configs only set slot durations and block duration, not the
budgets. Those large budgets shrink the per-checkpoint build window:
max_blocks_per_checkpoint = floor((S - init - D - 2P - prepCp)/D) drops below
the count the network needs, under-packing checkpoints under enforceTimeTable.
Concretely, with production budgets: epochs_l1_reorgs (S=36,E=4,D=8) and
l2_to_l1 (S=12,E=4,D=2) derive only 2 blocks per checkpoint instead of the
pre-refactor 3 and 4, so the multi-block checkpoints either fail to pack or
land too late to be observed before the test reads the archiver.
Reintroduce the fast profile in the stdlib timetable: when
ethereum_slot_duration < FAST_PROFILE_ETHEREUM_SLOT_DURATION (8s), clamp
p2p_propagation_time, checkpoint_proposal_prepare_time, and min_block_duration
down to the documented fast values (0.5/0.5/1). The clamp only lowers budgets,
so an explicitly smaller config is kept, and production (E>=8, e.g. high_tps at
E=12) is unaffected. checkpoint_proposal_init_time stays at its prologue value
so the first-sub-slot fix from the previous commit is preserved. Consensus
acceptance deadlines (ConsensusTimetable, p2p receive windows) are derived only
from {S,E,D} and are untouched.
calculateMaxBlocksPerSlot resolves budgets through the same profile so p2p
gossipsub scoring derives the same count as the proposer. Add unit assertions
pinning the derived capacity for each enforced multi-block e2e config to its
minBlocksForCheckpoint and expected build-window count.
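The clamp and the capacity formula above can be sketched together. This is a hedged illustration: `TimingBudgets`, `clampToFastProfile`, and `maxBlocksPerCheckpoint` are hypothetical names, the default values are the ones quoted in this commit message and the previous one, and the formula is the one written above, `floor((S - init - D - 2P - prepCp)/D)`.

```typescript
// Hypothetical budget bundle, all in seconds.
interface TimingBudgets {
  p2pPropagationTime: number; // P
  checkpointProposalPrepareTime: number; // prepCp
  minBlockDuration: number;
  checkpointProposalInitTime: number; // init
}

const FAST_PROFILE_ETHEREUM_SLOT_DURATION = 8;
const PRODUCTION_BUDGETS: TimingBudgets = {
  p2pPropagationTime: 2,
  checkpointProposalPrepareTime: 1,
  minBlockDuration: 2,
  checkpointProposalInitTime: 1,
};

// Below the fast-profile threshold, lower budgets to at most 0.5/0.5/1.
// Math.min keeps an explicitly smaller configured value; init is never clamped.
function clampToFastProfile(b: TimingBudgets, ethereumSlotDuration: number): TimingBudgets {
  if (ethereumSlotDuration >= FAST_PROFILE_ETHEREUM_SLOT_DURATION) {
    return b;
  }
  return {
    p2pPropagationTime: Math.min(b.p2pPropagationTime, 0.5),
    checkpointProposalPrepareTime: Math.min(b.checkpointProposalPrepareTime, 0.5),
    minBlockDuration: Math.min(b.minBlockDuration, 1),
    checkpointProposalInitTime: b.checkpointProposalInitTime,
  };
}

// max_blocks_per_checkpoint = floor((S - init - D - 2P - prepCp) / D)
function maxBlocksPerCheckpoint(S: number, D: number, b: TimingBudgets): number {
  const window = S - b.checkpointProposalInitTime - D - 2 * b.p2pPropagationTime - b.checkpointProposalPrepareTime;
  return Math.floor(window / D);
}

// Unclamped production budgets reproduce the under-packed counts quoted above.
const reorgsProduction = maxBlocksPerCheckpoint(36, 8, PRODUCTION_BUDGETS); // 2
const l2ToL1Production = maxBlocksPerCheckpoint(12, 2, PRODUCTION_BUDGETS); // 2
// With the clamp applied (E = 4 < 8) the build window widens.
const reorgsFast = maxBlocksPerCheckpoint(36, 8, clampToFastProfile(PRODUCTION_BUDGETS, 4));
```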
Build the consensus/proposer timetables once per component and inject the object into consumers instead of passing option bundles and constructing timetables at scattered sites. Add constant getters (genesisTime, getL1Constants, resolved budgets) so consumers read config off the timetable. Delete PipeliningWindow; validators now check the disparity-widened receive window directly via shared helpers in clock_tolerance. Block and checkpoint proposals share the tight checkpoint proposal receive window as their arrival gate, and the current/next-slot short-circuits in the attestation and proposal validators are removed. Update the sequencer README to document the shared block-proposal arrival gate. Fold calculateMaxBlocksPerSlot into ProposerTimetable.getMaxBlocksPerCheckpoint and have p2p gossipsub scoring construct a ProposerTimetable so block-count budget resolution goes through the same path as the sequencer.
…tables ConsensusTimetable and ProposerTimetable now require every protocol constant and operational budget at the constructor. blockDuration stays number|undefined for legacy single-block mode but is a required property (no ?), so every call site passes it explicitly. Defaults live in the config layer (sequencer config mappings, the SequencerTimetable adapter, and the p2p scoring adapter), not in the timetable or at call sites. Threads the real ethereumSlotDuration into calculateBlocksPerSlot and makes it required, removing the slotDurationMs/1000 fallback that forced the production budget profile. Wires blockDurationMs into the validator-client config surface (via the shared sequencer config mapping) and passes the explicit blockDuration into its ConsensusTimetable. This tightens the block/checkpoint proposal receive deadline the validator-client validates against from target_slot_start - E to - E - D when blockDuration is set, which is the intended spec behavior; the attestation deadline the validator-client itself consumes is unaffected (no D term).
retryUntil now accepts its timeout as a number of seconds, an explicit
{ timeout } in seconds, or an absolute { deadline } with an optional
DateProvider. The deadline form derives the remaining budget from the
provider's clock at call time and treats an already-past deadline as an
immediate timeout, matching the existing non-positive-timeout semantics.
Use the deadline form in the validator ProposalHandler re-execution and
parent-block paths instead of manually converting an absolute deadline to
fractional seconds.
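The three timeout forms can be pictured with a small resolver. This is a sketch under assumptions, not the real `retryUntil` signature: `ClockLike` stands in for the DateProvider and is assumed to return milliseconds since the epoch, the deadline is assumed to be expressed in the same unit, and `resolveBudget` is a hypothetical helper.

```typescript
// Stand-in for the DateProvider; assumed to return milliseconds since the epoch.
interface ClockLike {
  now(): number;
}

// Seconds as a number, explicit { timeout } in seconds, or an absolute { deadline } in ms.
type TimeoutSpec = number | { timeout: number } | { deadline: number; dateProvider?: ClockLike };

type Budget = { kind: "timeout"; seconds: number } | { kind: "expired" };

// Derives the remaining budget at call time. An already-past deadline is
// reported as expired so the caller can time out immediately.
function resolveBudget(spec: TimeoutSpec): Budget {
  if (typeof spec === "number") {
    return { kind: "timeout", seconds: spec };
  }
  if ("timeout" in spec) {
    return { kind: "timeout", seconds: spec.timeout };
  }
  const nowMs = spec.dateProvider ? spec.dateProvider.now() : Date.now();
  const seconds = (spec.deadline - nowMs) / 1000;
  return seconds <= 0 ? { kind: "expired" } : { kind: "timeout", seconds };
}

// A fixed clock makes the deadline form deterministic.
const fixedClock: ClockLike = { now: () => 10_000 };
const future = resolveBudget({ deadline: 12_500, dateProvider: fixedClock }); // 2.5 seconds remaining
const past = resolveBudget({ deadline: 9_000, dateProvider: fixedClock }); // expired
```

Passing the clock explicitly is what lets the remaining budget follow a mocked clock in tests instead of wall-clock time.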
Split timetable/index.ts into one file per concern: budgets.ts (the DEFAULT_*/FAST_PROFILE_* constants and resolveTimingBudgets), consensus_timetable.ts (ConsensusTimetable + SlotTimingConstants), and proposer_timetable.ts (ProposerTimetable + SubslotSelection). index.ts is now re-exports only. Split the test file to match. Rename ProposerTimetable.getStartDeadline to getBuildStartDeadline. Retitle the moved spec README to 'Block Building Timetable Spec' and repoint the sequencer-client README references to its new stdlib location.
…deadline plumbing
- Drop the SequencerTimetable adapter; the Sequencer now constructs the ProposerTimetable from its config + l1Constants (filling unset budgets with the shared DEFAULT_* values) and keeps the max-blocks-per-checkpoint validation.
- CheckpointProposalJob reports the target slot (not the build slot) on setState via a private setState helper, and anchors build-frame timing on ProposerTimetable.getBuildFrameStart(targetSlot).
- Compute the setState seconds-into-slot metric and stateDurationMs from the date provider (getBuildFrameStart + the delta between consecutive setState calls) so they track simulated time under a mocked clock.
- Inline the sync-failure vote gate as a plain now <= start_deadline check (drop the next-L1-slot look-ahead) and rename getStartDeadline usages to getBuildStartDeadline.
- Mark the slot as attempted on the build-start-deadline abort path so a deadline abort does not rebuild the same checkpoint within the slot.
- Replace the deprecated MIN_EXECUTION_TIME with DEFAULT_MIN_BLOCK_DURATION (and config minBlockDuration in the sequencer) and remove the vestigial l1PublishingTime config field, mapping, env var, schema, and p2p plumbing.
… 0 in timetable Codex review fixes: PROPOSER_CHECK state now reports the target slot like the rest of the proposal lifecycle; getBuildFrameStart no longer throws on slot 0 (p2p validators evaluate windows for peer-supplied slots); refresh stale sequencer-client README sections describing the deleted timetable wrapper, state-gated deadlines, and l1PublishingTime.
…ssed-l1-slot test
…rop slotNow
- Rename the state-changed payload's `secondsIntoSlot` to `secondsIntoBuildFrame` (seconds since the build-frame start of the target slot, which may be negative) and `slot` to `targetSlot`, with JSDoc documenting the build-frame anchoring and the undefined-for-lifecycle-states semantics. Rename the sequencer helper `getSecondsIntoSlot` to `getSecondsIntoBuildFrame`.
- Drop the redundant `slotNow` constructor arg/field from CheckpointProposalJob: it was always `targetSlot - 1` (the wall-clock build slot under pipelining). All uses now derive it from a private `getBuildSlot()` helper.
- Verify the private `setState(state)` helper is the sole caller of `setStateFn`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…validators
- Add a configurable maximum gossip clock-disparity tolerance (`maxGossipClockDisparityMs`, env `P2P_MAX_GOSSIP_CLOCK_DISPARITY_MS`, default 500ms) instead of the hardcoded MAXIMUM_GOSSIP_CLOCK_DISPARITY_MS constant. The value is threaded into the proposal and attestation validators, which now inline the receive-window check directly from the injected ConsensusTimetable getters. Deletes clock_tolerance.ts and its helpers; the window-boundary cases are rehomed to the validator tests.
- Remove `calculateBlocksPerSlot`: libp2p_service now builds a single ProposerTimetable (from p2p config budgets) shared by the validators and the gossipsub TopicScoreParamsFactory, which calls getMaxBlocksPerCheckpoint() directly. Drops the DEFAULT_*-filling adapter from topic_score_params.
- Simplify proposal_handler's block-sync wait back to a single retryUntil call using the deadline overload (a past deadline now times out natively), dropping the manual remaining<=0 fail-fast guard.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
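The inlined receive-window check amounts to widening both bounds by the clock-disparity tolerance. The sketch below assumes inclusive bounds and millisecond timestamps; `isWithinReceiveWindow` and its parameter names are hypothetical, and only the 500ms default is taken from the commit message.

```typescript
// Default tolerance quoted in the commit message (maxGossipClockDisparityMs).
const DEFAULT_MAX_GOSSIP_CLOCK_DISPARITY_MS = 500;

// Accepts a message whose arrival time falls inside
// [receiveStart - disparity, deadline + disparity], bounds assumed inclusive.
function isWithinReceiveWindow(
  arrivalMs: number,
  receiveStartMs: number,
  deadlineMs: number,
  maxClockDisparityMs: number = DEFAULT_MAX_GOSSIP_CLOCK_DISPARITY_MS,
): boolean {
  return arrivalMs >= receiveStartMs - maxClockDisparityMs && arrivalMs <= deadlineMs + maxClockDisparityMs;
}

// Window [1000, 2000] ms widened by the default 500 ms on each side.
const earlyButTolerated = isWithinReceiveWindow(500, 1000, 2000); // true
const tooLate = isWithinReceiveWindow(2501, 1000, 2000); // false
```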
Remove hard line breaks within prose paragraphs and list items in the sequencer-client and stdlib timetable documentation. Single logical lines were split at ~120 chars for formatting; with word wrap enabled, these should be single physical lines.
- sequencer-client/README.md: 18 joins across all changed sections
- stdlib/src/timetable/README.md: 36 joins in the new timetable spec document
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…land deadline The orphan-prune gate fired at now >= expectedCheckpointedByTime, pruning a block-only tip at the exact instant its enclosing checkpoint could still legitimately land. The checkpoint receive deadline is an inclusive acceptance bound, so a checkpoint arriving at that instant is still valid; pruning then is one tick too aggressive. This regressed once the deadline anchor moved to the checkpoint receive deadline (target_slot_start - E - D) and was rounded to an integer tick, landing it exactly on the now tick under second-granularity timing. Require strictly-past (now > expected) for both the archiver prune and the sequencer overdue gate so the two agree.
…n deadline The wall-clock orphan prune in pruneOrphanProposedBlocks tore down a proposed block-only tip at the checkpoint receive deadline plus grace (expected land time). That instant is earlier than the attestation deadline, which is both the latest a checkpoint can still land on L1 (promoting these blocks into a confirmed checkpoint during L1 sync) and the cutoff by which validators must finish re-executing the proposals. Pruning at the earlier time discarded state that was still within its valid window, breaking in-flight re-execution. Gate the destructive prune on getAttestationDeadline instead. The sequencer's non-destructive overdue warning stays at the expected land time as an earlier soft signal; its comments are updated to reflect that the two boundaries now differ by design.
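The two boundaries described in the last two commits can be sketched as a pair of gates. The names `TipTiming`, `isOverdue`, and `shouldPruneOrphanTip` are hypothetical, and the sketch assumes the strictly-past comparison introduced in the previous commit still applies after the prune gate moved to the attestation deadline.

```typescript
// Hypothetical timing inputs for a proposed block-only tip, in seconds.
interface TipTiming {
  expectedLandTime: number; // checkpoint receive deadline plus grace
  attestationDeadline: number; // latest the checkpoint can still land on L1
}

// Non-destructive soft signal: warn once the expected land time is strictly past.
function isOverdue(now: number, t: TipTiming): boolean {
  return now > t.expectedLandTime;
}

// Destructive prune: only once the attestation deadline is strictly past.
function shouldPruneOrphanTip(now: number, t: TipTiming): boolean {
  return now > t.attestationDeadline;
}

const tip: TipTiming = { expectedLandTime: 100, attestationDeadline: 130 };
const warnOnly = isOverdue(110, tip) && !shouldPruneOrphanTip(110, tip); // true
const atBoundary = shouldPruneOrphanTip(130, tip); // false: the deadline is inclusive
```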
I initially had a […] But if we assume that a majority of the validators are honest, it makes sense to account as much as we can for delays in attestations reaching the proposer over the p2p network, so that we have as many chances as possible to collect them. The last possible moment at which we can collect them is right before having to send the L1 tx, which is before the last L1 block of the L2 slot. Happy to discuss more though.
…LINING_SLOT_OFFSET Address review feedback: derive the build slot from PROPOSER_PIPELINING_SLOT_OFFSET instead of a magic -1, and fold the single-caller waitUntilTimeInSlot helper into waitUntilNextSubslot (which already holds the absolute sub-slot deadline). Tests now override/spy waitUntilNextSubslot directly.
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.
Initial words from the human behind this changeset
So it's not all AI!
This PR splits the timetable into two. One of them is related to consensus, and defines when proposals and attestations are to be accepted. The other is related to block building, only used by the proposer.
Aside from the split, we also redefine some of the times, such that:
Everything below this line is now AI generated. It's not that bad to read though.
What & why
Restructures the sequencer timing model ("timetable") from a single state-gated model into two explicit, slot-keyed timetables, so the proposer's scheduling and the consensus deadlines validators and the next proposer must agree on are cleanly separated and queried directly — instead of being inferred from sequencer state transitions.
This PR focuses on the new structure; the change count is large mostly because the old `CheckpointTimingModel` and its state-gated plumbing are replaced wholesale.
The split
- `ConsensusTimetable`: consensus acceptance bounds only, derived from protocol constants `{genesis_time, aztec_slot_duration, ethereum_slot_duration, block_duration}`. Returns the receive-window bounds and the single attestation deadline. Used by validator-client, p2p, the publisher's L1-tx timeout, and next-proposer handoff reasoning.
- `ProposerTimetable`: composes the consensus one and adds the proposer's ideal/happy-path schedule and the block sub-slot timetable, with the operational budgets `{min_block_duration, p2p_propagation_time, checkpoint_proposal_prepare_time}`. Used only by the sequencer / checkpoint-proposal job.
Key design points
- `attestation_deadline` (`target_slot_start + aztec_slot_duration − 2 · ethereum_slot_duration`) replaces the old publish / attestation-receive / proposal-validation deadlines and removes `l1_publish_prepare_time`. It is the one hard, consensus-driven (used for inactivity/slashing) cutoff for re-execution, validation, signing, the proposer's attestation-collection cutoff, the p2p attestation acceptance bound, and the L1-tx timeout.
- `checkpoint_proposal_receive_deadline = target_slot_start − ethereum_slot_duration − block_duration` depends only on protocol constants and gates the next-proposer handoff. Enforced at p2p ingress, checkpoint-type only, against arrival time.
- `setState` is now pure: `getMaxAllowedTime`, `assertTimeLeft`, and `SequencerTooSlowError` are gone. The proposer queries `getStartDeadline` / `selectNextSubslot` directly; states remain purely for lifecycle/observability.
- p2p receive windows are explicit `[receive_start − clock_disparity, deadline + clock_disparity]` ranges: checkpoint proposals get a tight, non-overlapping window (a proposal maps unambiguously to one target slot); attestations get a deliberately liberal window (they are self-tagged by `(slot, checkpoint)` in the signature).
- `last_block_build_time` and `checkpoint_proposal_send_time` are single values, no ideal/deadline pairs.
Example (production values)
With `aztec_slot_duration=72`, `ethereum_slot_duration=12`, `block_duration=6`, `min_block_duration=2`, `p2p_propagation_time=2`, `checkpoint_proposal_prepare_time=1`: `max_blocks_per_checkpoint = 10`; `last_block_build_time` at +61s; `attestation_deadline` at +132s. The full model, worked numbers, and a proportional timeline diagram are in `sequencer-client/src/sequencer/README.md`.
Notes
- `enforce=false` and single-block mode are retained for now: they remain the e2e default and the sandbox/local-network default (on the production `Sequencer`, not automine). A follow-up will remove them by (1) adapting tests so timetable enforcement can be on everywhere and (2) migrating local-network to the automine sequencer (which bypasses the timetable entirely).
- … `attestation_deadline` rather than the next wall-clock slot boundary.
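As a rough illustration of what "slot-keyed" means for the consensus side, the sketch below derives both deadlines from a slot number and the protocol constants. It is not the real `ConsensusTimetable`: the class, its method names, and the `target_slot_start = genesis_time + slot · aztec_slot_duration` mapping are assumptions; the two deadline formulas are the ones stated in the design points, with the `block_duration` term dropped when it is undefined (legacy single-block mode).

```typescript
// Hypothetical constants bundle, all in seconds.
interface SlotTimingConstantsSketch {
  genesisTime: number;
  aztecSlotDuration: number; // S
  ethereumSlotDuration: number; // E
  blockDuration: number | undefined; // D; undefined in single-block mode
}

class ConsensusTimetableSketch {
  private readonly c: SlotTimingConstantsSketch;

  constructor(constants: SlotTimingConstantsSketch) {
    this.c = constants;
  }

  // Assumed mapping from slot number to wall-clock start.
  getSlotStart(slot: number): number {
    return this.c.genesisTime + slot * this.c.aztecSlotDuration;
  }

  // attestation_deadline = target_slot_start + S - 2E
  getAttestationDeadline(slot: number): number {
    return this.getSlotStart(slot) + this.c.aztecSlotDuration - 2 * this.c.ethereumSlotDuration;
  }

  // checkpoint_proposal_receive_deadline = target_slot_start - E - D
  getCheckpointProposalReceiveDeadline(slot: number): number {
    return this.getSlotStart(slot) - this.c.ethereumSlotDuration - (this.c.blockDuration ?? 0);
  }
}

// Production values from the example, with genesis at t = 0.
const timetable = new ConsensusTimetableSketch({
  genesisTime: 0,
  aztecSlotDuration: 72,
  ethereumSlotDuration: 12,
  blockDuration: 6,
});
const slotStart = timetable.getSlotStart(10); // 720
const receiveDeadline = timetable.getCheckpointProposalReceiveDeadline(10); // 702
const attestationCutoff = timetable.getAttestationDeadline(10); // 768
```

Because every query takes the slot as an argument, a validator can evaluate the window for a peer-supplied slot without depending on any sequencer state.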