Skip to content

feat: merge-train/spartan-v5#23910

Merged
spalladino merged 6 commits into
v5-nextfrom
merge-train/spartan-v5
Jun 6, 2026
Merged

feat: merge-train/spartan-v5#23910
spalladino merged 6 commits into
v5-nextfrom
merge-train/spartan-v5

Conversation

@AztecBot

@AztecBot AztecBot commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

BEGIN_COMMIT_OVERRIDE
refactor(sequencer): split timing into consensus and proposer timetables (#23809)
chore(sequencer): gate checkSync behind the cheap proposer check (#23810)
test(cli-wallet): wait for checkpointed in serial sandbox flow tests (#23916)
END_COMMIT_OVERRIDE

…les (#23809)

## Initial words from the human behind this changeset

So it's not all AI! 

This PR splits the timetable into two. One of them is related to
consensus, and defines when proposals and attestations are to be
accepted. The other is related to block building, only used by the
proposer.

Aside from the split, we also redefine some of the times, such that:
- Block building starts one Ethereum slot before the start of the build
slot.
- The proposer finishes block building such that, assuming all
validators respond in time, it has its checkpoint ready to submit to L1
exactly one Ethereum slot before the start of the target slot, to
maximize chances for inclusion.
- Should validators _not_ respond in time, the sequencer will still wait
for attestations until one Ethereum slot before the end of its target
slot, ie the last possible moment for submission. The network considers
attestations valid up until this point.
- On the other hand, checkpoint proposals are accepted up until a few
seconds before the block-building start of the next proposer. Those "few
seconds" are one block duration, so the next proposer has time to
reexecute and catch up.

Everything below this line is now AI generated. It's not that bad to
read though.

## What & why

Restructures the sequencer timing model ("timetable") from a single
state-gated model into **two explicit, slot-keyed timetables**, so the
proposer's *scheduling* and the *consensus deadlines* validators and the
next proposer must agree on are cleanly separated and queried directly —
instead of being inferred from sequencer state transitions.

This PR focuses on the new structure; the change count is large mostly
because the old `CheckpointTimingModel` and its state-gated plumbing are
replaced wholesale.

![Timetable
example](https://raw.githubusercontent.com/AztecProtocol/aztec-packages/spl/timetable-consensus-proposer-split/yarn-project/stdlib/src/timetable/timetable-example.svg)

## The split

- **`ConsensusTimetable`** — consensus acceptance bounds only, derived
from protocol constants `{genesis_time, aztec_slot_duration,
ethereum_slot_duration, block_duration}`. Returns the receive-window
bounds and the single attestation deadline. Used by validator-client,
p2p, the publisher's L1-tx timeout, and next-proposer handoff reasoning.
- **`ProposerTimetable`** — composes the consensus one and adds the
proposer's ideal/happy-path schedule and the block sub-slot timetable,
with the operational budgets `{min_block_duration, p2p_propagation_time,
checkpoint_proposal_prepare_time}`. Used only by the sequencer /
checkpoint-proposal job.

## Key design points

- **Single `attestation_deadline`** (`target_slot_start +
aztec_slot_duration − 2 · ethereum_slot_duration`) replaces the old
publish / attestation-receive / proposal-validation deadlines and
removes `l1_publish_prepare_time`. It is the one hard,
**consensus-driven** (used for inactivity/slashing) cutoff for
re-execution, validation, signing, the proposer's attestation-collection
cutoff, the p2p attestation acceptance bound, and the L1-tx timeout.
- **Consensus receive gate** `checkpoint_proposal_receive_deadline =
target_slot_start − ethereum_slot_duration − block_duration` depends
only on protocol constants and gates the next-proposer handoff. Enforced
at **p2p ingress, checkpoint-type only**, against arrival time.
- **Explicit deadline queries replace state-gated timing.** `setState`
is now pure — `getMaxAllowedTime`, `assertTimeLeft`, and
`SequencerTooSlowError` are gone. The proposer queries
`getStartDeadline` / `selectNextSubslot` directly; states remain purely
for lifecycle/observability.
- **p2p acceptance windows** are now explicit `[receive_start −
clock_disparity, deadline + clock_disparity]` ranges: checkpoint
proposals get a tight, **non-overlapping** window (a proposal maps
unambiguously to one target slot); attestations get a deliberately
**liberal** window (they are self-tagged by `(slot, checkpoint)` in the
signature).
- The proposer sizes block production around the **ideal L1-publish path
only** — `last_block_build_time` and `checkpoint_proposal_send_time` are
single values, no ideal/deadline pairs.

## Example (production values)

With `aztec_slot_duration=72`, `ethereum_slot_duration=12`,
`block_duration=6`, `min_block_duration=2`, `p2p_propagation_time=2`,
`checkpoint_proposal_prepare_time=1`:

`max_blocks_per_checkpoint = 10`; `last_block_build_time` at +61s;
`attestation_deadline` at +132s. The full model, worked numbers, and a
proportional timeline diagram are in
`sequencer-client/src/sequencer/README.md`.

## Notes

- `enforce=false` and single-block mode are **retained for now** — they
remain the e2e default and the sandbox/local-network default (on the
production `Sequencer`, not automine). A follow-up will remove them by
(1) adapting tests so timetable enforcement can be on everywhere and (2)
migrating local-network to the automine sequencer (which bypasses the
timetable entirely).
- **Behavior change**: intermediate block-proposal re-execution is now
bounded by the single `attestation_deadline` rather than the next
wall-clock slot boundary.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

@ludamad ludamad left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Auto-approved

@AztecBot AztecBot added this pull request to the merge queue Jun 5, 2026
@AztecBot

AztecBot commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 5, 2026
spalladino and others added 2 commits June 5, 2026 20:15
)

## Motivation

`prepareCheckpointProposal` ran the expensive `checkSync` — a
~6-subsystem query (world-state, archiver tips, p2p, l1→l2 tips,
pending-chain validation, proposed-checkpoint data) plus block-data and
coherence guards — on every tick of every node, *before* the cheap
`checkCanPropose`. Since most nodes are not the proposer for most slots,
the full sync work was wasted on a busy committee.

## Approach

Swap the order so the cheap proposer check runs first and non-proposers
bail before doing the full sync work, while preserving the "invalidate
even if not proposer" liveness backstop. The single coupling —
`considerInvalidatingCheckpoint` previously consumed the whole sync
result — is removed by having it take the archiver's pending-chain
validation status directly (a single cheap read), which is authoritative
on its own. The expensive `checkSync` and the build-start timing gate
(`setState(PROPOSER_CHECK)`) stay on the proposer path, so proposer
build/timing semantics are unchanged.

## Changes

- **sequencer-client**: Reorder `prepareCheckpointProposal` to compute
`checkCanPropose` once up front; escape-hatch voting and the
non-proposer invalidation backstop are evaluated before `checkSync`,
which now runs only for proposers. `considerInvalidatingCheckpoint`
takes a `ValidateCheckpointResult` instead of the full sync result,
wrapped in try/catch on the non-proposer path, and gains a
per-`(checkpoint, slot)` dedup guard (set only after a request is
successfully simulated) to avoid re-simulating/re-submitting within a
slot while still retrying on transient failures.
- **sequencer-client (tests)**: Add coverage for the intentional
behavior changes — a non-proposer with an invalid pending chain
invalidates even when the sync check would fail; an escape-hatch-open
proposer votes regardless of sync state; per-slot invalidation dedup;
and retry within a slot after a transient simulation failure.

### Intentional behavior changes

1. **Invalidation no longer requires a fully-successful `checkSync`.** A
non-proposer with an invalid pending chain now invalidates from
`getPendingChainValidationStatus()` alone.
`simulateInvalidateCheckpoint` (L1-backed) remains the validity gate.
Strengthens the liveness backstop.
2. **Invalidation is no longer gated by the build-start timing throw**,
since non-proposers no longer reach `setState(PROPOSER_CHECK)`. The gate
is scoped to building, and invalidation is not building.
3. **Escape-hatch suppression of invalidation is preserved** by ordering
(escape-hatch `return` stays ahead of the non-proposer invalidation
branch).
4. **Escape-hatch voting on failed-sync slots becomes immediate**, since
it is now evaluated before `checkSync`.

Minor observable change: escape-hatch voting and non-proposer
invalidation no longer transition through the `PROPOSER_CHECK` state, so
that state's duration metric no longer includes `checkCanPropose` or
those paths.

Fixes A-1147
…23916)

## Why

The `merge-train/spartan-v5` PR (#23910) was dequeued from the merge
queue: the merge-queue-heavy run failed on **one** of its 10 grind runs,
on the cli-wallet flow test
`cli-wallet/test/flows/shield_and_transfer.sh`.

The fatal error was a fee-juice claim failing after a chain reorg:

```
WARN  pxe:block_synchronizer  Pruning data after block 7 due to reorg
ERROR wallet  Error: No L1 to L2 message found for message hash 0x008e9f1b...
Simulation error: No L1 to L2 message found ... at FeeJuice.claim_and_end_setup
```

## Root cause

The cli-wallet `send` / `deploy` / `create-account` / `deploy-account`
commands wait for tx status **`proposed`** by default. In the serial,
single-block local-network sandbox a *proposed* block can be orphaned
and pruned before its checkpoint is published — exactly the reorg above.
When that happens, a tx the caller already moved on from is dropped, and
the bridged fee-juice L1→L2 message that the next `deploy-account
--payment fee_juice,claim` depends on disappears, so the claim fails
with `No L1 to L2 message found`. It is intermittent (1/10 grinds here),
which is why it only bit in the heavy merge-queue run.

## Fix

Scoped entirely to the flow-test harness: the `aztec-wallet` wrapper in
`shared/setup.sh` now passes `--wait-for-status checkpointed` for the
tx-producing commands (`send`, `deploy`, `deploy-account`,
`create-account`), so each tx is durably included before the next is
sent. Commands that pass `--no-wait` (e.g. `bridge-fee-juice`) or
already set `--wait-for-status` are left untouched.

**The cli-wallet itself is unchanged — its default is still
`proposed`.** This addresses the earlier review note to not change the
wallet default and keep the change within the tests.
@spalladino spalladino enabled auto-merge June 6, 2026 00:35
@spalladino spalladino added this pull request to the merge queue Jun 6, 2026
@AztecBot

AztecBot commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator Author

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

\033FLAKED\033 (8;;http://ci.aztec-labs.com/a554ef5aa6c8e8a2�a554ef5aa6c8e8a28;;�): yarn-project/end-to-end/scripts/run_test.sh ha src/composed/ha/e2e_ha_full.test.ts (260s) (code: 0)

Merged via the queue into v5-next with commit 25ad4ef Jun 6, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants