fix(e2e): drop removed enforceTimeTable option from optimistic proving test #23976

Merged
AztecBot merged 3 commits into merge-train/spartan-v5 from cb/fix-spartan-v5-enforce-timetable
Jun 10, 2026
@AztecBot AztecBot commented Jun 9, 2026

Collaborator

Problem

CI on merge-train/spartan-v5 (commit 609014a, log) failed in the yarn-project build at the yarn tsgo -b --emitDeclarationOnly step:

end-to-end/src/e2e_epochs/epochs_optimistic_proving.parallel.test.ts(222,9): error TS2353:
  Object literal may only specify known properties, and 'enforceTimeTable'
  does not exist in type 'EpochsTestOpts'.

(also at lines 366, 473, 558, 646, 780)

Root cause

PR #23821 (always enforce timetable with concrete block duration) made timetable enforcement unconditional and removed the enforceTimeTable option from EpochsTestOpts/SetupOptions, deleting ~30 enforceTimeTable: true call sites. epochs_optimistic_proving.parallel.test.ts landed on the v5 line separately and still passed enforceTimeTable: true at six sites, so it no longer type-checks.

Fix

  • Remove the six now-invalid enforceTimeTable: true properties. Each call site already sets a concrete blockDurationMs: 8000, so the change is behavior-preserving — the same deletion the PR applied to every other e2e test. Verified in CI: yarn-project now type-checks and epochs_optimistic_proving.parallel.test.ts passes.
  • Temporarily it.skip the HA test "should distribute work across multiple HA nodes" in composed/ha/e2e_ha_full.test.ts, which fails under the always-enforced timetable (the sequencer misses slots: BlockOrCheckpointSlotExpiredError / no_blocks_built / Fork not found). Skipped at Santiago's request, to be re-enabled once the HA block-building interaction with #23821 (refactor(sequencer)!: always enforce timetable with concrete block duration) is fixed.
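
For illustration, the shape of the change at each of the six call sites can be sketched as follows. `EpochsTestOptsSketch` and `withoutEnforceTimeTable` are hypothetical stand-ins, not the real `EpochsTestOpts` type or any helper in the repository; only the `enforceTimeTable` and `blockDurationMs` names and the `8000` value come from this PR.

```typescript
// Hypothetical stand-in for EpochsTestOpts after #23821:
// the enforceTimeTable field no longer exists on the type.
type EpochsTestOptsSketch = {
  blockDurationMs?: number;
};

// Before (rejected by the type-checker with TS2353, an excess-property error):
//   const opts: EpochsTestOptsSketch = { blockDurationMs: 8000, enforceTimeTable: true };

// After: the same call site with the removed property dropped.
const opts: EpochsTestOptsSketch = { blockDurationMs: 8000 };

// Hypothetical helper (not in the repo) that performs the same removal on a
// loosely typed legacy options object.
function withoutEnforceTimeTable(
  legacy: EpochsTestOptsSketch & { enforceTimeTable?: boolean },
): EpochsTestOptsSketch {
  const { enforceTimeTable: _removed, ...rest } = legacy;
  return rest;
}
```

Because timetable enforcement is unconditional after #23821, dropping the property should not change behavior as long as a concrete `blockDurationMs` is set, which is the case at every affected site.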

@AztecBot AztecBot added the ci-draft (Run CI on draft PRs), ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), and claudebox (owned by claudebox; it can push to this PR) labels Jun 9, 2026
@spalladino spalladino marked this pull request as ready for review June 9, 2026 21:54
@spalladino spalladino enabled auto-merge (squash) June 9, 2026 21:54
…g test

PR #23821 made timetable enforcement unconditional and removed the
enforceTimeTable option from EpochsTestOpts, but
epochs_optimistic_proving.parallel.test.ts still passed it at six call
sites, breaking the yarn-project tsgo type-check on merge-train/spartan-v5.
Each site already sets a concrete blockDurationMs, so removing the now
default-enforced option is behavior-preserving.
@AztecBot AztecBot force-pushed the cb/fix-spartan-v5-enforce-timetable branch from 80c489a to c109cfa June 9, 2026 21:57
@spalladino spalladino disabled auto-merge June 10, 2026 00:46
…des'

Temporarily skip the HA work-distribution test, which fails under the
always-enforced timetable from #23821 (sequencer misses slots:
BlockOrCheckpointSlotExpiredError / no_blocks_built / Fork not found).
To be re-enabled after the HA block-building interaction is fixed.

Skip the whole HA Full Setup suite rather than the single test: it fails
under the always-enforced timetable from #23821 (sequencer misses slots:
BlockOrCheckpointSlotExpiredError / no_blocks_built / Fork not found).
To be re-enabled after the HA block-building interaction is fixed.
@AztecBot AztecBot merged commit f19832a into merge-train/spartan-v5 Jun 10, 2026
12 checks passed
@AztecBot AztecBot deleted the cb/fix-spartan-v5-enforce-timetable branch June 10, 2026 02:26
spalladino added a commit that referenced this pull request Jun 10, 2026
The suite was skipped in #23976 while the HA block-building interaction with
the always-enforced timetable (#23821) was diagnosed; the preceding commits
fix that interaction.
PhilWindle pushed a commit that referenced this pull request Jun 11, 2026
…g anvil (#23979)

Fixes the flaky HA full suite (`e2e_ha_full`) seen in
http://ci.aztec-labs.com/8e1e980c4886df0d, where "should distribute work
across multiple HA nodes" timed out awaiting a trigger tx. Also
re-enables the suite, which #23976 had skipped.

## Root cause

The HA compose suite was the only block-building suite running against
an L1 with no self-advancing clock. Its anvil container ran in automine
with no `--block-time`, and being external, it was excluded from the
`TestDateProvider` sync that locally-spawned anvils get. L1 chain time
only moved when something mined, while the shared sequencer clock
free-ran. #23821 removed the `AnvilTestWatcher` that used to couple the
two clocks in this mode and replaced it with per-iteration nudges in the
test (clock warp + blind `mine(8)`).

Two consequences, both visible in the failed run's logs:

- The `mine(8)` overshoot put L1 ~1.5 slots ahead of the test clock, so
each iteration's first propose raced its slot boundary and was silently
dropped, followed by a prune that destroyed the pipelined builders'
forks (`Fork not found` on all surviving nodes). This race was lost in
passing runs too.
- Recovery then required the proposers' archiver-sync gate to clear, but
the gate's deadline runs on the free-running test clock while nothing
mines L1 during the test's `waitForTx`. The logs show `Archiver did not
sync L1 past slot 109 before slot 110 expired, discarding pipelined
work`, repeated until the jest timeout. Whether a run passed or failed
came down to seconds of margin on this gate.
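
A toy model may make the second consequence easier to see. Everything in it is hypothetical (the `ManualL1` class, the `runGate` function, and all numbers); it is not the archiver or sequencer code, only an illustration of an L1 clock that advances on mining versus a test clock that free-runs toward a deadline.

```typescript
// Hypothetical model: L1 time only advances when a block is mined
// (automine with no --block-time).
class ManualL1 {
  timestamp: number;
  private readonly secondsPerBlock: number;

  constructor(timestamp: number, secondsPerBlock: number) {
    this.timestamp = timestamp;
    this.secondsPerBlock = secondsPerBlock;
  }

  mine(blocks: number): void {
    this.timestamp += blocks * this.secondsPerBlock;
  }
}

type GateResult = "synced" | "expired";

// Hypothetical gate: L1 time must pass `syncTarget` before the free-running
// test clock reaches `deadline`. `mineEverySeconds` undefined means nothing
// mines L1 while the test waits.
function runGate(
  l1: ManualL1,
  testClockStart: number,
  syncTarget: number,
  deadline: number,
  mineEverySeconds?: number,
): GateResult {
  for (let now = testClockStart; now < deadline; now++) {
    const elapsed = now - testClockStart;
    if (mineEverySeconds !== undefined && elapsed % mineEverySeconds === 0) {
      l1.mine(1);
    }
    if (l1.timestamp > syncTarget) {
      return "synced";
    }
  }
  return "expired";
}

// Nothing mines during the wait: L1 time is frozen, so the gate expires.
const stalled = runGate(new ManualL1(1000, 12), 1000, 1012, 1036);

// Interval mining: L1 time advances on its own, so the gate clears.
const selfAdvancing = runGate(new ManualL1(1000, 12), 1000, 1012, 1036, 12);
```

In this model the stalled case can only be rescued by an external nudge landing inside the deadline window, which matches the description of runs passing or failing on seconds of margin.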

## Fix

Stop emulating L1 time in the test and run the suite in the same regime
as every other block-building e2e (e.g. `e2e_epochs`):

- Drop the anvil container and `ETHEREUM_HOSTS` from the HA compose
file. With no external L1 configured, `setup()` spawns anvil in-proc
with interval mining (`--block-time = ethereumSlotDuration`) and keeps
the `TestDateProvider` snapped to L1 block timestamps via the existing
stdout listener. The sibling web3signer compose suite already works this
way.
- Add `automineL1Setup: true` so L1 contract deployment runs under
temporary automine before interval mining starts.
- Delete all time scaffolding from the test (clock warps, cheat-mining
heartbeats, archiver sync nudges). Tests submit a tx and wait, in real
time. No assertions change.
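
As a rough sketch of what "snapped to L1 block timestamps" means, consider the hypothetical date provider below. `SketchDateProvider` and `onL1Block` are illustrative names only; this is not the repository's `TestDateProvider` or its stdout listener, and the real implementation may differ.

```typescript
// Hypothetical date provider: reports wall-clock time plus an offset, and the
// offset is recomputed whenever a new L1 block timestamp is observed.
class SketchDateProvider {
  private offsetMs = 0;
  private readonly wallClock: () => number;

  constructor(wallClock: () => number = Date.now) {
    this.wallClock = wallClock;
  }

  now(): number {
    return this.wallClock() + this.offsetMs;
  }

  setTime(timeMs: number): void {
    this.offsetMs = timeMs - this.wallClock();
  }
}

// Hypothetical callback for "a new L1 block was mined": snap the shared clock
// to the block timestamp (L1 timestamps are in seconds).
function onL1Block(provider: SketchDateProvider, blockTimestampSeconds: number): void {
  provider.setTime(blockTimestampSeconds * 1000);
}
```

With interval mining, each new block re-anchors the clock, so between blocks it drifts by at most one L1 block interval of real time instead of free-running away from chain time.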

No production code changes: with a self-advancing L1, the sequencer and
publisher behave exactly as on a real network.

## Parallelization

The suite file is renamed to `e2e_ha_full.parallel.test.ts`, so CI runs
each of its 8 tests as an isolated job in its own compose stack instead
of one 15+ minute serial job:

- `bootstrap.sh` expands the HA suite per test name (same mechanism as
the existing `.parallel` simple tests).
- `run_test.sh` forwards the test name into the compose stack and
namespaces the docker compose project per test so concurrent jobs on one
host don't collide.
- `sendTriggerTx` now starts the HA sequencers idempotently, since under
per-test isolation the governance/reload/distribute tests run without
the first test (previously the only caller of `startHASequencers`).
- Three clock-skew test titles contained parentheses, which jest's
`--testNamePattern` interprets as regex groups (the filter would
silently match nothing); they are retitled.
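
The parentheses issue can be reproduced with a plain `RegExp`, since `--testNamePattern` is treated as a regular expression. The title below is a made-up example, not one of the actual retitled tests, and `escapeRegExp` is an illustrative helper rather than something the PR adds (the PR retitles the tests instead).

```typescript
// Escape regex metacharacters so a literal test title can be used as a pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical test title containing parentheses.
const title = "should tolerate clock skew (fast node)";

// Used as-is, the parentheses form a capture group, so the pattern expects the
// text "should tolerate clock skew fast node" and does not match the title.
const matchesUnescaped = new RegExp(title).test(title);

// Escaped, the parentheses are matched literally.
const matchesEscaped = new RegExp(escapeRegExp(title)).test(title);
```

Retitling avoids the need for any escaping in the shell scripts that forward the test name.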

## Teardown fix (follow-up to the first CI round)

The first CI round passed every test body but three jobs
(produce-blocks, governance, reload) hung in `afterAll` until the job
timeout. Two compounding causes, both fixed here:

- `afterAll` reset the shared `TestDateProvider` *before* stopping
nodes. The reset rewinds the clock from chain time to wall time —
minutes apart after the automine deploy burst — so vote submissions
armed against the rewound clock pushed sequencer stops out by that gap.
The old 30s abandon-race then gave up, and the abandoned nodes outlived
the jest environment, keeping the worker alive until the CI timeout
(jest runs without `forceExit`). `afterAll` now stops sequencers first,
awaits every node stop fully, and resets the clock last. These three
jobs are the ones whose tests end with sequencers still running; the
distribute test (which stops nodes in-test, before any reset) passed for
the same reason.
- Ports #23990 from `merge-train/spartan` (not previously on the v5
line): `CheckpointProposalJob.interrupt()` now propagates to the
publisher, cancelling the `sendRequestsAt` slot-deadline sleep on
sequencer stop, so a pending vote submission can never block shutdown.
The original PR's `e2e_ha_full` teardown changes are superseded by the
rework above and were not ported.
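
Both teardown fixes can be sketched in isolation. The names below (`InterruptibleSleep`, `teardownSketch`) are hypothetical and do not mirror `CheckpointProposalJob`, the publisher's `sendRequestsAt`, or the suite's actual `afterAll`; the sketch only shows a deadline sleep that can be cancelled, and a teardown that resets the clock last.

```typescript
// Hypothetical cancellable sleep: resolves true if the full duration elapsed,
// false if interrupt() was called first.
class InterruptibleSleep {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private wake: (() => void) | undefined;

  sleep(ms: number): Promise<boolean> {
    return new Promise<boolean>(resolve => {
      this.timer = setTimeout(() => {
        this.wake = undefined;
        resolve(true);
      }, ms);
      this.wake = () => {
        clearTimeout(this.timer);
        this.wake = undefined;
        resolve(false);
      };
    });
  }

  interrupt(): void {
    this.wake?.();
  }
}

// Hypothetical teardown ordering: stop sequencers, await every node stop
// fully, and only then reset the shared clock.
async function teardownSketch(
  stopSequencers: () => Promise<void>,
  stopNodes: Array<() => Promise<void>>,
  resetClock: () => void,
): Promise<void> {
  await stopSequencers();
  await Promise.all(stopNodes.map(stop => stop()));
  resetClock();
}
```

Resetting the clock last means nothing still running can arm a new deadline against the rewound time, and the cancellable sleep means an already-armed deadline cannot hold up a stop.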

## Verification

- Three full local runs of the suite via `run_test.sh ha` (all 8 tests
each): green in 255s / 254s / 268s of jest time (the old warp-based
suite ran 10+ minutes), with zero occurrences of the old failure
signatures (`Fork not found`, `Archiver did not sync`, `discarding
pipelined work`) — passing runs of the old code showed 12+ `Fork not
found` errors even when green.
- One per-test CI-style run (`run_test.sh ha <file> "should distribute
work across multiple HA nodes"`): the originally flaky test passes
standalone in its own compose stack (7 skipped, 1 passed), exercising
the full `TEST_NAME` plumbing.
- `yarn build`, `yarn format`, `yarn lint` clean; `sequencer-client`
unit tests pass (back to the pre-change suite after the revert).