chore(ci): apply deploy-then-wait refactor to spartan nightly benchmarks #23489
Merged
Restructures nightly-bench-10tps and nightly-spartan-bench (tps, proving, block-capacity) to deploy the network via deploy-network.yml on the runner, wait for the first L2 block, then run the benchmark with SKIP_NETWORK_DEPLOY=1, cutting the SSM Undeliverable failures from the long deploy-inside-SSM command. Adds SKIP_NETWORK_DEPLOY plumbing to the network-bench / network-block-capacity-bench / network-bench-10tps ci3 commands. Bundles the helper changes from #23395 (deploy-network.yml notify_on_failure, spartan wait_for_l2_block, proving-bench SKIP_NETWORK_DEPLOY) so the PR is self-contained on next, where #23395 has not yet landed.
spypsy
approved these changes
May 22, 2026
Summary
Applies the same restructuring introduced for the weekly proving benchmark in #23395 to the remaining spartan nightly benchmark workflows, to cut the SSM `Undeliverable` failures we keep hitting (e.g. the nightly 10 TPS run that died mid-`network_deploy`).

The core idea (unchanged from #23395): stop doing the network deploy + multi-epoch chain warmup inside the single long-running EC2/SSM benchmark command. Instead:

1. `deploy-network.yml` (runs on the GitHub runner via GKE, no SSM) deploys the network.
2. A `wait-for-l2-block` job polls until the chain produces its first L2 block.
3. The benchmark then runs with `SKIP_NETWORK_DEPLOY=1`, so the SSM command only builds + measures against the already-live network. That gives a much shorter SSM lifetime, which is where `Undeliverable` was striking.
4. A `cleanup` job (`if: always()`) tears the network down, and `notify-failure` keeps the existing Slack/ClaudeBox alerts.

Changes
Workflows restructured into `select-image → deploy-network → wait-for-l2-block → benchmark (SKIP_NETWORK_DEPLOY=1) → cleanup → notify`:

- `.github/workflows/nightly-bench-10tps.yml`: single 10 TPS benchmark (env/namespace `bench-10tps`).
- `.github/workflows/nightly-spartan-bench.yml`: all three benches: `benchmark` (`tps-scenario` → ns `nightly-bench`), `proving-benchmark` (`prove-n-tps-fake`), `block-capacity-benchmark` (`block-capacity` → ns `nightly-block-capacity`). The `status` aggregator and per-bench Slack messages are preserved. All jobs check out `ref: next`.

`SKIP_NETWORK_DEPLOY` plumbing (mirrors the proving-bench path from #23395) added to:

- `ci.sh`: `network-bench`, `network-block-capacity-bench`, `network-bench-10tps`.
- `bootstrap.sh`: `ci-network-bench`, `ci-network-block-capacity-bench`, `ci-network-bench-10tps`.

Bundled from #23395 (not yet on `next`)

This PR targets `next`, but the benchmark changes depend on helpers that currently only exist on `merge-train/spartan` via #23395. To make this PR self-contained and mergeable to `next` on its own, it also carries #23395's helper diff:

- `deploy-network.yml`: `notify_on_failure` input.
- `spartan/bootstrap.sh`: `wait_for_l2_block` command.
- `spartan/scripts/wait_for_l2_block.sh`: RANDAO-aware wait logic.
- `bootstrap.sh` / `ci.sh`: `SKIP_NETWORK_DEPLOY` for the proving-bench path.
- `weekly-proving-bench.yml`: the original restructure from "chore: refactor weekly proving test wait" (#23395).

These hunks are identical to #23395, so git will reconcile them when #23395 lands independently. If you'd rather land #23395 first and have me drop the bundled commit, say so.
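For reviewers who have not read #23395, the wait step amounts to a poll-until-first-block loop. The following is only a rough sketch of that shape, not the contents of `spartan/scripts/wait_for_l2_block.sh`: the function name `wait_for_first_block`, the probe argument, and the timeout defaults are all hypothetical, and the RANDAO-aware part of the real script is not modelled here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "wait for the first L2 block" poll loop.
# The probe is any command or function that prints the current L2 block
# number; how the real script queries the chain is not shown here.
set -eu

wait_for_first_block() {
  local probe_cmd="$1"          # command/function printing the block height
  local timeout_s="${2:-600}"   # assumed default, not from the PR
  local interval_s="${3:-5}"    # assumed default, not from the PR
  local deadline height
  deadline=$(( $(date +%s) + timeout_s ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # A failing probe (e.g. pods not ready yet) counts as "no block yet".
    height="$("$probe_cmd" 2>/dev/null || echo 0)"
    # Treat empty or non-numeric output as height 0.
    case "$height" in
      ''|*[!0-9]*) height=0 ;;
    esac
    if [ "$height" -ge 1 ]; then
      echo "first L2 block seen at height $height"
      return 0
    fi
    sleep "$interval_s"
  done

  echo "timed out after ${timeout_s}s waiting for first L2 block" >&2
  return 1
}
```

A caller would pass its own probe, e.g. `wait_for_first_block my_block_number_probe 900 10`, and let a non-zero exit fail the job before the benchmark starts.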
Note on namespaces
For jobs whose deploy namespace differs from the env-file name (`nightly-bench`, `nightly-block-capacity`), the wait step sets `NAMESPACE` explicitly so `wait_for_l2_block` polls the right pods (env files use `${NAMESPACE:-default}`, so the override is respected).

Not in scope here
The genuine nightly scenario tests run from `ci3.yml`'s `ci-network-scenario` job (tag-gated), not from `test-network-scenarios.yml` (a manual `workflow_dispatch` runner). `ci-network-scenario` has the same deploy-inside-SSM shape and could get the same treatment as a follow-up.

Test plan
These can't be validated from a sandbox (need AWS/GCP secrets + a real deploy). Before relying on the schedule, run each via `workflow_dispatch` once:

- `Nightly Bench 10 TPS`: confirm deploy-network + wait-for-l2-block succeed, then the benchmark runs against the live network with `SKIP_NETWORK_DEPLOY=1`, and cleanup tears down.
- `Nightly Spartan Benchmarks`: confirm all three deploy/wait/benchmark/cleanup chains run in parallel without namespace collisions and `status` reports correctly.
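When checking the dispatched runs, the behaviour to look for from the `SKIP_NETWORK_DEPLOY` plumbing is roughly the gate below. This is an illustrative sketch only: `deploy_network`, `run_benchmark`, and `network_bench` are placeholder functions, not the actual `ci.sh` / `bootstrap.sh` code, and it assumes that only the value `1` skips the deploy.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a SKIP_NETWORK_DEPLOY gate. The function names are
# placeholders; the real commands live in ci.sh and bootstrap.sh.
set -eu

deploy_network() { echo "deploying network into ${NAMESPACE:-default}"; }
run_benchmark()  { echo "running benchmark against ${NAMESPACE:-default}"; }

network_bench() {
  # Assumption: only SKIP_NETWORK_DEPLOY=1 skips the deploy; unset or any
  # other value keeps the original deploy-then-measure behaviour.
  if [ "${SKIP_NETWORK_DEPLOY:-0}" = "1" ]; then
    echo "SKIP_NETWORK_DEPLOY=1: reusing already-deployed network"
  else
    deploy_network
  fi
  run_benchmark
}
```

With `SKIP_NETWORK_DEPLOY=1 NAMESPACE=nightly-bench network_bench`, the sketch skips straight to the measurement step. Defaulting to the deploy path keeps any caller that does not set the variable on its existing behaviour.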