docs: validation-substrate design (supersedes sei-k8s-controller#143) #96

Merged
bdchatham merged 2 commits into main from design/validation-substrate on Apr 30, 2026

@bdchatham bdchatham commented Apr 30, 2026

Summary

Captures the pivot away from the merged ValidationRun CRD LLD (sei-protocol/sei-k8s-controller#143) toward a CLI-substrate model: seictl primitives (chain/rpc/load/harness/rules) composed by sugar verbs (bench/qa/shadow). The runtime workload contract from sei-protocol/platform#235 is kept verbatim and already implemented in `bench up`.

Key decisions captured

  • Replace CRD with CLI primitives. Test orchestration is imperative + time-bounded; the CRD's phase machine + condition machine + two-controller dance was paying for declarative-desired-state semantics that tests don't need.
  • v1 ships effectively zero new code. Today's `bench up` covers the seiload-nightly use case (the LLD's primary Phase 1 consumer). Primitives land on demand with named triggers, not speculatively.
  • Single binary, two install paths. Standalone `seictl` AND kubectl plugin via `kubectl-sei` symlink — one parser, one help tree, zero code change.
  • Label-driven cascade-delete, not OwnerRefs across primitives. Cross-primitive coupling is rejected in favor of `sei.io/chain-id` selectors.
  • `rules watch` is a Job, not a controller — deferred until a real engineer hits a "passed-but-validators-OOM" signal.
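
The label-driven cascade can be pictured as a selector match rather than an ownership walk. A minimal sketch, assuming resources are plain dicts with a `labels` map: only the `sei.io/chain-id` key comes from the design, while the resource names, the inventory, and the `select_for_cascade` helper are hypothetical.

```python
# Hypothetical sketch: pick every resource that belongs to one chain by label,
# instead of following OwnerReferences between primitives.
CHAIN_ID_LABEL = "sei.io/chain-id"  # label key named in the design


def chain_selector(chain_id: str) -> str:
    """Build an equality-based label selector string for one chain."""
    return f"{CHAIN_ID_LABEL}={chain_id}"


def select_for_cascade(resources: list[dict], chain_id: str) -> list[str]:
    """Return the names of resources whose chain-id label matches."""
    return [
        r["name"]
        for r in resources
        if r.get("labels", {}).get(CHAIN_ID_LABEL) == chain_id
    ]


# Illustrative inventory; names and kinds are made up for the example.
resources = [
    {"kind": "SeiNode", "name": "bench-a-0", "labels": {CHAIN_ID_LABEL: "bench-a"}},
    {"kind": "Job", "name": "load-a", "labels": {CHAIN_ID_LABEL: "bench-a"}},
    {"kind": "SeiNode", "name": "bench-b-0", "labels": {CHAIN_ID_LABEL: "bench-b"}},
    {"kind": "ConfigMap", "name": "unrelated", "labels": {}},
]

to_delete = select_for_cascade(resources, "bench-a")
print(chain_selector("bench-a"), to_delete)
```

The selector string has the same `key=value` form that `kubectl` accepts via `-l`, so teardown does not need to know which primitive created which object.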

Anti-features (deliberate)

The doc explicitly enumerates what the LLD's gravitational pull would tempt us to build:

  • Unified `validation.sei.io/v1` YAML schema
  • Generic `harness` substrate
  • Symmetric verb sets for symmetry's sake
  • Observability-as-test-oracle in the CLI
  • Per-verb kubectl plugin symlinks

Process

Coral round dispatched three specialists in parallel — platform-engineer (substrate), product-manager (scope discipline), product-engineer (cross-surface ergonomics). Outputs synthesized inline. The PM's "v1 ships nothing new" stance won on scope; the platform-engineer's label contract + peer-discovery mechanism won on substrate; the product-engineer's MCP composite-as-tool / kubectl-sei prefix won on distribution.

Test plan

  • Skim the doc for tone consistency with existing `docs/design/cluster-cli.md`
  • Confirm the v1 ship cut table matches what's actually shipped today (`bench up/down/list` only)
  • Confirm anti-features list reflects coral synthesis (not random YAGNI)
  • Comment on the sei-k8s-controller#143 thread (docs: LLD for ValidationRun CRD, for #139) documenting the supersession (will fire after this lands)

🤖 Generated with Claude Code

Captures the pivot away from the merged ValidationRun CRD LLD toward
a CLI-substrate model: seictl primitives (chain/rpc/load/harness/rules)
composed by sugar verbs (bench/qa/shadow), with the same workload
runtime contract from platform#235 already implemented in `bench up`.

Coral round dispatched platform-engineer (substrate), product-manager
(scope discipline), and product-engineer (cross-surface ergonomics)
specialists. Synthesis: v1 ships effectively zero new code — today's
`bench up` covers the seiload-nightly use case. Primitives land on
demand with named triggers.

Distribution: single binary, two install paths — standalone `seictl`
plus kubectl plugin via `kubectl-sei` symlink (one symlink, no code
change).

Anti-features explicitly called out: unified validation.sei.io YAML
schema, generic harness substrate, symmetric verb sets, observability-
as-test-oracle in the CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt emitter

Per maintainer pushback: harness is a category error. seictl provisions
ephemeral chain infrastructure + in-cluster load generation; out-of-
cluster test binaries (qa-testing's TS suite, fuzzers, integration
tests) run wherever the engineer/CI invokes them and consume seictl-
emitted endpoints via env vars/flags. Exit code + logs are the verdict.
This is the autobake pattern, and the kubectl pattern.
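
The consumption pattern above can be sketched from the test binary's side. This is an illustration under stated assumptions, not the documented interface: the env var name `SEI_EVM_JSON_RPC`, the comma-separated encoding, and the sample addresses are all hypothetical, and `run_suite` is a stand-in for a real test suite.

```python
import sys


def endpoints_from_env(env: dict, var: str = "SEI_EVM_JSON_RPC") -> list[str]:
    """Split a comma-separated endpoint list (assumed encoding) from an env mapping."""
    raw = env.get(var, "")
    return [e.strip() for e in raw.split(",") if e.strip()]


def run_suite(endpoints: list[str]) -> int:
    """Stand-in test run: returns a process exit code (0 = pass, 1 = fail)."""
    if not endpoints:
        print("no endpoints provided", file=sys.stderr)
        return 1
    for url in endpoints:
        # A real suite would issue RPC calls here; the sketch only logs targets.
        print(f"would test against {url}")
    return 0


# Example invocation with an explicit mapping; a real script would pass
# os.environ and hand the result to sys.exit() so the exit code is the verdict.
sample_env = {"SEI_EVM_JSON_RPC": "http://10.0.0.1:8545, http://10.0.0.2:8545"}
exit_code = run_suite(endpoints_from_env(sample_env))
```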

Changes:

- Drop `harness` from the primitive surface (4 primitives now: chain,
  rpc, load, rules).
- Drop `qa up` composite. Only `bench up` (shipped) and `shadow up`
  (deferred; replayer is typed on SeiNode today) remain.
- Reframe goals around endpoint emission as the contract for downstream
  tooling.
- Add a new Design subsection ("Endpoint emission") with concrete shell
  + GHA examples showing how external test binaries consume endpoints.
- Replace the "generic harness substrate" anti-feature bullet with
  "seictl as a test runner", explicitly rejected, with the rationale.
- Update v1 ship cut table: harness/qa up moves from "Defer" to
  "Out of scope (not deferred — rejected)".
- Update MCP graduation, label contract, wait semantics, references.
- New Open Question on endpoint field shape (lock the contract early).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit dba2514 into main Apr 30, 2026
3 checks passed
@bdchatham bdchatham deleted the design/validation-substrate branch April 30, 2026 16:24
bdchatham added a commit that referenced this pull request Apr 30, 2026
## Summary

Follow-up to #96 locking the endpoint emission contract. Brandon
directed split-by-RPC-type on the merge thread; this PR captures the
shape concretely and removes the resolved Open Question.

## Contract

```json
{
  "endpoints": {
    "tendermintRpc": ["http://...:26657", ...],
    "evmJsonRpc":    ["http://...:8545",  ...],
    "cosmosGrpc":    ["...:9090", ...]
  }
}
```

Three rules behind it:

- **Split by RPC type, not flat-with-discriminator.** Most consumers
know which port they want — split eliminates filter logic on the
consumer side.
- **Per-pod, not aggregate.** Consumers shard, round-robin, or pick.
Aggregate Service URLs only allow kube-proxy round-robin.
- **Full URLs (with scheme + port) for HTTP types; `host:port` for
gRPC.** gRPC clients don't take `http://` prefixes; Sei's gRPC is
h2c.
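
The three rules can be checked mechanically on the consumer side. A minimal sketch, assuming the emitted document is parsed JSON in the shape above: the key names come from the contract, while `check_shape`, the sample addresses, and the round-robin strategy are illustrative assumptions.

```python
import itertools
import json
from urllib.parse import urlsplit

HTTP_TYPES = ("tendermintRpc", "evmJsonRpc")  # full URLs with scheme + port
GRPC_TYPES = ("cosmosGrpc",)                  # bare host:port, no scheme


def check_shape(doc: dict) -> list[str]:
    """Return human-readable violations of the endpoint shape (empty if none)."""
    problems = []
    endpoints = doc.get("endpoints", {})
    for kind in HTTP_TYPES:
        for url in endpoints.get(kind, []):
            parts = urlsplit(url)
            if parts.scheme not in ("http", "https") or parts.port is None:
                problems.append(f"{kind}: {url!r} needs scheme and port")
    for kind in GRPC_TYPES:
        for target in endpoints.get(kind, []):
            host, sep, port = target.rpartition(":")
            if "://" in target or not sep or not host or not port.isdigit():
                problems.append(f"{kind}: {target!r} must be host:port")
    return problems


# Sample document with made-up pod addresses.
doc = json.loads("""
{
  "endpoints": {
    "tendermintRpc": ["http://10.0.0.1:26657", "http://10.0.0.2:26657"],
    "evmJsonRpc":    ["http://10.0.0.1:8545", "http://10.0.0.2:8545"],
    "cosmosGrpc":    ["10.0.0.1:9090", "10.0.0.2:9090"]
  }
}
""")
problems = check_shape(doc)

# Per-pod lists let the consumer choose its own strategy, e.g. round-robin.
evm = itertools.cycle(doc["endpoints"]["evmJsonRpc"])
picks = [next(evm) for _ in range(3)]
```

Because the lists are per-pod, the same document would also support sharding or pinning to one pod without any change on the emitter side.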

Adding new types (`evmWebSocket`, `cosmosLcd`, etc.) is
backwards-compatible. Renaming an existing `endpoints.<type>` key is not, so those names are locked.
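
The compatibility rule follows from how a consumer reads the document. A hedged sketch: `pick_endpoints` is a hypothetical helper, and the `ws://` value shown for `evmWebSocket` is an assumed shape, since the contract does not define that type yet.

```python
def pick_endpoints(doc: dict, wanted: str) -> list[str]:
    """Return the endpoint list for one type, ignoring any other types present."""
    endpoints = doc.get("endpoints", {})
    if wanted not in endpoints:
        # A renamed or missing key surfaces here as a hard failure.
        raise KeyError(f"endpoint type {wanted!r} not emitted")
    return list(endpoints[wanted])


# A later emitter adds evmWebSocket; the consumer below never looks at it.
newer_doc = {
    "endpoints": {
        "evmJsonRpc": ["http://10.0.0.1:8545"],
        "evmWebSocket": ["ws://10.0.0.1:8546"],  # assumed shape, for illustration
    }
}
urls = pick_endpoints(newer_doc, "evmJsonRpc")
```

An added key is invisible to a consumer written this way, whereas a renamed key turns every existing lookup into an error.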

## Test plan

- [ ] Skim the diff
- [ ] Confirm shell + GHA examples reflect the typed shape
- [ ] No other harness/qa references re-emerged

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>