fix(seitask): register sei.io scheme + grant workflownodes RBAC by bdchatham · Pull Request #339 · sei-protocol/sei-k8s-controller

bdchatham · 2026-05-21T18:30:00Z

Summary

Third manual fire of the release-test Workflow surfaced two more contract bugs across the seitask + RBAC interface. Each one blocks the harness from functioning, and both are fixed here.

Bug 1 — `no kind is registered for the type v1alpha1.SeiNodeDeployment in scheme`

`provision-validator-chain` and `provision-rpc-fleet` Task pods both failed at `c.Create(snd, ...)`. Root cause: `cmd/seitask/main.go:kubeClientFromEnv` built the controller-runtime client with `client-go/kubernetes/scheme.Scheme`, which only has builtin K8s types registered. `sei.io/v1alpha1.SeiNodeDeployment` was never added, so the client could not resolve a kind for the typed SND.

Fix: explicit local `taskScheme` initialized once at package level, registering builtins + sei.io/v1alpha1. Chaos Mesh CRs stay on `unstructured` and don't need registration.
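To illustrate why a missing registration produces this error, here is a minimal stdlib-only sketch of a type-to-kind registry. It is a toy model, not the apimachinery `runtime.Scheme` API: `Scheme`, `GVK`, `Register`, `KindFor`, `builtinScheme`, and the placeholder `Pod` and `SeiNodeDeployment` structs are all assumptions made for illustration. Only the `taskScheme` name and the error text come from this PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// GVK is a simplified stand-in for a Kubernetes GroupVersionKind.
type GVK struct{ Group, Version, Kind string }

// Scheme is a toy registry from Go type to GVK. It only models the lookup
// that fails in Bug 1; it is not the real apimachinery scheme.
type Scheme struct{ kinds map[reflect.Type]GVK }

// NewScheme returns an empty registry.
func NewScheme() *Scheme { return &Scheme{kinds: map[reflect.Type]GVK{}} }

// Register records the GVK for the struct type that obj points to.
func (s *Scheme) Register(obj any, gvk GVK) {
	s.kinds[reflect.TypeOf(obj).Elem()] = gvk
}

// KindFor models the lookup a typed client needs before it can build a request.
func (s *Scheme) KindFor(obj any) (GVK, error) {
	t := reflect.TypeOf(obj).Elem()
	gvk, ok := s.kinds[t]
	if !ok {
		return GVK{}, fmt.Errorf("no kind is registered for the type %s in scheme", t)
	}
	return gvk, nil
}

// Pod and SeiNodeDeployment are placeholders for a builtin type and a CRD type.
type Pod struct{}
type SeiNodeDeployment struct{}

// builtinScheme models the pre-fix state: builtin types only.
func builtinScheme() *Scheme {
	s := NewScheme()
	s.Register(&Pod{}, GVK{"", "v1", "Pod"})
	return s
}

// taskScheme models the fix: builtins plus sei.io/v1alpha1, built once at
// package level.
var taskScheme = func() *Scheme {
	s := builtinScheme()
	s.Register(&SeiNodeDeployment{}, GVK{"sei.io", "v1alpha1", "SeiNodeDeployment"})
	return s
}()
```

In this model, `builtinScheme().KindFor(&SeiNodeDeployment{})` returns an error while `taskScheme.KindFor(&SeiNodeDeployment{})` succeeds, which mirrors the before and after of the fix.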

Bug 2 — `workflownodes.chaos-mesh.org is forbidden`

`upload-report` failed listing WorkflowNodes for the S3 snapshot. `runner/rbac.yaml` granted `workflows: [get]` for `LoadWorkflowIdentity` but not `workflownodes`.

Fix: extend the existing chaos-mesh.org rule to `["workflows", "workflownodes"]` with verbs `["get", "list"]`.
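The effect of the rule change can be sketched with a small stdlib-only model of RBAC rule matching. This is a simplification for illustration, not the Kubernetes authorizer: `PolicyRule`, `Allowed`, `rulesBefore`, and `rulesAfter` are hypothetical names, and only the group, resources, and verbs are taken from the description above.

```go
package main

// PolicyRule is a simplified stand-in for one RBAC rule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// Allowed reports whether any rule grants verb on resource in group.
func Allowed(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// rulesBefore models the pre-fix grant: workflows with get only.
var rulesBefore = []PolicyRule{{
	APIGroups: []string{"chaos-mesh.org"},
	Resources: []string{"workflows"},
	Verbs:     []string{"get"},
}}

// rulesAfter models the extended rule from this PR.
var rulesAfter = []PolicyRule{{
	APIGroups: []string{"chaos-mesh.org"},
	Resources: []string{"workflows", "workflownodes"},
	Verbs:     []string{"get", "list"},
}}
```

Under this model, `Allowed(rulesBefore, "chaos-mesh.org", "workflownodes", "list")` is false and the same call against `rulesAfter` is true, matching the 403 seen by `upload-report` and its resolution.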

Note on the third symptom

`run-release-test` errored with `Base URL is missing a protocol. Expected 'ws://' or 'wss://'`. This was a downstream symptom of Bug 1: provision-snd never reached its endpoint-publishing step, so `RPC_TM_RPC` never landed in workflow-vars, and the release-test pod's `SEI_TENDERMINT_RPC: $(RPC_TM_RPC)` resolved to the literal string `$(RPC_TM_RPC)` (K8s leaves unresolved `$(VAR)` references unchanged). Should resolve automatically once Bug 1 is fixed.
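The pass-through behavior can be sketched as follows. This is a simplified stdlib-only model of `$(VAR)` expansion, assuming a plain name pattern and ignoring the `$$(VAR)` escape that Kubernetes also supports; `Expand`, `HasWSScheme`, and the example URL in the note below are hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

// refPattern matches $(NAME) references. The accepted name characters are an
// assumption for this sketch.
var refPattern = regexp.MustCompile(`\$\(([A-Za-z_][A-Za-z0-9_.-]*)\)`)

// Expand replaces each $(NAME) with vars[NAME]. A reference with no entry in
// vars is left unchanged, which is the behavior described above.
func Expand(input string, vars map[string]string) string {
	return refPattern.ReplaceAllStringFunc(input, func(ref string) string {
		name := ref[2 : len(ref)-1] // strip the leading "$(" and trailing ")"
		if val, ok := vars[name]; ok {
			return val
		}
		return ref
	})
}

// HasWSScheme models the protocol check behind the release-test error.
func HasWSScheme(url string) bool {
	return strings.HasPrefix(url, "ws://") || strings.HasPrefix(url, "wss://")
}
```

With an empty variable map, `Expand("$(RPC_TM_RPC)", nil)` returns the literal `$(RPC_TM_RPC)`, which fails `HasWSScheme`. Once the variable is published (for example as a hypothetical `ws://rpc.example:26657/websocket`), the expanded value passes the check.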

Separately: Chaos Mesh Serial doesn't fail-fast

The manual fire also confirmed a structural Chaos Mesh quirk: in v2.8.0 `Serial` template type does NOT fail-fast on child Task errors. All 4 downstream pods ran to termination despite Bug 1 + Bug 2; each WorkflowNode showed `Accomplished=True` regardless of pod exit code. Chaos Mesh's primary use case is fault injection where "the fault ran" is the goal, so marching through child failures is upstream design intent.

Not in scope for this PR. Filed as a separate follow-up — needs design: ConditionalBranches gating, orchestrator-side EXIT_REASON polling + workflow abort, or bash-Task wrappers that abort the parent on `exit 1`.

Test plan

`go test ./...` passes
`golangci-lint run` clean
After merge + image build + SCENARIO_REF bump in platform: manual fire walks past provision-validator-chain successfully (the bug-1 truth-test)

🤖 Generated with Claude Code

Third manual fire of release-test surfaced two more contract bugs:

1. provision-validator-chain + provision-rpc-fleet failed at Create time with `no kind is registered for the type v1alpha1.SeiNodeDeployment in scheme`. cmd/seitask/main.go's kubeClientFromEnv built a controller-runtime client with client-go's built-in scheme (K8s types only); sei.io/v1alpha1 was never registered. Fix: local taskScheme registering builtin + sei.io/v1alpha1.

2. upload-report failed listing workflownodes: 403 from runner/rbac.yaml granting workflows (get) but not workflownodes. Fix: add workflownodes (get, list).

Run-release-test's "Base URL is missing a protocol" was a downstream symptom: provision-snd never published endpoints to workflow-vars, so $(RPC_TM_RPC) couldn't resolve and the literal string passed through to the release-test image. Resolves automatically once #1 is fixed.

Separately: Chaos Mesh Serial does NOT fail-fast on child Task errors. Each WorkflowNode transitions to Accomplished=True on pod termination regardless of exit code, and Serial proceeds to the next child. Filed as separate follow-up; tracked as architectural concern, not in scope for this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor · 2026-05-21T18:30:07Z

PR Summary

Medium Risk
Medium risk: changes how seitask constructs its controller-runtime client scheme and expands Chaos Mesh RBAC, which can affect runtime behavior and permissions in test harness clusters.

Overview
Fixes seitask Kubernetes client initialization by introducing a package-level taskScheme that registers built-in K8s types plus sei.io/v1alpha1, enabling typed Create/Get round-trips for SeiNodeDeployment/SeiNodeTask.

Adds regression tests to ensure the scheme includes required Sei CRDs and adds a scenario YAML contract test to prevent workflow-vars ConfigMap name mismatches.

Extends runner RBAC to allow read/list access to Chaos Mesh workflownodes (alongside workflows) so upload-report can enumerate workflow node trees.

Reviewed by Cursor Bugbot for commit 295f480. Bugbot is set up for automated code reviews on this repo.

Cross-review feedback from platform-engineer + kubernetes-specialist on the #339 chain of fixes: both reviewers independently noted that we've been discovering contract drift between the seitask binary internals and the scenario YAML / RBAC layer at first-fire instead of at build time. Each first-fire bug (#334, #337, #339) is the same shape: an internal helper has a convention, the scenario author has to mirror it manually, no test catches the drift.

Two narrow guards land here, ranked by ROI:

- TestTaskScheme_RoundTripsSND / _RoundTripsSeiNodeTask: would have caught the #339 scheme-registration bug at `go test`. Validates that the package-level taskScheme actually has every sei.io/v1alpha1 type the seitask subcommands construct via typed Create/Get.
- TestScenarioYAMLs_CMNameMatchesWorkflowVarsName: would have caught the #337 CM-name drift at `go test`. Walks every scenario YAML in the opt-in allow list (release-test.yaml today), extracts the Workflow CR's metadata.name, asserts every envFrom configMapRef.name matches WorkflowVarsName(metadataName). Major-upgrade is excluded: its CM is bash-created with a different convention; revisit when the half-bash legacy retires.

Defers (filed/tracked separately, not in scope for this PR):

- RBAC vs kubebuilder-marker reconciliation test (kubernetes-specialist ranked #3; defer until a third recurrence).
- Wrapper SA workflows: [patch] prereq for #340 path 1 (amend on #340).
- EXIT_REASON write-once-or-fail-classification semantics for #340 (amend on #340).
- Scenario contract enforcement subcommand + SEI_WORKFLOW_VARS_CM env approach (file new issue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bdchatham mentioned this pull request May 21, 2026

Chaos Mesh Serial doesn't fail-fast on child Task errors — need explicit abort mechanism #340

Open

4 tasks

bdchatham mentioned this pull request May 21, 2026

Scenario contract enforcement: build-time guards + single-sourced CM name #341

Open

4 tasks

bdchatham merged commit 922d599 into main May 21, 2026
5 checks passed

bdchatham mentioned this pull request May 21, 2026

feat(seitask): per-pod EVM endpoint publishing + load-test Workflow (draft) #343

Merged

4 tasks

bdchatham merged 2 commits into main from fix/scheme-and-workflownodes-rbac
