Skip to content

docs(CLAUDE.md): codify the conditions convention#329

Merged
bdchatham merged 1 commit into
mainfrom
docs/conditions-convention
May 21, 2026
Merged

docs(CLAUDE.md): codify the conditions convention#329
bdchatham merged 1 commit into
mainfrom
docs/conditions-convention

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

Summary

Adds a ### Conditions section to CLAUDE.md next to ### Status patches. Codifies the working convention for metav1.Condition lifecycle across all three controllers (SeiNodeDeployment, SeiNode, SeiNodeTask).

Net: 1 file, +37 / 0.

Why now

Today the codebase mixes two patterns:

  • School 1 (always-present, transition with reason)NodesReady, GenesisCeremonyComplete, PlanInProgress, RolloutInProgress, TargetReady
  • School 2 (absent-when-not-applicable)RouteReady, GenesisCeremonyNeeded, ImportPVCReady, SigningKeyReady, NodeKeyReady, OperatorKeyringReady, SeiNodeTask Ready/Failed

RouteReady was originally surfaced by the SRE Coral review on the envtest Phase 5 orphan-cleanup PR: the condition is removed when spec.networking is unset, which is indistinguishable in kubectl describe from "controller hasn't reconciled yet." Before fixing one site, we want the doctrine.

Coral cross-review (kubernetes-specialist + sre-engineer) converged unambiguously:

  • K8s upstream lean is unambiguously School 1. Pod, Deployment, Gateway-API HTTPRoute, CAPI Cluster/Machine all use always-present-with-reason. Even broken states (BackendNotFound) are expressed as Status=False, Reason=BackendNotFound, never as absence. Deployment's ReplicaFailure is the rare canonical School 2 (transient exception condition).
  • SRE would block the forthcoming Paused PR specifically to insist it ships School 1, because PromQL keyed on kube_..._status_condition{type="Paused"} becomes brittle if the condition disappears when not paused (absent() gymnastics, false-positive scrape-down alarms on Grafana panels).
  • Don't retroactively flip every condition. The doctrine is the precedent; RouteReadyNetworkingReady is the first conversion (PR B); the rest harmonize on first edit through their respective controller files.

What the doctrine says

  1. School 1 is the default. Once set, a condition stays present on the reconciled object. Transitions express state via Status + Reason + lastTransitionTime.
  2. School 2 has two narrow allowances: *Needed-style (where False would be tautological) and kubectl wait consumers where present-vs-absent semantics are load-bearing (SeiNodeTask).
  3. Naming: *Ready / *InProgress / *Complete → School 1; *Needed → School 2. No mixed polarities for the same subject (the SeiNodeTask Ready/Failed pair is the documented exception, justified by kubectl wait --for=condition=Ready=true + --for=condition=Failed=true dual-exit signaling).
  4. Spec-shape changes don't remove conditions. Set False/NotApplicable instead.
  5. Reasons are a stable enum. CamelCase, finite set per condition type. Dynamic data goes in the message.
  6. ObservedGeneration discipline. Every setCondition site populates it. The four direct apimeta.SetStatusCondition calls in internal/controller/nodetask/controller.go are a known divergence to harmonize on first edit through that file.

Coral cross-review on this PR

K8s-specialist flagged:

  • Pod conditions aren't literally "always-present from t=0" — softened to "stably present once the kubelet has begun processing the pod." Same softening applied to Deployment and CAPI claims.
  • HTTPRoute conditions are per-ParentRef — parenthetical added.
  • *Pending vs *InProgress were lumped together; dropped *Pending since *InProgress covers both for sei's domain.
  • kubectl wait --for=condition=Failed is the dual exit signal — explained why the SeiNodeTask Ready+Failed mixed-polarity pair is the documented exception.
  • ObservedGeneration known divergence — call out the specific nodetask call sites needing harmonization.
  • Spec-shape changes — added explicit False/NotApplicable guidance for the transition-to-inapplicable case.

All incorporated.

Test plan

  • Doc-only PR; no code change; existing tests + build remain unaffected
  • Convention is referenced explicitly by the next two PRs (RouteReady rename + orphan-cleanup envtest)

🤖 Generated with Claude Code

Adds a Conditions section to the project's working-doc next to Status
patches. Establishes School 1 (always-present, transition with reason)
as the default for every metav1.Condition on SeiNodeDeployment,
SeiNode, and SeiNodeTask, matching Kubernetes upstream Pod / Deployment
/ Gateway-API / CAPI conventions. School 2 (absent-when-not-applicable)
is reserved for two narrow cases: *Needed-style exception conditions
and kubectl wait consumer conditions (SeiNodeTask Ready/Failed).

Lands ahead of two follow-up PRs:
- PR B: rename ConditionRouteReady to ConditionNetworkingReady and
  migrate to School 1 with stable reasons (NetworkingDisabled,
  NoEffectiveRoutes, etc.), removing both removeCondition call sites
- PR C: orphan-cleanup envtest exercising the new condition shape
@cursor
Copy link
Copy Markdown

cursor Bot commented May 21, 2026

PR Summary

Low Risk
Doc-only change that codifies condition lifecycle/naming guidelines; no runtime or API behavior is modified.

Overview
Adds a new ### Conditions section to CLAUDE.md that standardizes how controllers should manage metav1.Conditions (default always-present with Status/Reason transitions), when absence is allowed (narrow “School 2” exceptions), and how to name conditions and reasons.

Also documents additional guardrails like using False/NotApplicable instead of removing conditions on spec-shape changes and requiring ObservedGeneration to be set for condition updates.

Reviewed by Cursor Bugbot for commit 985c691. Bugbot is set up for automated code reviews on this repo. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant