SeiNode readiness probe should include catching_up=false on full-node-role nodes #144

## Problem

`SeiNodeDeployment.status.phase=Ready` (computed at `api/v1alpha1/seinodedeployment_types.go:188`) is reached once child nodes are in `phase=Running` and `ConditionNodesReady` holds. It does **not** require that all full nodes have caught up (`catching_up=false`). The `AwaitNodesCaughtUp` task exists at `internal/task/deployment.go:46-51` and runs during hard-fork rollouts, but not during initial bring-up.

For genesis-bootstrap chains the gap is essentially zero, because validators produce blocks from height 0. For full-node fleets that block-sync from validators or external state, there is a real gap between `phase=Ready` (pods running) and RPC actually serving caught-up data.

## Impact

### Primary use case — multi-consumer reliability

This affects three consumers, none of which is currently blocked but all of which would benefit:

- **ValidationRun** (PR #143): the workload Job may start driving load before full nodes can serve caught-up RPC. The LLD documents this as Open Dependency #1 and accepts genesis bootstrap, which has no real catch-up gap, as the dominant v1 case.
- **Hard-fork rollouts**: services may be rotated prematurely, before full nodes have caught up after the rollout.
- **Steady-state operations**: newly added full nodes may serve traffic before they have caught up.

### Cost of not addressing

Each consumer reimplements its own catch-up probe (`kubectl exec seid status | grep catching_up`), which is exactly the bash-glue pattern the controller boundary exists to absorb. This issue moves that responsibility into the right place, once.

## Relevant experts

- `kubernetes-specialist` — readiness probe contract, sidecar `/health` endpoint design
- `blockchain-developer` — `seid status` semantics, catching_up signal interpretation

## Proposed approach

Per the ValidationRun LLD's recommendation (option b in Open Dependency #1):

**Extend the `seictl` sidecar's HTTP `/health` endpoint to return 503 while `seid status.SyncInfo.catching_up=true`.** The kubelet readiness probe consumes `/health`, so the Pod isn't Ready until the node has caught up, and SND `phase=Ready` then requires catch-up automatically through the existing `ConditionNodesReady` chain.

Concretely:
- Sidecar `/health` handler: poll `seid status` (or its more efficient ABCI-direct equivalent), return 503 if `catching_up=true`, 200 otherwise.
- StatefulSet pod template: `readinessProbe.httpGet.path=/health` (already wired for the sidecar in most deployments; needs verification).
- No SND status schema change needed; the existing `ConditionNodesReady` already aggregates from pod-readiness.

Alternative options considered and rejected per the LLD:
- (a) Extend SND status with an explicit `CaughtUp` condition: requires a more invasive schema change.
- (c) ValidationRun separately probes child SeiNodes: pushes responsibility into the wrong controller.

## Acceptance criteria

- [ ] Sidecar `/health` endpoint returns 503 when `seid status.SyncInfo.catching_up=true`, 200 otherwise
- [ ] Pod readiness probe wired to `/health` on validator and full-node SND templates
- [ ] Verify SND `phase=Ready` automatically gates on the new probe via existing `ConditionNodesReady`
- [ ] Integration test: bootstrap a chain, force a fullnode behind, observe Pod NotReady → SND not Ready, then catch up → Ready
- [ ] Document the contract in the sidecar's README and in `SeiNodeDeployment` CRD docs

## Out of scope

- Surfacing `catching_up` as a separate first-class SND status field (handled by the option-b approach via existing readiness aggregation)
- ValidationRun controller probes (delegated to SND readiness; do not reimplement)

## References

- ValidationRun LLD: sei-protocol/sei-k8s-controller#143 — Open Dependency #1
- `api/v1alpha1/seinodedeployment_types.go:188` — existing `phase=Ready` computation
- `internal/task/deployment.go:46-51` — existing `AwaitNodesCaughtUp` task (hard-fork rollout path)
- `sei-protocol/sei-k8s-controller#139` — design ask referencing this gap

