deletionPolicy: Retain on SeiNodeDeployment doesn't protect the underlying PV reclaim policy #292

@bdchatham

Problem

Today, when a SeiNodeDeployment has spec.deletionPolicy: Retain set and the SND is deleted, the controller calls orphanChildSeiNodes / orphanNetworkingResources (internal/controller/nodedeployment/controller.go:160), which removes ownerReferences from child SeiNodes so they survive the SND deletion. However, this orphan path does not touch the bound PersistentVolume's persistentVolumeReclaimPolicy. For dynamically provisioned EBS volumes the PV's reclaim policy defaults to Delete, so any downstream operation that ultimately deletes the PersistentVolumeClaim will cascade to destroying the underlying EBS volume — even though the SND was explicitly set to Retain.
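
The gap can be modelled in a few lines. This is only an illustrative sketch: `volumeState` and `backingVolumeSurvivesPVCDelete` are hypothetical names, not code from the controller, and the model assumes a dynamically provisioned volume whose fate on PVC deletion is decided solely by the PV's reclaim policy.

```go
package main

import "fmt"

// ReclaimPolicy mirrors the values of a PersistentVolume's
// spec.persistentVolumeReclaimPolicy field that matter here.
type ReclaimPolicy string

const (
	ReclaimDelete ReclaimPolicy = "Delete"
	ReclaimRetain ReclaimPolicy = "Retain"
)

// volumeState is a hypothetical, simplified view of one SeiNode's storage.
type volumeState struct {
	SNDDeletionPolicy string        // spec.deletionPolicy on the SND
	PVReclaimPolicy   ReclaimPolicy // reclaim policy on the bound PV
}

// backingVolumeSurvivesPVCDelete reports whether the backing volume is kept
// once the PVC is deleted. Only the PV's reclaim policy is consulted; the
// SND's deletionPolicy plays no part, which is the gap described above.
func backingVolumeSurvivesPVCDelete(s volumeState) bool {
	return s.PVReclaimPolicy == ReclaimRetain
}

func main() {
	s := volumeState{SNDDeletionPolicy: "Retain", PVReclaimPolicy: ReclaimDelete}
	fmt.Println(backingVolumeSurvivesPVCDelete(s)) // false
}
```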

Observed during incident triage on 2026-05-19 in pacific-1: while triaging a stuck node-0-0-0 pod (missing rbac-proxy ConfigMap), the operator's instinct was "delete the SND and let Flux recreate it cleanly." Read-only checks revealed that the SND's deletionPolicy: Retain offered no protection for the data on the bound PV. The field's name strongly implies data preservation; in practice it only orphans the child objects in the Kubernetes object graph so that they survive the SND deletion.

Impact

Production data-loss risk on archive nodes. In pacific-1, three archive SeiNodes (archive-0-0, archive-1-0, archive-2-0) hold large state-snapshot datasets that take many hours to days to rebuild from peers. Operators who rely on deletionPolicy: Retain for safety today have a false sense of protection: a careless kubectl delete pvc (or any controller-side cascade that ultimately deletes the PVC) destroys chain data that is slow and costly to rebuild.

This also blocks "delete and recreate from Flux" as a generally safe troubleshooting pattern. Today, any incident triage that reaches for "nuke and let Flux reconcile" must separately patch every PV's reclaim policy as a defensive step. The friction means operators avoid an otherwise simple recovery path.

Relevant experts

  • kubernetes-specialist — controller-runtime, ensure_pvc.go, lifecycle handling
  • platform-engineer — StorageClass design, EBS reclaim semantics, infra patterns

Proposed approach

Two viable paths:

  1. Per-template storageClassName (preferred, smaller change). Add storageClassName to SeiNodeTemplate spec; expose a corresponding Retain-reclaim StorageClass (e.g. gp3-retain-archive) in the platform repo. Archive SNDs opt in via storageClassName: gp3-retain-archive. The SC has reclaimPolicy: Retain so all provisioned PVs inherit that policy at creation time — no post-bind patching needed. ~½ day Go + tests, plus 1 StorageClass YAML.

  2. Post-bind PV patch in ensure_pvc.go. When the SND's deletionPolicy: Retain is set, after the PVC is Bound, look up the PV and patch persistentVolumeReclaimPolicy: Retain. Because the PV does not exist until the PVC binds, the reconcile must requeue until it does. ~1 day Go + tests. Doesn't require any platform-side StorageClass.
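
Approach (1) could look roughly like the sketch below. It is stdlib-only and uses stand-in types rather than the upstream Kubernetes or operator types: `storageClass`, `seiNodeTemplateSpec`, `retainArchiveClass` and `pvcStorageClassName` are hypothetical names. The provisioner `ebs.csi.aws.com` and the `type: gp3` parameter are assumptions about the platform's EBS CSI setup; the class name is the example from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storageClass is a minimal stand-in for the storage.k8s.io/v1 StorageClass
// fields relevant here; it is not the upstream Go type.
type storageClass struct {
	APIVersion    string            `json:"apiVersion"`
	Kind          string            `json:"kind"`
	Metadata      map[string]string `json:"metadata"`
	Provisioner   string            `json:"provisioner"`
	Parameters    map[string]string `json:"parameters,omitempty"`
	ReclaimPolicy string            `json:"reclaimPolicy"`
}

// seiNodeTemplateSpec is a hypothetical slice of the SeiNodeTemplate spec
// showing only the proposed field.
type seiNodeTemplateSpec struct {
	StorageClassName *string `json:"storageClassName,omitempty"`
}

// retainArchiveClass builds the Retain-reclaim class that archive SNDs
// would opt into. Provisioner and parameters are assumed values.
func retainArchiveClass() storageClass {
	return storageClass{
		APIVersion:    "storage.k8s.io/v1",
		Kind:          "StorageClass",
		Metadata:      map[string]string{"name": "gp3-retain-archive"},
		Provisioner:   "ebs.csi.aws.com",
		Parameters:    map[string]string{"type": "gp3"},
		ReclaimPolicy: "Retain",
	}
}

// pvcStorageClassName returns the value to copy into the PVC's
// spec.storageClassName. A nil result leaves the field unset, so the
// cluster's default StorageClass keeps applying to SNDs that do not opt in.
func pvcStorageClassName(t seiNodeTemplateSpec) *string {
	return t.StorageClassName
}

func main() {
	out, err := json.MarshalIndent(retainArchiveClass(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	name := "gp3-retain-archive"
	fmt.Println(*pvcStorageClassName(seiNodeTemplateSpec{StorageClassName: &name}))
}
```

Passing the pointer through unchanged keeps the distinction between an unset field (use the default class) and an explicit empty string (no dynamic provisioning), which the PVC API treats differently.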

Approach (1) is cleaner: it is declarative at provisioning time and needs no runtime patching. Approach (2) requires no opt-in for any SND that already sets Retain, but it introduces an asynchronous patch.
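
For approach (2), the per-reconcile decision and the patch body might be sketched as follows. This is a stdlib-only illustration, not the contents of ensure_pvc.go: `pvcView`, `pvView`, `decide` and `retainPatch` are hypothetical names, and the real implementation would read these values through the controller's Kubernetes client and send the patch as a merge patch against the PV.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// action is what the reconcile step should do next.
type action int

const (
	actionNone    action = iota // nothing to do
	actionRequeue               // PVC not bound or PV not visible yet; retry later
	actionPatch                 // PV exists with a non-Retain policy; patch it
)

// pvcView and pvView are hypothetical, trimmed-down views of the objects.
type pvcView struct {
	Phase      string // status.phase, e.g. "Pending" or "Bound"
	VolumeName string // spec.volumeName, set once the claim is bound
}

type pvView struct {
	ReclaimPolicy string // spec.persistentVolumeReclaimPolicy
}

// decide maps the observed state to the next action. pv is nil when the
// PV lookup has not found the volume yet.
func decide(deletionPolicy string, pvc pvcView, pv *pvView) action {
	if deletionPolicy != "Retain" {
		return actionNone
	}
	if pvc.Phase != "Bound" || pvc.VolumeName == "" || pv == nil {
		return actionRequeue
	}
	if pv.ReclaimPolicy == "Retain" {
		return actionNone // already protected, so the step is idempotent
	}
	return actionPatch
}

// retainPatch builds the JSON merge patch body for the PV.
func retainPatch() ([]byte, error) {
	return json.Marshal(map[string]map[string]string{
		"spec": {"persistentVolumeReclaimPolicy": "Retain"},
	})
}

func main() {
	body, err := retainPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"spec":{"persistentVolumeReclaimPolicy":"Retain"}}
	fmt.Println(decide("Retain", pvcView{Phase: "Pending"}, nil) == actionRequeue) // true
}
```

Keeping the decision a pure function of observed state makes the requeue path straightforward to unit test without a cluster.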

Acceptance criteria

  • Deleting an SND with deletionPolicy: Retain does not result in PV deletion under any cascade path (e2e: create SND with Retain, delete SND, confirm PVC + PV remain, confirm EBS volume not destroyed).
  • DeletionPolicy field doc (api/v1alpha1/seinodedeployment_types.go) clarifies volume-preservation behavior.
  • If approach (1) chosen: storageClassName field on SeiNodeTemplate; integration test for a SeiNode pointing at a Retain-reclaim SC.
  • If approach (2) chosen: unit test for PV-patch reconcile with race coverage on PV creation latency.
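
The end state that the e2e criterion checks for can be written down as a single predicate. The sketch below assumes the test harness gathers these observations after deleting the SND; `observedState` and `retainViolations` are hypothetical names, and how the EBS volume's existence is checked is left to the harness.

```go
package main

import "fmt"

// observedState is a hypothetical snapshot taken after the SND is deleted.
type observedState struct {
	PVCExists       bool
	PVExists        bool
	PVReclaimPolicy string // spec.persistentVolumeReclaimPolicy on the PV
	EBSVolumeExists bool   // as reported by the cloud provider
}

// retainViolations lists every way the observed state breaks the
// "Retain preserves data" guarantee. An empty result means the check passes.
func retainViolations(o observedState) []string {
	var v []string
	if !o.PVCExists {
		v = append(v, "PVC was deleted")
	}
	if !o.PVExists {
		v = append(v, "PV was deleted")
	} else if o.PVReclaimPolicy != "Retain" {
		v = append(v, "PV reclaim policy is "+o.PVReclaimPolicy+", want Retain")
	}
	if !o.EBSVolumeExists {
		v = append(v, "EBS volume was destroyed")
	}
	return v
}

func main() {
	// Today's behavior: objects survive the orphan path, but the PV is
	// still one PVC deletion away from losing its volume.
	today := observedState{PVCExists: true, PVExists: true, PVReclaimPolicy: "Delete", EBSVolumeExists: true}
	fmt.Println(retainViolations(today))
}
```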

Out of scope

  • Migrating existing PVs to Retain reclaim. This issue addresses new provisioning; existing volumes get a one-time manual patch as a runbook task.
  • Name-collision handling on SND recreate. Orphaning a SeiNode and having Flux recreate the SND today causes a name conflict (orphan still exists with the same name). Solving "delete the SND, let Flux recreate cleanly" end-to-end requires this issue plus a separate orphan-adoption or replace flow. File as follow-up if/when needed.
  • Non-EBS storage backends. Framing is EBS-specific; other CSI drivers (EFS, FSx) may have different reclaim semantics worth verifying separately.

References

  • Discovered during incident triage 2026-05-19 — pacific-1 node-0-0 stuck on missing rbac-proxy CM
  • internal/controller/nodedeployment/controller.go:160 — current orphan-children behavior under DeletionPolicy: Retain
  • internal/task/ensure_pvc.go — PVC creation path that doesn't honor SND retain semantics
  • api/v1alpha1/seinodedeployment_types.go:21-27 — current DeletionPolicy field definition + doc
