
Add ValkeyNode-managed persistence support #149

Merged
jdheyburn merged 7 commits into valkey-io:main from DharmendraChoudhary67:codex/valkeynode-managed-persistence on May 5, 2026

Conversation

DharmendraChoudhary67 (Contributor) commented Apr 22, 2026

Summary

  • add first-class persistence configuration to ValkeyCluster and ValkeyNode (rough API sketch after this list)
  • manage PVCs directly from ValkeyNode instead of using volumeClaimTemplates
  • mount the managed PVC into StatefulSet workloads and persist nodes.conf
  • add explicit Retain / Delete reclaim policy, PVC readiness and resize-aware status, and expansion-only persistence mutation rules
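
For illustration, a rough sketch of the shared persistence type implied by the points above (field and type names are assumptions, not necessarily the merged schema):

package v1alpha1

import "k8s.io/apimachinery/pkg/api/resource"

// PersistenceSpec is an illustrative sketch of the shared persistence API.
type PersistenceSpec struct {
    // Size of the managed PVC; per the mutation rules above, only
    // expansion is allowed after creation.
    Size resource.Quantity `json:"size"`

    // StorageClassName selects the storage class; immutable once set.
    StorageClassName *string `json:"storageClassName,omitempty"`

    // ReclaimPolicy is Retain (default; PVCs survive deletion) or Delete
    // (PVCs are removed via a finalizer on the ValkeyNode).
    ReclaimPolicy ReclaimPolicy `json:"reclaimPolicy,omitempty"`
}

// ReclaimPolicy captures the explicit Retain / Delete behavior listed above.
type ReclaimPolicy string

const (
    ReclaimPolicyRetain ReclaimPolicy = "Retain"
    ReclaimPolicyDelete ReclaimPolicy = "Delete"
)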

Why

This follows the direction discussed in #121 and the earlier feedback on #85 / #143:

  • persistence is modeled on ValkeyCluster and propagated to ValkeyNode
  • ValkeyNode owns the PVC lifecycle
  • the implementation avoids StatefulSet.volumeClaimTemplates as the storage control plane
  • the contract is kept narrow and upstream-friendly: persistence is StatefulSet-only, cannot be added later, cannot be removed once set, and only supports size expansion

What changed

  • add shared persistence API types and CRD schema
  • create/update named PVCs in the ValkeyNode reconciler (see the sketch after this list)
  • wire workloads and generated config to use /data
  • surface PVC readiness / resize state on ValkeyNode status
  • support explicit reclaim policy with delete finalizer behavior
  • add unit/envtest coverage and end-to-end persistence coverage
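
As a rough sketch of the PVC creation step (the helper name, "-data" suffix, label key, and import path are hypothetical; assumes client-go v0.29+ where PVC resources use VolumeResourceRequirements, and reuses the PersistenceSpec sketch above):

package controller

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"

    valkeyv1 "github.com/valkey-io/valkey-operator/api/v1alpha1" // import path assumed
)

// ensurePVC creates the named PVC for a ValkeyNode if it does not exist yet.
// Deliberately no ownerReference: under the default Retain policy the PVC
// must outlive the node; Delete is handled by a finalizer instead.
func ensurePVC(ctx context.Context, c client.Client, node *valkeyv1.ValkeyNode) error {
    pvc := &corev1.PersistentVolumeClaim{
        ObjectMeta: metav1.ObjectMeta{
            Name:      node.Name + "-data", // hypothetical naming scheme
            Namespace: node.Namespace,
            Labels:    map[string]string{"valkey.io/node": node.Name}, // hypothetical label
        },
        Spec: corev1.PersistentVolumeClaimSpec{
            AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
            StorageClassName: node.Spec.Persistence.StorageClassName,
            Resources: corev1.VolumeResourceRequirements{
                Requests: corev1.ResourceList{
                    corev1.ResourceStorage: node.Spec.Persistence.Size,
                },
            },
        },
    }
    err := c.Create(ctx, pvc)
    if apierrors.IsAlreadyExists(err) {
        return nil // size expansion would be a separate patch of spec.resources
    }
    return err
}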

Follow-up

This supersedes the earlier volumeClaimTemplates-based approach in #143.

jdheyburn (Collaborator) commented:

Thanks for raising! This is definitely something we want, hope to get round to reviewing it this week. In the meantime would you mind rebasing against main?

DharmendraChoudhary67 force-pushed the codex/valkeynode-managed-persistence branch from 4ec2e69 to bcc16a2 on April 27, 2026 at 15:56
DharmendraChoudhary67 (Author) commented:

done.

daanvinken (Contributor) left a comment:

This looks great, thanks for plumbing through. Excited to test it out.

I realized the controller doesn't watch PVCs in SetupWithManager (it already watches ConfigMaps, StatefulSets, Deployments via Owns()). This means PVC state transitions like Pending -> Bound won't trigger a reconcile until the next 60s periodic requeue, which could make initial creation slower with persistence enabled.

Since we intentionally don't set ownerReferences on PVCs (discussed in #121), we can't use Owns(). But we could use Watches() with a custom handler that maps PVCs back to ValkeyNodes by label.

I don't think it's worth going down this path right now, but it felt worth mentioning.
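
For reference, a minimal sketch of that label-based mapping, assuming a current controller-runtime builder API (the ValkeyNodeReconciler name and the valkey.io/node label key are assumptions):

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/handler"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func (r *ValkeyNodeReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&valkeyv1.ValkeyNode{}).
        Owns(&appsv1.StatefulSet{}).
        // PVCs carry no ownerReference, so map events back to the ValkeyNode by label.
        Watches(
            &corev1.PersistentVolumeClaim{},
            handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
                nodeName, ok := obj.GetLabels()["valkey.io/node"] // hypothetical label key
                if !ok {
                    return nil
                }
                return []reconcile.Request{{NamespacedName: types.NamespacedName{
                    Namespace: obj.GetNamespace(),
                    Name:      nodeName,
                }}}
            }),
        ).
        Complete(r)
}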

"strings"
"time"

vclient "github.com/valkey-io/valkey-go"
A contributor commented:

This file is getting big; would it make sense to split it into valkeynode_persistence.go?

DharmendraChoudhary67 (Author) replied:

Yepp. I’ll split the persistence-specific reconcile helpers into a separate valkeynode_persistence.go so the main controller flow stays easier to read.

Comment thread on internal/controller/utils.go (outdated)
return tlsCfg, nil
}

func generateValkeyNodeConfig(node *valkeyv1.ValkeyNode) string {
A contributor commented:

Just for my understanding, how does this relate/overlap with GetBaseConfig()?

For example, the TLS config here is a hardcoded string, but a map in the existing one. Should we/can we consolidate?

DharmendraChoudhary67 (Author) replied:

I agree there’s overlap, especially around TLS and persistence directives. Rather than fully merging the cluster and standalone-node builders, I’m thinking of extracting a shared helper for the common managed directives (ACL, persistence paths, TLS). buildServerConfig() would stay responsible for cluster-specific defaults and user-config merge behavior, while generateValkeyNodeConfig() reuses the same shared helper for the standalone ValkeyNode path. That should reduce drift without forcing both paths into the same shape.
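
Roughly, the shared helper could look like this (the function name and file paths are illustrative only):

// sharedManagedDirectives returns the config directives both the cluster and
// standalone-node builders need, so neither path drifts from the other.
func sharedManagedDirectives(tlsEnabled, persistent bool) map[string]string {
    d := map[string]string{}
    if persistent {
        d["dir"] = "/data" // RDB/AOF files and nodes.conf live here
    }
    if tlsEnabled {
        d["tls-cert-file"] = "/etc/valkey/tls/tls.crt" // illustrative mount paths
        d["tls-key-file"] = "/etc/valkey/tls/tls.key"
    }
    return d
}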

daanvinken (Contributor) commented:

FWIW, tested this locally on a kind cluster as well. Created a 2-shard 1-replica cluster with persistence:

spec:
  shards: 2
  replicas: 1
  persistence:
    size: 1Gi

All 4 PVCs came up bound. Wrote a key (valkey-cli -c SET testkey "persist-me"), ran BGSAVE, and confirmed dump.rdb and nodes.conf were both in /data/. Deleted the pod; it came back with the same node ID and the key was still there.

Tested both reclaim policies. Default Retain: deleted the cluster, PVCs stuck around. Then created a fresh cluster with reclaimPolicy: Delete, verified the finalizer was on the ValkeyNode, deleted the cluster, PVCs were cleaned up.

daanvinken (Contributor) left a comment:

LGTM but my approval has no functional meaning :)

Perhaps @jdheyburn wants to have a look as well and possibly involve others. I think the approach here looks great.

jdheyburn (Collaborator) commented:

Thanks for raising this @DharmendraChoudhary67 and also to @daanvinken for reviewing - very helpful in what has been a busy week my side. I'll try to get round to testing this by EOD Monday.

jdheyburn (Collaborator) left a comment:

I did a local test; the node reloaded its existing config and also performed a partial resync!

On the whole the PR looks great, and I'm looking forward to getting this tested in a live cloud environment to see how it behaves, but that will be after we've merged.

I've a couple of low-priority comments. On top of those, could you please update docs/status-conditions.md with the new status conditions being used?

Comment on lines +108 to +110
if node.Spec.Persistence != nil && node.Spec.WorkloadType == valkeyiov1alpha1.WorkloadTypeDeployment {
return fmt.Errorf("persistence requires workloadType StatefulSet")
}
jdheyburn (Collaborator):

Is this check necessary if the CRD validates it for us? Or is this an additional guard just to be safe?

Comment on lines +187 to +189
if node.Spec.Persistence == nil {
return metav1.ConditionTrue, "", ""
}
jdheyburn (Collaborator):

pvcSizeStatusCondition is called by one parent that already wraps the function call in this condition. Is it worth keeping this check here?

DharmendraChoudhary67 (Author) replied:

Good catch, @jdheyburn! It's safe to remove them, as CRD validation already handles these cases. I'll update docs/status-conditions.md.
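
For reference, the CRD-level guard can be expressed as a kubebuilder CEL validation marker on the spec; a sketch (the merged rule may be phrased differently):

// +kubebuilder:validation:XValidation:rule="!(has(self.persistence) && self.workloadType == 'Deployment')",message="persistence requires workloadType StatefulSet"
type ValkeyNodeSpec struct {
    // ...
}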

ekarlso commented May 5, 2026:

@jdheyburn any chance of getting this in?

jdheyburn (Collaborator) left a comment:

Thank you for your help on this! 🎉

jdheyburn merged commit bbbb25e into valkey-io:main on May 5, 2026
8 checks passed
