Add ValkeyNode-managed persistence support #149
Conversation
Thanks for raising! This is definitely something we want, hope to get round to reviewing it this week. In the meantime, would you mind rebasing against main?
Signed-off-by: Dharmendra Choudhary <dharmendra.c@juspay.in>
done.
daanvinken left a comment
This looks great, thanks for plumbing through. Excited to test it out.
I realized the controller doesn't watch PVCs in SetupWithManager (it already watches ConfigMaps, StatefulSets, Deployments via Owns()). This means PVC state transitions like Pending -> Bound won't trigger a reconcile until the next 60s periodic requeue, which could make initial creation slower with persistence enabled.
Since we intentionally don't set ownerReferences on PVCs (discussed in #121), we can't use Owns(). But we could use Watches() with a custom handler that maps PVCs back to ValkeyNodes by label.
I don't think it's worth going down this path right now, but it felt worth mentioning.
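For reference, a minimal sketch of what that label-based mapping could look like with a recent controller-runtime, using Watches() and EnqueueRequestsFromMapFunc. The label key, the placeholder import path, and the stub reconciler are assumptions for illustration, not the actual code in this PR:

```go
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	valkeyv1 "example.com/valkey-operator/api/v1" // placeholder path for the project's API package
)

// ValkeyNodeReconciler is a stand-in for the project's existing reconciler type.
type ValkeyNodeReconciler struct {
	client.Client
}

// Reconcile is stubbed out; only the watch wiring below is the point of this sketch.
func (r *ValkeyNodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// SetupWithManager sketch: alongside the existing For()/Owns() calls, a Watches()
// entry maps PVC events back to ValkeyNodes by label, so Pending -> Bound
// transitions trigger a reconcile without waiting for the 60s periodic requeue.
func (r *ValkeyNodeReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&valkeyv1.ValkeyNode{}).
		// Existing Owns() calls for ConfigMaps, StatefulSets, Deployments omitted here.
		Watches(
			&corev1.PersistentVolumeClaim{},
			handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
				// Hypothetical label key; the real PVCs would carry whatever
				// labels the operator already sets on resources it creates.
				name, ok := obj.GetLabels()["valkey.io/valkey-node"]
				if !ok {
					return nil
				}
				return []reconcile.Request{{NamespacedName: types.NamespacedName{
					Namespace: obj.GetNamespace(),
					Name:      name,
				}}}
			}),
		).
		Complete(r)
}
```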
| "strings" | ||
| "time" | ||
|
|
||
| vclient "github.com/valkey-io/valkey-go" |
This file is getting big, would it make sense to split into valkeynode_persistence.go?
Yepp. I’ll split the persistence-specific reconcile helpers into a separate valkeynode_persistence.go so the main controller flow stays easier to read.
```go
	return tlsCfg, nil
}

func generateValkeyNodeConfig(node *valkeyv1.ValkeyNode) string {
```
Just for my understanding, how does this relate/overlap with GetBaseConfig()?
For example, the TLS config here is a hardcoded string, but a map in the existing one. Should we/can we consolidate?
I agree there’s overlap, especially around TLS and persistence directives. Rather than fully merging the cluster and standalone-node builders, I’m thinking of extracting a shared helper for the common managed directives (ACL, persistence paths, TLS), then keeping buildServerConfig() responsible for cluster-specific defaults and user-config merge behavior, while generateValkeyNodeConfig() reuses the same shared helper for the standalone ValkeyNode path. That should reduce drift without forcing both paths into the same shape.
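To make that concrete, a minimal sketch of the kind of shared helper being described; the helper name, signature, and exact directives are illustrative, not necessarily what will land in the PR:

```go
package controller

import (
	"fmt"
	"strings"
)

// appendManagedDirectives is a hypothetical shared helper for the directives the
// operator always manages itself. buildServerConfig() (cluster path) and
// generateValkeyNodeConfig() (standalone ValkeyNode path) would both call it, then
// layer their own defaults and user-config merge behavior on top.
func appendManagedDirectives(b *strings.Builder, persistenceEnabled, tlsEnabled bool, dataDir string) {
	if persistenceEnabled {
		// Illustrative persistence directives; real values would come from the spec.
		fmt.Fprintf(b, "dir %s\n", dataDir)
		b.WriteString("appendonly yes\n")
	}
	if tlsEnabled {
		// Illustrative TLS directives; ports and file paths are placeholders.
		b.WriteString("tls-port 6379\n")
		b.WriteString("port 0\n")
		b.WriteString("tls-cert-file /etc/valkey/tls/tls.crt\n")
		b.WriteString("tls-key-file /etc/valkey/tls/tls.key\n")
	}
}
```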
FWIW, tested this locally on a kind cluster as well. Created a 2-shard 1-replica cluster with persistence:

```yaml
spec:
  shards: 2
  replicas: 1
  persistence:
    size: 1Gi
```

All 4 PVCs came up bound. Wrote a key (…). Tested both reclaim policies. Default Retain: deleted the cluster, PVCs stuck around. Then created a fresh cluster with (…).
Signed-off-by: Dharmendra Choudhary <dharmendra.c@juspay.in>
daanvinken left a comment
LGTM but my approval has no functional meaning :)
Perhaps @jdheyburn wants to have a look as well and possibly involve others. I think the approach here looks great.
Thanks for raising this @DharmendraChoudhary67 and also to @daanvinken for reviewing - very helpful in what has been a busy week on my side. I'll try to get round to testing this by EOD Monday.
jdheyburn left a comment
I did a local test; the node reloaded its existing config and also performed a partial resync!
On the whole the PR looks great, and I'm looking forward to getting this tested in a live cloud environment to see how it behaves, but that will be after we've merged.
I've a couple of low-priority comments. On top of those, could you please update docs/status-conditions.md with the new status conditions being used?
```go
if node.Spec.Persistence != nil && node.Spec.WorkloadType == valkeyiov1alpha1.WorkloadTypeDeployment {
	return fmt.Errorf("persistence requires workloadType StatefulSet")
}
```
Is this check necessary if the CRD validates it for us? Or is this an additional guard just to be safe?
```go
if node.Spec.Persistence == nil {
	return metav1.ConditionTrue, "", ""
}
```
pvcSizeStatusCondition is called by one parent that already wraps the function call in this condition. Is it worth keeping this check here?
Good catch @jdheyburn! It's safe to remove them as CRD validation already handles these cases. I'll update the (…)
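For context on what "the CRD validates it for us" can look like, a hedged sketch using a kubebuilder CEL validation marker; the rule below mirrors the error message in the snippet above, but the actual CRD in this PR may express it differently:

```go
package v1alpha1

// Sketch only: a CRD-level CEL rule that rejects persistence on Deployment
// workloads at admission time, which is what makes the in-controller guard and
// the nil check above redundant.
// +kubebuilder:validation:XValidation:rule="!has(self.persistence) || self.workloadType == 'StatefulSet'",message="persistence requires workloadType StatefulSet"
type ValkeyNodeSpec struct {
	// ... existing fields elided ...
}
```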
Signed-off-by: Dharmendra Choudhary <dharmendra.c@juspay.in>
@jdheyburn any chance to get this in?
jdheyburn left a comment
Thank you for your help on this! 🎉
Summary

- Adds PVC-backed persistence for ValkeyCluster and ValkeyNode
- PVCs are managed directly by ValkeyNode instead of using volumeClaimTemplates
- Cluster state (nodes.conf) is persisted so restarted nodes can rejoin and partially resync
- Retain/Delete reclaim policy, PVC readiness and resize-aware status, and expansion-only persistence mutation rules

Why

This follows the direction discussed in #121 and the earlier feedback on #85 / #143:

- persistence is configured on ValkeyCluster and propagated to ValkeyNode
- ValkeyNode owns the PVC lifecycle rather than StatefulSet.volumeClaimTemplates acting as the storage control plane
- persistence is StatefulSet-only, cannot be added later, cannot be removed once set, and only supports size expansion

What changed

- the ValkeyNode reconciler creates and manages the PVCs and mounts the data volume at /data
- PVC-related conditions are surfaced in ValkeyNode status

Follow-up

This supersedes the earlier volumeClaimTemplates-based approach in #143.
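To give the storage surface a concrete shape, a rough sketch of the persistence block as described in the summary and the test comment above; field names and types are assumptions, not the actual API:

```go
package v1alpha1

import (
	"k8s.io/apimachinery/pkg/api/resource"
)

// PersistenceSpec is an illustrative stand-in for the persistence block in this
// PR; the real CRD fields may differ.
type PersistenceSpec struct {
	// Size of the PVC requested for each node (e.g. "1Gi"). Per the mutation
	// rules above, only expansion of this value is allowed after creation.
	Size resource.Quantity `json:"size"`

	// ReclaimPolicy controls what happens to PVCs when the owning resource is
	// deleted: "Retain" (the default exercised in the test above) keeps them,
	// "Delete" removes them.
	ReclaimPolicy string `json:"reclaimPolicy,omitempty"`
}
```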