Skip to content

(feat): Status Conditions for ValkeyCluster#33

Merged
sandeepkunusoth merged 14 commits into
valkey-io:mainfrom
sandeepkunusoth:add-valkeyClusterStatus-conditions
Dec 25, 2025
Merged

(feat): Status Conditions for ValkeyCluster#33
sandeepkunusoth merged 14 commits into
valkey-io:mainfrom
sandeepkunusoth:add-valkeyClusterStatus-conditions

Conversation

@sandeepkunusoth
Copy link
Copy Markdown
Member

@sandeepkunusoth sandeepkunusoth commented Dec 19, 2025

This PR enhances ValkeyCluster status (#26) by updating status metadata and status.conditions on every reconciliation. It also updates E2E tests to cover: (Success) provisioning a ValkeyCluster and asserting it reaches Ready with the expected condition, and (Negative) simulating shard loss by deleting a shard deployment to force Degraded state and verifying the Degraded/Ready conditions are set as expected.

Signed-off-by: Sandeep Kunusoth <sandeepkunsoth000@gmail.com>
Signed-off-by: Sandeep Kunusoth <sandeepkunsoth000@gmail.com>
@sandeepkunusoth sandeepkunusoth marked this pull request as ready for review December 19, 2025 08:46
@sandeepkunusoth sandeepkunusoth changed the title (feat): Emit conditions via ValkeyClusterStatus (feat): Status Conditions for ValkeyCluster Dec 19, 2025
Signed-off-by: Sandeep Kunusoth <sandeepkunsoth000@gmail.com>
Comment thread test/e2e/e2e_test.go
Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial minor comments/questions

Comment thread internal/controller/status.go
Comment thread api/v1alpha1/valkeycluster_types.go Outdated
Comment thread hack/load-images-kind.sh Outdated
Comment thread internal/controller/status_test.go
Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice documentation!

Comment thread test/e2e/e2e_test.go Outdated
Comment thread test/e2e/e2e_test.go Outdated
Comment thread test/e2e/e2e_test.go Outdated
Comment thread test/e2e/e2e_test.go Outdated
Comment thread test/e2e/e2e_test.go
Comment thread test/e2e/e2e_test.go Outdated
Comment thread docs/status-conditions.md Outdated
Comment thread internal/controller/valkeycluster_controller.go Outdated
Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a last question about the return after setting slots not assigned status

Comment thread internal/controller/valkeycluster_controller.go Outdated
Comment thread test/e2e/e2e_test.go
Comment thread api/v1alpha1/valkeycluster_types.go
Comment thread docs/status-conditions.md Outdated
Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found two comments but then I'm happy.

Comment thread api/v1alpha1/valkeycluster_types.go Outdated
Comment thread api/v1alpha1/valkeycluster_types.go Outdated
Signed-off-by: Sandeep Kunusoth <sandeepkunsoth000@gmail.com>
Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sandeepkunusoth sandeepkunusoth merged commit be52c44 into valkey-io:main Dec 25, 2025
4 checks passed
hieu2102 pushed a commit to hieu2102/valkey-operator that referenced this pull request Jan 7, 2026
This PR enhances ValkeyCluster status (valkey-io#26) by updating status metadata
and status.conditions on every reconciliation. It also updates E2E tests
to cover: (Success) provisioning a ValkeyCluster and asserting it
reaches Ready with the expected condition, and (Negative) simulating
shard loss by deleting a shard deployment to force Degraded state and
verifying the Degraded/Ready conditions are set as expected.

---------

Signed-off-by: Sandeep Kunusoth <sandeepkunsoth000@gmail.com>
Signed-off-by: hieu2102 <hieund2102@gmail.com>
@sandeepkunusoth sandeepkunusoth deleted the add-valkeyClusterStatus-conditions branch January 11, 2026 06:00
bjosv added a commit to Nordix/valkey-operator that referenced this pull request May 10, 2026
updateStatus fetched the latest cluster object for change detection but
then called Status().Update() on the stale in-memory object from the
start of Reconcile. When the ValkeyNode controller updates node status
concurrently, the cluster's resourceVersion advances, causing the stale
update to fail with "the object has been modified".

These conflicts cascade: each failed status update requeues the
reconcile, delaying cluster formation and causing transient connectivity
errors (i/o timeout, connection refused) as pods get recreated.

Fix by applying the desired conditions to the freshly-fetched object and
updating that instead. Sync the result back to the caller so subsequent
logic in the same reconcile sees the current resourceVersion.

Bug introduced in be52c44 (Status Conditions for ValkeyCluster valkey-io#33),
became a problem after 89eb055 (ValkeyNode integration) added a second
controller that concurrently modifies cluster state.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
bjosv added a commit to Nordix/valkey-operator that referenced this pull request May 10, 2026
updateStatus fetched the latest cluster object for change detection but
then called Status().Update() on the stale in-memory object from the
start of Reconcile. When the ValkeyNode controller updates node status
concurrently, the cluster's resourceVersion advances, causing the stale
update to fail with "the object has been modified".

These conflicts cascade: each failed status update requeues the
reconcile, delaying cluster formation and causing transient connectivity
errors (i/o timeout, connection refused) as pods get recreated.

Fix by applying the desired conditions to the freshly-fetched object and
updating that instead. Sync the result back to the caller so subsequent
logic in the same reconcile sees the current resourceVersion.

Bug introduced in be52c44 (Status Conditions for ValkeyCluster valkey-io#33),
became a problem after 89eb055 (ValkeyNode integration) added a second
controller that concurrently modifies cluster state.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
bjosv added a commit to Nordix/valkey-operator that referenced this pull request May 10, 2026
updateStatus fetched the latest cluster object for change detection but
then called Status().Update() on the stale in-memory object from the
start of Reconcile. When the ValkeyNode controller updates node status
concurrently, the cluster's resourceVersion advances, causing the stale
update to fail with "the object has been modified".

These conflicts cascade: each failed status update requeues the
reconcile, delaying cluster formation and causing transient connectivity
errors (i/o timeout, connection refused) as pods get recreated.

Fix by applying the desired conditions to the freshly-fetched object and
updating that instead. Sync the result back to the caller so subsequent
logic in the same reconcile sees the current resourceVersion.

Bug introduced in be52c44 (Status Conditions for ValkeyCluster valkey-io#33),
became a problem after 89eb055 (ValkeyNode integration) added a second
controller that concurrently modifies cluster state.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants