fix: allow slot assignment for single-node clusters (#131) #135

Merged
jdheyburn merged 1 commit into valkey-io:main from daanvinken:fix/single-shard-slot-assignment on Apr 22, 2026

Conversation

@daanvinken
Contributor

This PR closes #131

Summary

A ValkeyCluster with 1 shard and 0 replicas gets stuck in an infinite reconciliation loop: assignSlotsToPendingPrimaries() skips isolated nodes, and a single-node cluster is permanently isolated, with no peer to MEET. The loop looks like this:

2026-04-09T04:22:11Z    DEBUG   reconcile...    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "21da6899-0e2a-44a3-a233-d0c40bad9f29"}
2026-04-09T04:22:11Z    DEBUG   internal ACLs unchanged {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "21da6899-0e2a-44a3-a233-d0c40bad9f29"}
2026-04-09T04:22:11Z    DEBUG   slots are not assigned, requeue..       {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "21da6899-0e2a-44a3-a233-d0c40bad9f29", "unassignedSlots": [{"Start":0,"End":16383}]}
2026-04-09T04:22:13Z    DEBUG   reconcile...    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "25ddac3d-2eb1-4032-ae07-8a7767e7a842"}
2026-04-09T04:22:13Z    DEBUG   internal ACLs unchanged {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "25ddac3d-2eb1-4032-ae07-8a7767e7a842"}
2026-04-09T04:22:13Z    DEBUG   slots are not assigned, requeue..       {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "25ddac3d-2eb1-4032-ae07-8a7767e7a842", "unassignedSlots": [{"Start":0,"End":16383}]}
2026-04-09T04:22:15Z    DEBUG   reconcile...    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"no-exporter","namespace":"default"}, "namespace": "default", "name": "no-exporter", "reconcileID": "9baa0b13-b2ea-4494-a697-3610196f39de"}
<repeating ad infinitum>

It seems there's a deadlock between meetIsolatedNodes() and assignSlotsToPendingPrimaries().

When a single node starts up with cluster_known_nodes = 1, it's considered "isolated".

1. meetIsolatedNodes() picks a meet target via findMeetTarget(). Since there are no non-isolated nodes, it falls back to isolated[0], i.e. the only node. Then, because meetTarget == isolated[0], that node is removed from the list, the loop iterates over an empty slice, and no MEET is ever issued.

2. assignSlotsToPendingPrimaries() skips isolated nodes (cluster_known_nodes <= 1). Since no MEET happened, cluster_known_nodes is still 1, so the node is skipped and no slots are assigned.

3. The reconciler sees unassigned slots and requeues forever.
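
To make the interaction concrete, here is a minimal, runnable sketch of steps 1 and 2; the types, fields, and fallback logic are simplified and hypothetical, not the operator's actual code:

```go
package main

import "fmt"

type node struct {
	addr       string
	knownNodes int // cluster_known_nodes from CLUSTER INFO
}

func (n node) isIsolated() bool { return n.knownNodes <= 1 }

// findMeetTarget prefers a non-isolated node and, failing that,
// falls back to isolated[0].
func findMeetTarget(all, isolated []node) node {
	for _, n := range all {
		if !n.isIsolated() {
			return n
		}
	}
	return isolated[0]
}

func meetIsolatedNodes(all []node) {
	var isolated []node
	for _, n := range all {
		if n.isIsolated() {
			isolated = append(isolated, n)
		}
	}
	if len(isolated) == 0 {
		return
	}
	target := findMeetTarget(all, isolated)
	for _, n := range isolated {
		if n.addr == target.addr {
			continue // the meet target is dropped from the MEET list
		}
		// In a multi-node cluster, this introduces n to the target.
		fmt.Printf("CLUSTER MEET issued: %s -> %s\n", n.addr, target.addr)
	}
	// With a single node, isolated == [target], so the loop body never
	// runs, no MEET is issued, and cluster_known_nodes stays 1 forever.
}

func main() {
	// A 1-shard, 0-replica cluster: one permanently isolated node.
	meetIsolatedNodes([]node{{addr: "10.244.0.5:6379", knownNodes: 1}})
	// Prints nothing -- the deadlock described in steps 1-2 above.
}
```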

IIUC the isolation guard exists to prevent a pod in a multi-node cluster from getting slots before it's been introduced to its peers. But in a single-node cluster (1 shard, 0 replicas), isolation is the permanent, correct state: there is no other node to MEET.

Implementation

I considered fixing this in meetIsolatedNodes() itself, but that's a dead end. Even if we made the single node attempt to MEET itself, CLUSTER MEET with your own address is a no-op, so that wouldn't help.

The simpler fix is in assignSlotsToPendingPrimaries(). If the whole cluster is one node, isolation is the correct permanent state and there's nothing to protect against.
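
A minimal sketch of that shape (the helper name and spec fields are hypothetical, not the actual diff):

```go
// Hypothetical helper: how many nodes the spec calls for in total.
// With 1 shard and 0 replicas this is exactly 1.
func expectedNodes(shards, replicas int32) int32 {
	return shards * (1 + replicas)
}

// Inside assignSlotsToPendingPrimaries(), the guard then becomes:
//
//	if node.IsIsolated() && expectedNodes(spec.Shards, spec.Replicas) > 1 {
//	    continue // multi-node cluster: wait for CLUSTER MEET first
//	}
//
// A single-node cluster falls through and receives all 16384 slots.
```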

Testing

  • E2E test added for single-shard, zero-replica cluster creation, validating that the cluster reaches Ready with all 16384 slots assigned
  • Tested locally as well: after reproducing the original bug, I loaded a new image with the fix and the single-node cluster came up:
$ kubectl get valkeycluster
NAME          STATE         REASON        AGE
single-node   Reconciling   Reconciling   10m
$ kubectl get valkeycluster
NAME          STATE   REASON           AGE
single-node   Ready   ClusterHealthy   10m

@jdheyburn
Collaborator

jdheyburn commented Apr 14, 2026

Just a thought, would users want to have 1 shard (primary) with 1 replica, and expand out from there? What would we need to change in the code to support this case too, alongside 1 shard with 0 replicas?

@daanvinken
Contributor Author

In that case there would be 2 nodes so that already works 👍

@bjosv
Collaborator

bjosv previously approved these changes Apr 17, 2026

Nice finding!

Comment thread: test/e2e/valkeycluster_test.go
A ValkeyCluster with 1 shard and 0 replicas gets stuck in an infinite
reconciliation loop because assignSlotsToPendingPrimaries() skips
isolated nodes, and a single-node cluster is permanently isolated
with no peer to MEET.

Skip the isolation guard when the expected cluster is a single node.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
@daanvinken force-pushed the fix/single-shard-slot-assignment branch from ea405f7 to 5c8e308 on April 20, 2026 08:08
@jdheyburn
Collaborator

@daanvinken I did some testing locally, and I went from 1 shard 0 replicas to 1 shard 1 replica, and the replica was not able to join.

2026-04-22T08:55:55Z    DEBUG   events  Shard has 1 of 2 nodes  {"type": "Normal", "object": "nil", "action": "CheckReplicas", "reason": "WaitingForReplicas"}
2026-04-22T08:55:56Z    DEBUG   reconciling ValkeyNode  {"controller": "valkeynode", "controllerGroup": "valkey.io", "controllerKind": "ValkeyNode", "ValkeyNode": {"name":"cluster-sample-0-0","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample-0-0", "reconcileID": "d10901ca-b123-414c-b1c3-4cb1acf4ec4b"}
2026-04-22T08:55:56Z    DEBUG   getting internal secret {"controller": "valkeynode", "controllerGroup": "valkey.io", "controllerKind": "ValkeyNode", "ValkeyNode": {"name":"cluster-sample-0-0","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample-0-0", "reconcileID": "d10901ca-b123-414c-b1c3-4cb1acf4ec4b", "node-labels": {"app.kubernetes.io/component":"valkey-node","app.kubernetes.io/instance":"cluster-sample-0-0","app.kubernetes.io/managed-by":"valkey-operator","app.kubernetes.io/name":"valkey","app.kubernetes.io/part-of":"valkey","valkey.io/cluster":"cluster-sample","valkey.io/node-index":"0","valkey.io/shard-index":"0"}}
2026-04-22T08:55:56Z    DEBUG   reconciled StatefulSet  {"controller": "valkeynode", "controllerGroup": "valkey.io", "controllerKind": "ValkeyNode", "ValkeyNode": {"name":"cluster-sample-0-0","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample-0-0", "reconcileID": "d10901ca-b123-414c-b1c3-4cb1acf4ec4b", "result": "updated", "name": "valkey-cluster-sample-0-0"}
2026-04-22T08:55:56Z    DEBUG   ValkeyNode reconciliation complete      {"controller": "valkeynode", "controllerGroup": "valkey.io", "controllerKind": "ValkeyNode", "ValkeyNode": {"name":"cluster-sample-0-0","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample-0-0", "reconcileID": "d10901ca-b123-414c-b1c3-4cb1acf4ec4b"}
2026-04-22T08:55:57Z    DEBUG   reconcile...    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8"}
2026-04-22T08:55:57Z    INFO    getting system users secret: cluster-sample     {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8"}
2026-04-22T08:55:57Z    DEBUG   internal ACLs unchanged {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8"}
2026-04-22T08:55:57Z    DEBUG   add a new replica       {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8", "primary IP": "10.244.0.28", "primary Id": "f87d4b87c3b1ec18f8e67a695b9adf36f4bfc708", "replica address": "10.244.0.29", "shardIndex": 0}
2026-04-22T08:55:57Z    DEBUG   replica does not yet know primary (gossip pending); will retry  {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8", "replica": "10.244.0.29", "primaryId": "f87d4b87c3b1ec18f8e67a695b9adf36f4bfc708"}
2026-04-22T08:55:57Z    DEBUG   skipping replica; primary not ready yet {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8", "node": "10.244.0.29", "shard": 0}
2026-04-22T08:55:57Z    DEBUG   missing replicas, requeue..     {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "adfeab31-49c8-4d77-8fab-2116d6e682d8"}
2026-04-22T08:55:57Z    DEBUG   events  Shard has 1 of 2 nodes  {"type": "Normal", "object": "nil", "action": "CheckReplicas", "reason": "WaitingForReplicas"}

Also I went from 1 shard 0 replicas to 2 shards 0 replicas, and the second shard was not able to join.

2026-04-22T09:00:13Z    DEBUG   events  Waiting for 10.244.0.30 to learn node 10.244.0.31       {"type": "Normal", "object": "nil", "action": "RebalanceSlots", "reason": "SlotsRebalancePending"}
2026-04-22T09:00:15Z    DEBUG   reconcile...    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "2b218d55-32e0-4d09-b27a-2b2c9482cd00"}
2026-04-22T09:00:15Z    INFO    getting system users secret: cluster-sample     {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "2b218d55-32e0-4d09-b27a-2b2c9482cd00"}
2026-04-22T09:00:15Z    DEBUG   internal ACLs unchanged {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "2b218d55-32e0-4d09-b27a-2b2c9482cd00"}
2026-04-22T09:00:15Z    DEBUG   destination not yet visible to source via gossip; will retry    {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"cluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "cluster-sample", "reconcileID": "2b218d55-32e0-4d09-b27a-2b2c9482cd00", "src": "10.244.0.30", "dst": "10.244.0.31", "dstId": "6513f34b99f908c4dfcb4b43741bc7c729f61c80"}
2026-04-22T09:00:15Z    DEBUG   events  Waiting for 10.244.0.30 to learn node 10.244.0.31       {"type": "Normal", "object": "nil", "action": "RebalanceSlots", "reason": "SlotsRebalancePending"}

If these use cases are not yet supported, that's fine - maybe we can raise Issues for these to be able to support them in future?

@daanvinken
Contributor Author

Thanks for testing! Seemingly these are pre-existing scale-up issues. This PR only changes the initial slot assignment for a 1-shard-0-replica cluster.

Both cases seem to get stuck in gossip-waiting loops, before reaching any code this PR touches.

These would likely reproduce when scaling any cluster configuration, not just single-node origins. I will open issue(s) for both after some investigation 👍

@daanvinken
Contributor Author

I think we should add scaling E2E tests as well, I'll have a look.

@jdheyburn
Collaborator

@daanvinken Are those E2E tests to be included in this PR?

@daanvinken
Contributor Author

The fix was fairly trivial; I added it here: #147

E2E tests for all cases of this PR and the cases you mentioned are in #148.

@jdheyburn
Collaborator

jdheyburn left a comment

Thank you!

@jdheyburn jdheyburn merged commit 815b166 into valkey-io:main Apr 22, 2026
7 checks passed
jdheyburn pushed a commit that referenced this pull request Apr 24, 2026
**Description:**

Scaling a single-node cluster (1 shard, 0 replicas) to add a replica
gets stuck in an infinite reconciliation loop. `findMeetTarget()` skips
shard primaries with `cluster_known_nodes <= 1`, so the new replica is
never `CLUSTER MEET`'d to the primary. `CLUSTER REPLICATE` fails with
"Unknown node" on every reconciliaton.

The same bug also prevents adding a new shard (1 shard -> 2 shards),
since the new shard's primary can't MEET the existing isolated primary
either.

A shard primary that owns slots is always a valid `MEET` target
regardless of `cluster_known_nodes`. This removes the `IsIsolated()`
guard for shard primaries.

Discovered while investigating feedback on #135.
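
For illustration, the guard removal is roughly this shape (the interface
and function names are hypothetical, not the actual diff):

```go
type clusterNode interface {
	OwnsSlots() bool  // the node serves at least one hash slot
	IsIsolated() bool // cluster_known_nodes <= 1
}

// A slot-owning shard primary is always a valid MEET target,
// even when it reports cluster_known_nodes == 1.
func isValidMeetTarget(n clusterNode) bool {
	return n.OwnsSlots() || !n.IsIsolated()
}
```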

**Testing:**
Reproduced both cases on a local kind cluster: create a
1-shard-0-replica cluster, wait for Ready, then scale up.

**Case 1: add replica** (`replicas: 0` -> `replicas: 1`)

Before fix - cluster stuck at Reconciling, replica never `MEET`'d:
```
$ kubectl get valkeycluster
NAME         STATE         REASON
test-scale   Reconciling   Reconciling

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME             READY   ROLE
test-scale-0-0   true    primary
test-scale-0-1   true    primary

$ kubectl exec statefulset/valkey-test-scale-0-1 -c server -- valkey-cli CLUSTER INFO | grep cluster_known_nodes
cluster_known_nodes:1
```

Operator logs repeating every 2s:
```
DEBUG  replica does not yet know primary (gossip pending); will retry  replica=10.244.0.9 primaryId=3441aa9a...
DEBUG  skipping replica; primary not ready yet
DEBUG  missing replicas, requeue..
```

After fix - MEET succeeds, cluster reaches Ready:
```
$ kubectl get valkeycluster
NAME         STATE   REASON
test-scale   Ready   ClusterHealthy

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME             READY   ROLE
test-scale-0-0   true    primary
test-scale-0-1   true    replica

$ kubectl exec statefulset/valkey-test-scale-0-0 -c server -- valkey-cli CLUSTER NODES
3441aa9a... 10.244.0.8:6379@16379 myself,master - 0 0 0 connected 0-16383
bd334739... 10.244.0.9:6379@16379 slave 3441aa9a... 0 1776858716401 0 connected
```

Operator logs:
```
DEBUG  meet node  node=10.244.0.9 target=10.244.0.8
DEBUG  events  Introduced 1 isolated node(s) to the cluster
```

**Case 2: add shard** (`shards: 1` -> `shards: 2`)

After fix - new shard joins and slots are rebalanced:
```
$ kubectl get valkeycluster
NAME          STATE   REASON
test-scale2   Ready   ClusterHealthy

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME              READY   ROLE
test-scale2-0-0   true    primary
test-scale2-1-0   true    primary

$ kubectl exec statefulset/valkey-test-scale2-0-0 -c server -- valkey-cli CLUSTER NODES
a6e1a811... 10.244.0.13:6379@16379 master - 0 1776860233046 1 connected 0-8191
e9dae06c... 10.244.0.12:6379@16379 myself,master - 0 0 0 connected 8192-16383
```

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
sandeepkunusoth pushed a commit that referenced this pull request Apr 29, 2026
## Description

Adds e2e coverage for scaling a 1-shard-0-replica cluster. This was
missing and led to the bugs found in #135 / #147. Two tests:
- Scale from 1 shard / 0 replicas to 1 shard / 1 replica (add replica)
- Scale from 1 shard / 0 replicas to 2 shards / 0 replicas (add shard)

Both verify the cluster reaches Ready with correct `cluster_known_nodes`
and `cluster_size`.
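
Roughly, each test follows this shape (a Ginkgo-style sketch with
hypothetical helper names and a `ctx` assumed in scope, not the
committed code):

```go
It("scales a single-node cluster from 0 to 1 replicas", func() {
	// Hypothetical helpers: createCluster, patchReplicas,
	// waitForState, clusterInfoField.
	createCluster(ctx, 1 /* shards */, 0 /* replicas */)
	waitForState(ctx, "Ready")

	patchReplicas(ctx, 1)
	waitForState(ctx, "Ready")

	// Both nodes know each other; one shard owns all slots.
	Expect(clusterInfoField(ctx, "cluster_known_nodes")).To(Equal("2"))
	Expect(clusterInfoField(ctx, "cluster_size")).To(Equal("1"))
})
```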

## Testing

Tests compile. Pattern follows the existing `rebalances slots on scale
out` e2e test. Validated locally by pointing context to kind cluster.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
