fix: allow MEET against isolated shard primaries during scale-up #147
Conversation
```go
// isolated (fresh bootstrap, first reconcile).
func findMeetTarget(state *valkey.ClusterState, isolated []*valkey.NodeState) *valkey.NodeState {
	for _, shard := range state.Shards {
		if p := shard.GetPrimaryNode(); p != nil && !p.IsIsolated() {
```
Admittedly it feels a bit weird removing this check, but a shard primary (in state.Shards) always has slots assigned, meaning it went through the full slot assignment flow and is a legitimate cluster member.
The only scenario where such a primary has `cluster_known_nodes <= 1` is a single-node cluster being scaled up. Network partitions don't cause this, because `cluster_known_nodes` counts remembered peers even if they're in a failed state, and fresh nodes can't get slots while isolated thanks to the guard in `assignSlotsToPendingPrimaries()`.
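To make the reasoning concrete, here is a minimal sketch of what the function looks like with the guard removed. Only the loop shape and the `isolated[0]` fallback come from the diff and the discussion above; everything else is assumed.

```go
// Sketch of findMeetTarget after the fix (assumed shape; only the
// removal of the IsIsolated() check is confirmed by the diff above).
func findMeetTarget(state *valkey.ClusterState, isolated []*valkey.NodeState) *valkey.NodeState {
	for _, shard := range state.Shards {
		// A primary in state.Shards owns slots, so it went through the
		// full slot assignment flow and is a valid MEET target even when
		// its cluster_known_nodes is still 1 (single-node scale-up).
		if p := shard.GetPrimaryNode(); p != nil {
			return p
		}
	}
	// Fresh bootstrap: no shard primaries exist yet, so fall back to
	// the first isolated node and let the others MEET it.
	if len(isolated) > 0 {
		return isolated[0]
	}
	return nil
}
```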
Thanks for raising this! I did a test locally, and it looks like there is a regression from the previous PR where 1 shard with 0 replicas doesn't produce a healthy cluster. When I added a replica or shard to it, that seemed to fix it. Do you have the same on your side?
`findMeetTarget()` skipped shard primaries with `cluster_known_nodes <= 1` (`IsIsolated`). When scaling a single-node cluster (1 shard, 0 replicas) to add a replica, the existing primary has `cluster_known_nodes:1` and was skipped. `findMeetTarget` fell through to `isolated[0]` (the new replica), which the caller then trimmed from the iteration list, causing MEET to iterate an empty slice. The replica was never introduced to the primary, and `CLUSTER REPLICATE` failed with "Unknown node" on every reconcile.

A shard primary that owns slots is always a valid MEET target regardless of `cluster_known_nodes`. Remove the `IsIsolated` guard for shard primaries.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
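A hypothetical reconstruction of the failure mode described above (the variable and helper names are illustrative, not taken from the repo):

```go
// With 1 shard / 0 replicas, the only shard primary is skipped as
// "isolated", so the target becomes the new replica itself (isolated[0]).
target := findMeetTarget(state, isolated)

// The caller trims the target out of the nodes it is about to MEET...
var toMeet []*valkey.NodeState
for _, n := range isolated {
	if n != target {
		toMeet = append(toMeet, n)
	}
}

// ...so toMeet is empty, no CLUSTER MEET is ever issued, and
// CLUSTER REPLICATE keeps failing with "Unknown node" every reconcile.
for _, n := range toMeet {
	meetNode(n, target) // hypothetical helper issuing CLUSTER MEET
}
```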
Force-pushed from 9f33994 to cd304d7
The 1s/0r creation issue you're seeing is #135, which has been merged. I've rebased this PR onto the latest main, which includes it. The initial creation should work now, and this PR adds the scale-up fix on top. I tested it locally on this branch; if you are still able to reproduce, please share the exact config 🙏
Ah yeah sorry, I assumed that #135 was included in this branch. I did a round of testing and it works now - thanks for looking into this!
## Description

Adds e2e coverage for scaling a 1-shard-0-replica cluster. This was missing and led to the bugs found in #135 / #147. Two tests (a sketch of the first follows below):

- Scale from 1 shard / 0 replicas to 1 shard / 1 replica (add replica)
- Scale from 1 shard / 0 replicas to 2 shards / 0 replicas (add shard)

Both verify the cluster reaches Ready with correct `cluster_known_nodes` and `cluster_size`.

## Testing

Tests compile. Pattern follows the existing `rebalances slots on scale out` e2e test. Validated locally by pointing context to kind cluster.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
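A rough sketch of what the add-replica case could look like, assuming a Ginkgo/Gomega e2e suite with a controller-runtime client; the `createCluster`, `waitForReady`, and `clusterInfo` helpers are hypothetical names, and the real tests follow the existing `rebalances slots on scale out` pattern:

```go
It("scales from 1 shard / 0 replicas to 1 shard / 1 replica", func() {
	// Start from the previously untested single-node topology.
	cluster := createCluster(ctx, 1 /* shards */, 0 /* replicas */)
	waitForReady(ctx, cluster)

	// Add a replica and push the spec update.
	cluster.Spec.Replicas = 1
	Expect(k8sClient.Update(ctx, cluster)).To(Succeed())

	// The cluster must converge back to Ready with both nodes knowing
	// each other and a single shard owning all slots.
	waitForReady(ctx, cluster)
	info := clusterInfo(ctx, cluster) // parsed CLUSTER INFO from any node
	Expect(info["cluster_known_nodes"]).To(Equal("2"))
	Expect(info["cluster_size"]).To(Equal("1"))
})
```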
fix: allow MEET against isolated shard primaries during scale-up (valkey-io#147)

**Description:**

Scaling a single-node cluster (1 shard, 0 replicas) to add a replica gets stuck in an infinite reconciliation loop. `findMeetTarget()` skips shard primaries with `cluster_known_nodes <= 1`, so the new replica is never `CLUSTER MEET`'d to the primary. `CLUSTER REPLICATE` fails with "Unknown node" on every reconciliation. The same bug also prevents adding a new shard (1 shard -> 2 shards), since the new shard's primary can't MEET the existing isolated primary either.

A shard primary that owns slots is always a valid `MEET` target regardless of `cluster_known_nodes`. This removes the `IsIsolated()` guard for shard primaries. Discovered while investigating feedback on valkey-io#135.

**Testing:**

Reproduced both cases on a local kind cluster: create a 1-shard-0-replica cluster, wait for Ready, then scale up.

**Case 1: add replica** (`replicas: 0` -> `replicas: 1`)

Before fix - cluster stuck at Reconciling, replica never `MEET`'d:

```
$ kubectl get valkeycluster
NAME         STATE         REASON
test-scale   Reconciling   Reconciling

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME             READY   ROLE
test-scale-0-0   true    primary
test-scale-0-1   true    primary

$ kubectl exec statefulset/valkey-test-scale-0-1 -c server -- valkey-cli CLUSTER INFO | grep cluster_known_nodes
cluster_known_nodes:1
```

Operator logs repeating every 2s:

```
DEBUG replica does not yet know primary (gossip pending); will retry replica=10.244.0.9 primaryId=3441aa9a...
DEBUG skipping replica; primary not ready yet
DEBUG missing replicas, requeue..
```

After fix - MEET succeeds, cluster reaches Ready:

```
$ kubectl get valkeycluster
NAME         STATE   REASON
test-scale   Ready   ClusterHealthy

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME             READY   ROLE
test-scale-0-0   true    primary
test-scale-0-1   true    replica

$ kubectl exec statefulset/valkey-test-scale-0-0 -c server -- valkey-cli CLUSTER NODES
3441aa9a... 10.244.0.8:6379@16379 myself,master - 0 0 0 connected 0-16383
bd334739... 10.244.0.9:6379@16379 slave 3441aa9a... 0 1776858716401 0 connected
```

Operator logs:

```
DEBUG meet node node=10.244.0.9 target=10.244.0.8
DEBUG events Introduced 1 isolated node(s) to the cluster
```

**Case 2: add shard** (`shards: 1` -> `shards: 2`)

After fix - new shard joins and slots are rebalanced:

```
$ kubectl get valkeycluster
NAME          STATE   REASON
test-scale2   Ready   ClusterHealthy

$ kubectl get valkeynodes -o custom-columns=NAME:.metadata.name,READY:.status.ready,ROLE:.status.role
NAME              READY   ROLE
test-scale2-0-0   true    primary
test-scale2-1-0   true    primary

$ kubectl exec statefulset/valkey-test-scale2-0-0 -c server -- valkey-cli CLUSTER NODES
a6e1a811... 10.244.0.13:6379@16379 master - 0 1776860233046 1 connected 0-8191
e9dae06c... 10.244.0.12:6379@16379 myself,master - 0 0 0 connected 8192-16383
```

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
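For readers less familiar with the cluster bus, the dependency chain the fix restores can be shown as a small standalone snippet. This uses go-redis/v9 purely as an illustration (it is not the operator's actual client code), with the addresses and node ID taken from the repro above:

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()

	// Connect to the new replica.
	rdb := redis.NewClient(&redis.Options{Addr: "10.244.0.9:6379"})

	// Step 1: introduce the replica to the primary. Without this, the
	// replica's cluster_known_nodes stays at 1 and step 2 can never succeed.
	if err := rdb.ClusterMeet(ctx, "10.244.0.8", "6379").Err(); err != nil {
		log.Fatalf("CLUSTER MEET failed: %v", err)
	}

	// Step 2: CLUSTER REPLICATE <id> only succeeds once the replica has
	// learned the primary's node ID via the MEET handshake/gossip;
	// otherwise the server answers "ERR Unknown node", which is exactly
	// the loop the operator was stuck in. In practice the operator
	// retries until the handshake completes (the "gossip pending" log).
	primaryID := "3441aa9a..." // placeholder; use the full ID from CLUSTER NODES
	if err := rdb.ClusterReplicate(ctx, primaryID).Err(); err != nil {
		log.Fatalf("CLUSTER REPLICATE failed: %v", err)
	}
}
```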