Fix slot-range parsing during migration and add rebalance planning logic #92
jdheyburn left a comment
Looks good, though I know nothing about rebalancing. Clearer variable names would make this easier for a non-Cluster person like me to follow, which is what most of my comments are about.
If there are opportunities to extract a block of code into a function behind a descriptive name, that would be great too!
bjosv left a comment
Looks good, some comments.
The "Tolerate ±1-slot rounding differences" check feels a bit tricky.
// BuildRebalanceMove computes a single, deterministic slot move to improve balance.
// It returns nil when the cluster is already balanced or not ready for rebalancing.
func BuildRebalanceMove(shards []*ShardState, expectedShards int, maxSlots int) (*SlotMove, error) {
We build a SlotMove in this function, so is the function name misleading?
Maybe it's a PlanRebalanceMove?
also, the shards []*ShardState, is that from the current ClusterState, or is it planned to be a subset?
> also, the shards []*ShardState, is that from the current ClusterState, or is it planned to be a subset?
It's actually a superset of current ClusterState, which will become apparent in a follow-up PR.
type SlotMove struct {
	Src *NodeState
	Dst *NodeState
	Slots []int
Not for this PR; but maybe we should have a defined type for a Slot in clusterstate.go (or alias).
That might avoid any confusion if an int means a slot, a number of slots, a length or size.
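The suggestion above could look roughly like the following. This is only a sketch: `Slot`, `SlotCount`, `TotalSlots`, and the `Valid`/`Count` helpers are hypothetical names, nothing here exists in clusterstate.go, and node pointers are replaced by plain string IDs to keep the example self-contained.

```go
package main

import "fmt"

// Slot is a hypothetical named type for a single hash-slot index (0..16383).
type Slot int

// SlotCount is a hypothetical named type for "how many slots", so a count
// cannot be passed where a slot index is expected without a conversion.
type SlotCount int

// TotalSlots is the fixed number of hash slots in a cluster.
const TotalSlots SlotCount = 16384

// Valid reports whether s is a legal hash-slot index.
func (s Slot) Valid() bool { return s >= 0 && int(s) < int(TotalSlots) }

// SlotMove mirrors the struct under review, simplified for illustration:
// string IDs stand in for *NodeState (an assumption of this sketch).
type SlotMove struct {
	SrcID string
	DstID string
	Slots []Slot
}

// Count returns how many slots the move carries, as a SlotCount.
func (m SlotMove) Count() SlotCount { return SlotCount(len(m.Slots)) }

func main() {
	move := SlotMove{SrcID: "a", DstID: "b", Slots: []Slot{5461, 5462}}
	fmt.Println(move.Count(), move.Slots[0].Valid(), Slot(16384).Valid())
}
```

With distinct types, passing a `SlotCount` where a `Slot` is expected becomes a compile error rather than a silent mix-up.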
bjosv left a comment
There are improvement possibilities, but I think we can address them in coming PRs.
Two independent but related changes for scale-out support:

1. Skip migrating/importing entries (e.g. "[5461->-abc123]") in parseSlotsRanges to prevent parse errors when CLUSTER NODES output contains in-progress slot migrations. Adds unit tests.
2. Introduce PlanRebalanceMove, which computes a single, deterministic slot move to incrementally rebalance a cluster after scale-out. It calculates per-primary slot targets (16384 / shards), identifies the most-loaded source and least-loaded destination, and returns a bounded SlotMove. It returns one move per call so each reconcile loop stays fast and restartable. Includes unit tests.

Signed-off-by: yang.qiu <yang.qiu@reddit.com>
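To make the planning step in this commit message concrete, here is a minimal sketch of the idea: derive per-primary targets from 16384 / shards, pick the most-loaded source and least-loaded destination, and cap the move size. It is not the PR's implementation. `primaryLoad`, `plannedMove`, and `planMove` are hypothetical names, the sketch works on slot counts only, it assumes all 16384 slots are assigned, and it leaves out the `expectedShards` readiness check.

```go
package main

import (
	"fmt"
	"sort"
)

const totalSlots = 16384

// primaryLoad is a hypothetical stand-in for a shard's primary:
// an ID plus the number of slots it currently owns.
type primaryLoad struct {
	ID    string
	Slots int
}

// plannedMove is a hypothetical result: move Count slots from Src to Dst.
type plannedMove struct {
	Src, Dst string
	Count    int
}

// planMove returns one bounded move, or nil when nothing should move.
func planMove(primaries []primaryLoad, maxSlots int) *plannedMove {
	if len(primaries) < 2 || maxSlots <= 0 {
		return nil
	}
	// Every primary should end up with floor or ceil slots; ceil differs
	// from floor only when 16384 is not evenly divisible.
	floor := totalSlots / len(primaries)
	ceil := floor
	if totalSlots%len(primaries) != 0 {
		ceil++
	}

	// Sort a copy by ID so ties are always broken the same way.
	sorted := append([]primaryLoad(nil), primaries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	src, dst := sorted[0], sorted[0]
	for _, p := range sorted[1:] {
		if p.Slots > src.Slots {
			src = p // most-loaded primary so far
		}
		if p.Slots < dst.Slots {
			dst = p // least-loaded primary so far
		}
	}
	if src.Slots <= ceil && dst.Slots >= floor {
		return nil // already balanced
	}

	// Bound the move by the cap, by what src can give without dropping
	// below floor, and by what dst can take without exceeding ceil.
	count := maxSlots
	if give := src.Slots - floor; give < count {
		count = give
	}
	if need := ceil - dst.Slots; need < count {
		count = need
	}
	if count <= 0 {
		return nil
	}
	return &plannedMove{Src: src.ID, Dst: dst.ID, Count: count}
}

func main() {
	primaries := []primaryLoad{{"a", 8192}, {"b", 8192}, {"c", 0}}
	if m := planMove(primaries, 100); m != nil {
		fmt.Printf("move %d slots %s -> %s\n", m.Count, m.Src, m.Dst)
	}
}
```

Returning a single capped move keeps each call cheap. A caller can apply the move, re-read cluster state, and call the planner again until it returns nil.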
… logic

- Rename rebalance.go → cluster_rebalance.go (and test file)
- Rename BuildRebalanceMove → PlanRebalanceMove
- Improve variable names for clarity (primarySlots fields, loop vars)
- Move ±1 tolerance check into slot allocation loop to avoid masking real imbalances
- Remove unused error return from takeSlotsFromRanges
- Add remainder distribution comment in assignSlotsToPendingPrimaries

Signed-off-by: yang.qiu <yang.qiu@reddit.com>
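As a rough illustration of slot extraction from ranges, the sketch below takes up to n slots from a list of inclusive ranges. The name `takeSlots`, the `slotRange` type, and the signature are assumptions made for this example and may not match `takeSlotsFromRanges` in the PR. Note that a helper of this shape has no failure path, which is one plausible reason an error return would go unused.

```go
package main

import "fmt"

// slotRange is a hypothetical inclusive range of hash slots.
type slotRange struct{ Start, End int }

// takeSlots returns up to n slot numbers, walking the ranges in the
// order given. It returns fewer than n when the ranges run out.
func takeSlots(ranges []slotRange, n int) []int {
	if n <= 0 {
		return nil
	}
	slots := make([]int, 0, n)
	for _, r := range ranges {
		for s := r.Start; s <= r.End; s++ {
			if len(slots) == n {
				return slots
			}
			slots = append(slots, s)
		}
	}
	return slots
}

func main() {
	ranges := []slotRange{{0, 2}, {10, 12}}
	fmt.Println(takeSlots(ranges, 4))
}
```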
…gic (valkey-io#92)

- Fix `parseSlotsRanges` to skip `[slot->-nodeid]` / `[slot-<-nodeid]` entries that Valkey appends to `CLUSTER NODES` output during active slot migrations, which previously caused parse errors.
- Add `BuildRebalanceMove`: a deterministic, incremental slot rebalance planner that computes a single slot migration from an overloaded primary to an underloaded (or empty) one. This is the planning layer for scale-out support; execution is wired in a follow-up PR.

Signed-off-by: yang.qiu <yang.qiu@reddit.com>
Co-authored-by: yang.qiu <yang.qiu@reddit.com>
Summary

- Fix `parseSlotsRanges` to skip `[slot->-nodeid]` / `[slot-<-nodeid]` entries that Valkey appends to `CLUSTER NODES` output during active slot migrations, which previously caused parse errors.
- Add `PlanRebalanceMove` (originally `BuildRebalanceMove`, renamed during review): a deterministic, incremental slot rebalance planner that computes a single slot migration from an overloaded primary to an underloaded (or empty) one. This is the planning layer for scale-out support; execution is wired in a follow-up PR.

Test plan

- `parseSlotsRanges`: normal ranges, migrating/importing entries, empty input, migration-only input
- `PlanRebalanceMove`: scale-out (2→3 shards), balanced cluster (no-op), mismatched shard count, extra empty shards, zero max-slots, slot extraction from ranges
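For readers unfamiliar with the slot field of `CLUSTER NODES`, the parsing fix described in the summary can be sketched as follows. This is not the PR's code: `parseSlotTokens` and `slotRange` are hypothetical names, and the sketch assumes the slot field has already been split into whitespace-separated tokens.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// slotRange is a hypothetical inclusive range of owned hash slots.
type slotRange struct{ Start, End int }

// parseSlotTokens is a hypothetical stand-in for parseSlotsRanges. It
// accepts tokens such as "0-5460" or "5461" and skips bracketed
// migrating/importing markers such as "[5461->-abc123]".
func parseSlotTokens(tokens []string) ([]slotRange, error) {
	var ranges []slotRange
	for _, tok := range tokens {
		if strings.HasPrefix(tok, "[") {
			continue // in-progress migration marker, not an owned range
		}
		startStr, endStr, isRange := strings.Cut(tok, "-")
		start, err := strconv.Atoi(startStr)
		if err != nil {
			return nil, fmt.Errorf("invalid slot token %q: %w", tok, err)
		}
		end := start // a bare number is a single-slot range
		if isRange {
			if end, err = strconv.Atoi(endStr); err != nil {
				return nil, fmt.Errorf("invalid slot token %q: %w", tok, err)
			}
		}
		ranges = append(ranges, slotRange{start, end})
	}
	return ranges, nil
}

func main() {
	tokens := []string{"0-5460", "[5461->-abc123]", "7000"}
	ranges, err := parseSlotTokens(tokens)
	fmt.Println(ranges, err)
}
```

Without the bracket check, a token like `[5461->-abc123]` reaches the integer conversion and fails, which matches the parse error the summary describes.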