fix: deterministic rollout order for multiple nodes per NodeType #15
aruraghuwanshi wants to merge 2 commits into
Sort node specs by map key within each NodeType in getNodeSpecsByOrder so that rollingDeploy does not flap due to Go's map iteration order. Add unit tests (which demonstrate the non-determinism before the fix) and an E2E check for two historical tiers (historicalstier1/2).
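Here is a minimal sketch of the intended behavior. It uses simplified stand-ins for the operator's types: NodeSpec, ServiceGroup and nodeSpecsByOrder are hypothetical names. The two-entry servicesOrder is illustrative and is not the operator's actual druidServicesOrder. The sketch first groups specs by NodeType. It then sorts each group by map key and appends the groups in service order. Repeated calls therefore return the same sequence even though Go's map iteration order varies.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeSpec and ServiceGroup are hypothetical, simplified stand-ins for the
// operator's node spec and service group types.
type NodeSpec struct {
	NodeType string
}

type ServiceGroup struct {
	key  string
	spec NodeSpec
}

// servicesOrder is an illustrative cross-NodeType order, not the operator's list.
var servicesOrder = []string{"historical", "broker"}

// nodeSpecsByOrder groups specs by NodeType, sorts each group by map key, and
// concatenates the groups in servicesOrder. NodeTypes missing from
// servicesOrder are skipped in this sketch.
func nodeSpecsByOrder(nodes map[string]NodeSpec) []ServiceGroup {
	byType := map[string][]ServiceGroup{}
	for key, spec := range nodes { // iteration order varies between runs
		byType[spec.NodeType] = append(byType[spec.NodeType], ServiceGroup{key: key, spec: spec})
	}
	var out []ServiceGroup
	for _, nodeType := range servicesOrder {
		group := byType[nodeType]
		// Sorting by key removes the dependence on map iteration order.
		sort.Slice(group, func(i, j int) bool { return group[i].key < group[j].key })
		out = append(out, group...)
	}
	return out
}

func main() {
	nodes := map[string]NodeSpec{
		"historicalstier2": {NodeType: "historical"},
		"brokers":          {NodeType: "broker"},
		"historicalstier1": {NodeType: "historical"},
	}
	for _, sg := range nodeSpecsByOrder(nodes) {
		fmt.Println(sg.key)
	}
	// historicalstier1
	// historicalstier2
	// brokers
}
```

Sorting happens inside each NodeType group only, so the cross-NodeType order still comes from the service order list.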
@aruraghuwanshi I believe the failing
I’d recommend:
I tested this locally and the revised flow behaves as expected: the rollout is triggered, tier1 updates first, tier2 waits, and the test completes successfully without hitting the
@AdheipSingh can you take a look at this PR?
workloadAnnotations only touch StatefulSet object metadata, not the pod template, so updateRevision never changes and the test times out at 900s. Switch to podAnnotations (which flow into PodTemplateSpec), add trap-based cleanup, and fail fast if tier1 never picks up a new revision.
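The model below shows why the annotation location matters. It is deliberately simplified and is not Kubernetes' real revision mechanism. The StatefulSet and PodTemplate structs, the revision function and the "rollout-trigger" annotation are all hypothetical. The revision function stands in for the revision the controller derives from the pod template. Changing only object-level annotations leaves the revision unchanged, so no rollout starts. Changing pod-template annotations produces a new revision.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PodTemplate and StatefulSet are hypothetical, stripped-down models that keep
// only the fields needed for this illustration.
type PodTemplate struct {
	Image       string
	Annotations map[string]string
}

type StatefulSet struct {
	Annotations map[string]string // object metadata (where workloadAnnotations land)
	Template    PodTemplate       // pod template (where podAnnotations land)
}

// revision builds a canonical string from the pod template only; it stands in
// for the template-derived revision. Object metadata is ignored on purpose.
func revision(s StatefulSet) string {
	keys := make([]string, 0, len(s.Template.Annotations))
	for k := range s.Template.Annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable order, independent of map iteration
	var b strings.Builder
	b.WriteString("image=" + s.Template.Image)
	for _, k := range keys {
		b.WriteString(";" + k + "=" + s.Template.Annotations[k])
	}
	return b.String()
}

func main() {
	base := StatefulSet{Template: PodTemplate{Image: "druid-image"}}
	workloadOnly := StatefulSet{
		Annotations: map[string]string{"rollout-trigger": "1"},
		Template:    PodTemplate{Image: "druid-image"},
	}
	podOnly := StatefulSet{
		Template: PodTemplate{Image: "druid-image", Annotations: map[string]string{"rollout-trigger": "1"}},
	}
	fmt.Println(revision(base) == revision(workloadOnly)) // true: same revision, no rollout
	fmt.Println(revision(base) == revision(podOnly))      // false: new revision, rollout starts
}
```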
Thanks @razinbouzar for the insights. That does seem to be the core issue. I've pushed another commit fixing it. Let's see.
@abhishekrb19 or @AdheipSingh, can you kick off the test workflow and review?
Note: The following review is written purely by me. It was not edited or reviewed by any AI, nor did I take any inspiration from one.

Problem Statement
The issue raised is valid: the operator reconciles multiple Druid nodes purely by nodeType. The underlying structure is a map of node specs (map[string]nodeSpec).

The implementation in this PR is basically a workaround. It sorts nodes based only on the string keys of the map[string]nodeSpec. Go's sort.Slice with a string comparison (specs[i].key < specs[j].key) uses lexicographic (dictionary) ordering. This doesn't solve the core problem, which is that the operator isn't aware of the upgrade order or of tiers. Solving this in code isn't the best approach until we build that awareness into an abstraction. Also, tbh, Druid CRs are large, and maintaining the upgrade order based on the order in which the spec is written isn't very user-friendly. I should be able to define my order and write my spec any way I want.

I would propose an implementation that fixes the problem at its core and sticks to Druid's native design for representing nodeTypes and tiers.

Approach
If I break down the main abstractions, we are dealing with four constructs. nodeType exists today; I plan to introduce order of upgrade, tiers, and order of upgrade of tiers.
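A quick example makes the lexicographic-ordering point concrete. The sortedKeys helper below is hypothetical. It applies the same kind of plain string comparison (a < b) to a copy of the keys. Once a tier suffix reaches two digits, historicalstier10 sorts before historicalstier2, which may not be the upgrade order a user intends.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a sorted copy of keys using plain string comparison,
// the same lexicographic ordering a key-based sort.Slice produces.
func sortedKeys(keys []string) []string {
	out := append([]string(nil), keys...)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	keys := []string{"historicalstier2", "historicalstier10", "historicalstier1"}
	fmt.Println(sortedKeys(keys)) // [historicalstier1 historicalstier10 historicalstier2]
}
```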
As a user, I should be able to define the following for my Druid nodes:
Tiers
In Druid, we can categorize Historicals as well as Brokers into tiers. A tier defines a separate group of Historicals and Brokers that receive different query assignments, load rules, etc. How does this map to the operator spec? We introduce tier as a key within the nodeSpec, at the same scope as nodeType.

OrderOfUpgrade
Remove the hardcoded order from the code. That order follows Druid's recommendation, but I have often needed a custom upgrade order. The main for loop should construct the order from this field. It is a []string and should be built on each reconcile.

OrderOfUpgradeOfTiers
Within a nodeType, we need the ability to define the upgrade order of its tiers. This needs to be defined as a separate structure, OrderOfUpgradeOfTiers:

Here's a combined spec.

Upgrade execution:
Implementation
The signature remains as it is. The existing service group should be extended to use:

An ideal service group after construction should look like this:

This way, we solve the problem at its core. I have already done half of the implementation and would like to raise a PR and get feedback. This also needs to be backward compatible: if a user doesn't specify any of the above, we fall back to the default/current behavior.
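The following is a sketch of one way the proposed constructs could fit together. All of these names are hypothetical, because the proposed spec itself is not shown here: the NodeSpec.Tier field, the UpgradeOrder struct with OrderOfUpgrade and OrderOfUpgradeOfTiers, buildRollout, and defaultOrder. The sketch orders nodeTypes by OrderOfUpgrade, or by a default list when that field is empty. Within a nodeType, tiers follow OrderOfUpgradeOfTiers, and unlisted tiers come afterwards, sorted by name. Keys within a tier are sorted so that the result stays deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeSpec is a hypothetical stand-in with the proposed tier field.
type NodeSpec struct {
	NodeType string
	Tier     string // empty when the user defines no tiers
}

// UpgradeOrder models the proposed fields; names are illustrative only.
type UpgradeOrder struct {
	OrderOfUpgrade        []string            // nodeTypes, in upgrade order
	OrderOfUpgradeOfTiers map[string][]string // nodeType -> tiers, in upgrade order
}

// defaultOrder is an illustrative fallback, not the operator's real list.
var defaultOrder = []string{"historical", "broker"}

// buildRollout returns node spec keys in upgrade order.
func buildRollout(nodes map[string]NodeSpec, u UpgradeOrder) []string {
	typeOrder := u.OrderOfUpgrade
	if len(typeOrder) == 0 {
		typeOrder = defaultOrder // backward-compatible fallback
	}
	var out []string
	// nodeTypes missing from typeOrder are skipped in this sketch.
	for _, t := range typeOrder {
		// Group this nodeType's keys by tier.
		byTier := map[string][]string{}
		for key, spec := range nodes {
			if spec.NodeType == t {
				byTier[spec.Tier] = append(byTier[spec.Tier], key)
			}
		}
		// Listed tiers first, in the user's order; unlisted tiers after, sorted.
		var tiers []string
		seen := map[string]bool{}
		for _, tier := range u.OrderOfUpgradeOfTiers[t] {
			if _, ok := byTier[tier]; ok && !seen[tier] {
				tiers = append(tiers, tier)
				seen[tier] = true
			}
		}
		var rest []string
		for tier := range byTier {
			if !seen[tier] {
				rest = append(rest, tier)
			}
		}
		sort.Strings(rest)
		tiers = append(tiers, rest...)
		// Keys within a tier are sorted so the result is deterministic.
		for _, tier := range tiers {
			keys := byTier[tier]
			sort.Strings(keys)
			out = append(out, keys...)
		}
	}
	return out
}

func main() {
	nodes := map[string]NodeSpec{
		"historicals-hot":  {NodeType: "historical", Tier: "hot"},
		"historicals-cold": {NodeType: "historical", Tier: "cold"},
		"brokers":          {NodeType: "broker"},
	}
	custom := UpgradeOrder{
		OrderOfUpgrade:        []string{"historical", "broker"},
		OrderOfUpgradeOfTiers: map[string][]string{"historical": {"hot", "cold"}},
	}
	fmt.Println(buildRollout(nodes, custom))         // [historicals-hot historicals-cold brokers]
	fmt.Println(buildRollout(nodes, UpgradeOrder{})) // [historicals-cold historicals-hot brokers]
}
```

If a user defines no tiers and no explicit order, every spec of a nodeType lands in the same empty tier and is sorted by key. That matches the behavior of this PR and is one way to meet the backward-compatibility requirement.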
Thanks for putting this up, @AdheipSingh; it makes sense to have the structure you're proposing. Looking forward to your PR.
Closing this; PR raised.
Summary
getNodeSpecsByOrder groups node specs by NodeType but previously appended them while iterating with for key, nodeSpec := range m.Spec.Nodes. In Go, map iteration order is not stable, so the relative order of multiple specs with the same NodeType could change between reconciles.

With rollingDeploy: true, the handler walks that ordered list and may return early while a workload is still rolling. If the intra-NodeType order flips between calls, the operator can effectively start or advance rollouts for more than one StatefulSet/Deployment of the same NodeType at a time, instead of finishing one before the next.

This PR sorts specs by their map key (ascending) within each NodeType before building the final list. The existing cross-NodeType order from druidServicesOrder is unchanged.

What changed
- getNodeSpecsByOrder (controllers/druid/ordering.go): sort.Slice on each per-NodeType slice by ServiceGroup.key.
- Unit tests (controllers/druid/ordering_test.go): Ginkgo test data uses multiple historical tiers (historicalstier1–3). Added testing.T tests that fail on the pre-fix map-only ordering and pass with stable sorting, plus a guard for the cross-NodeType order.
- E2E (e2e/configs/druid-rolling-deploy-cr.yaml and e2e/test-rolling-deploy-ordering.sh, wired from e2e/e2e.sh): two historical tiers (historicalstier1/historicalstier2) with rollingDeploy: true and a patch to trigger a rollout. The test checks that only one of the two historical StatefulSets is mid-update at a time, and that the lexicographically first tier finishes before the second starts (when transitions are observable at the poll interval).

Testing
rollingDeploy with multiple nodes per NodeType).
Comments in getNodeSpecsByOrder document why sorting is required (kept short).

Release note (suggested)
Druid Operator: When rollingDeploy is enabled, the rollout order for multiple StatefulSets/Deployments that share the same NodeType (e.g. historicalstier1 and historicalstier2) is now stable (sorted by node spec key). This avoids concurrent rollouts within the same NodeType caused by non-deterministic map iteration.

This is especially helpful when segment replicas are spread across these two tiers (one replica in each tier). Today, both historical tiers can be rolled out at the same time, which leaves the Druid cluster partially unavailable.
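The toy model below shows why a flapping order matters. It assumes a simplified reading of the rollingDeploy walk described in the summary: on each reconcile, either wait on a rollout already in progress or start the first pending one, then return. The workload type, reconcileOnce and maxConcurrent are hypothetical. The model never completes a rollout; it only counts how many StatefulSets are rolling after each reconcile.

```go
package main

import "fmt"

// workload is a hypothetical stand-in for one StatefulSet.
type workload struct {
	pending bool // spec changed, rollout not started yet
	rolling bool // rollout started, pods not all updated yet
}

// reconcileOnce is a simplified model of the rollingDeploy walk: it visits
// workloads in the given order and returns as soon as it either finds one
// still rolling or starts a new rollout.
func reconcileOnce(order []string, ws map[string]*workload) {
	for _, name := range order {
		w := ws[name]
		if w.rolling {
			return // wait for this rollout to finish
		}
		if w.pending {
			w.pending, w.rolling = false, true
			return // started one rollout; stop here
		}
	}
}

// maxConcurrent runs one reconcile per entry in orders and reports the
// largest number of workloads rolling at the same time.
func maxConcurrent(orders [][]string) int {
	ws := map[string]*workload{
		"historicalstier1": {pending: true},
		"historicalstier2": {pending: true},
	}
	peak := 0
	for _, order := range orders {
		reconcileOnce(order, ws)
		n := 0
		for _, w := range ws {
			if w.rolling {
				n++
			}
		}
		if n > peak {
			peak = n
		}
	}
	return peak
}

func main() {
	a := []string{"historicalstier1", "historicalstier2"}
	b := []string{"historicalstier2", "historicalstier1"}
	fmt.Println(maxConcurrent([][]string{a, b})) // flapping order: 2
	fmt.Println(maxConcurrent([][]string{a, a})) // stable order: 1
}
```

When the order flips between two reconciles, both tiers end up rolling at once. With a stable order, the second reconcile waits on historicalstier1.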
Key changed/added files
- controllers/druid/ordering.go: sort within each NodeType by spec key
- controllers/druid/ordering_test.go: Ginkgo + testing.T coverage
- controllers/druid/testdata/ordering.yaml: fixture with multiple historical tiers
- e2e/configs/druid-rolling-deploy-cr.yaml: rolling-deploy E2E CR
- e2e/test-rolling-deploy-ordering.sh: E2E script
- e2e/e2e.sh: invoke the new E2E test

Fixes #XXXX.