Feat/tier aware upgrade ordering#17
Conversation
Note: The following is review is purely written by me and not edited or reviewed or taken any inspiration by any AI.Problem StatementThe issue raised is valid, as the operator reconciles multiple druid nodes based on purely nodeType. The underlying structure is I would propose for an implementation which fixes the problem at its core, and sticks to druid native design on how it represents nodeTypes and tiers. ApproachIf i break down the main abstractions, we are dealing with 4 constructs. ( nodeType is present as of now, i plan to introduce order of upgrade, tiers and order of upgrade of tiers ).
As a user i should be able to define the following for my druid nodes.
TiersIn druid we can categorize historicals as well as brokers into tiers. A tier which defines separate groups of Historicals and Brokers to receive different query assignments / loading rules etc. How does this map in the operator spec ? We introduce tier as a key within the nodeSpec, scoped same with nodeType. OrderOfUpgradeRemove the hardcoded order in the code, though the order is based on druid's recommendation, a lot of times i had a custom order of upgrade. The main for loop should construct an order based on this. This is a []string structure and on each reconcile should be built. OrderOfUpgradeOfTiersWIthin a nodeType we need to have the ability to define order of upgrade of nodeTypes. So that needs to be defined as a separate structure for each tier OrderOfUpgradeOfTiers: Here's a combined spec. Upgrade execution:
ImplementationSignature remains as it is The existing service group should be extended to use : An ideal service group after construction should look like this: This way we solve the problem at the core. I have half of the implementation already done, and would like to raise a PR and get feedback. Also this is needs to be backward compatible, so we make sure regardless of user specifying the above we fallback to the default/current way. |
| return result | ||
| } | ||
|
|
||
| func sortServiceGroups(groups []*ServiceGroup, tierOrder []string) { |
There was a problem hiding this comment.
sortServiceGroups sorts nodes of the same node type to determine their rollout order. tierRank map will convert the tier order slice ["hot", "cold", "glacier"] into a lookup: {"hot": 0, "cold": 1, "glacier": 2}.
| func getNodeSpecsByOrder(m *v1alpha1.Druid) []*ServiceGroup { | ||
| nodeTypeOrder := defaultDruidServicesOrder | ||
| if len(m.Spec.OrderOfUpgrade) > 0 { | ||
| nodeTypeOrder = m.Spec.OrderOfUpgrade |
There was a problem hiding this comment.
Could we make this path a little more defensive?
If orderOfUpgrade is set, it looks like the reconciler only returns node groups whose nodeType appears in that list. For a partial list, typo, or unknown node type, configured nodes could be skipped entirely.
Since deployDruidCluster uses this returned list to populate the resource name maps used by cleanup, skipped node groups may later look unused and be deleted.
Maybe we could either validate orderOfUpgrade as a complete/valid list for the configured node types, or treat it as a priority list and append any remaining configured node types in the default deterministic order.
| return err == nil | ||
| }, timeout, interval).Should(BeTrue()) | ||
| }) | ||
| It("Should return nodes ordered by custom nodeType order and tier order", func() { |
There was a problem hiding this comment.
Could we add a small regression test around incomplete or invalid orderOfUpgrade inputs?
The happy path for tier ordering is covered, but the more risky behavior seems to be partial, unknown, or duplicate node type entries. A test showing that remaining configured nodes are still reconciled, or that the spec is rejected during validation, would make this safer to evolve.
| // OrderOfUpgrade defines the order in which node types are upgraded during a rolling deploy. | ||
| // If not specified, the default order is used: historical, overlord, middleManager, indexer, broker, coordinator, router. | ||
| // +optional | ||
| OrderOfUpgrade []string `json:"orderOfUpgrade,omitempty"` |
There was a problem hiding this comment.
Looks like the generated CRD manifests may still need to be included for these new API fields.
I see zz_generated.deepcopy.go was updated in the PR, but running make manifests generate locally still produced additional generated drift, especially in config/crd/bases/druid.apache.org_druids.yaml and chart/crds/druid.apache.org_druids.yaml.
Without those CRD updates, users installing from the manifests or Helm chart may not be able to persist orderOfUpgrade, orderOfUpgradeOfTiers, or tier as expected.
|
Thanks @aruraghuwanshi for the review. Will get back this week. |
Fixes #XXXX.
Description
This PR has:
Key changed/added files in this PR
MyFooOurBarTheirBaz