
Add MiddleManager drain rollout strategy#18

Open
aruraghuwanshi wants to merge 1 commit into apache:master from aruraghuwanshi:feature/middle-manager-drain-strategy

Conversation

@aruraghuwanshi

Summary

This PR adds an opt-in MiddleManager drain rollout strategy for Druid clusters.

When spec.middleManagerDrainStrategy is set, the operator drains each MiddleManager StatefulSet pod before allowing Kubernetes to replace it during a rolling update. This reduces the chance of interrupting active Kafka/Kinesis ingestion tasks during MiddleManager upgrades.

What Changed

  • Added spec.middleManagerDrainStrategy as a pointer-presence CRD option:
    • omitted: standard StatefulSet rolling behavior
    • present: operator-managed MiddleManager drain rollout
  • Added status.middleManagerDrain to expose rollout progress and support recovery after operator restarts.
  • Implemented a drain state machine that:
    • blocks automatic StatefulSet rollout with partition control
    • selects one outdated MiddleManager pod at a time
    • disables the worker via the Druid Overlord API
    • discovers running tasks through Druid SQL
    • triggers supervisor task-group handoff
    • waits for ingestion tasks to drain
    • releases one pod for replacement
    • waits for the replacement pod to be ready on the new revision
    • re-enables the worker
  • Added configurable timeouts:
    • drainTimeout, default 1h
    • podReadyTimeout, default 30m
  • Reused existing upstream Router discovery and spec.auth Druid API credential plumbing.
  • Added URL path escaping and worker hostname validation around Druid API/SQL calls.
  • Documented the feature in docs/features.md and docs/druid_cr.md.
  • Regenerated CRDs, deepcopy code, and API docs.
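
The drain state machine described above can be sketched as a per-pod loop driven by the StatefulSet's `rollingUpdate.partition`. This is a runnable illustration with stubbed cluster operations, not the PR's actual code; all type and method names are assumptions, and the Druid Overlord endpoints in the comments are Druid's documented worker disable/enable APIs.

```go
package main

import "fmt"

// Cluster abstracts the operations the drain rollout needs. In the real
// operator these map to Kubernetes and Druid Overlord calls; here they
// are stubbed so the control flow is runnable.
type Cluster interface {
	SetPartition(p int32)        // patch spec.updateStrategy.rollingUpdate.partition
	DisableWorker(ordinal int32) // POST /druid/indexer/v1/worker/{host}/disable
	WaitForDrain(ordinal int32)  // poll Druid SQL until the worker has no running tasks
	WaitForReady(ordinal int32)  // wait for the replacement pod on the new revision
	EnableWorker(ordinal int32)  // POST /druid/indexer/v1/worker/{host}/enable
}

// DrainRollout replaces MiddleManager pods one at a time, highest ordinal
// first. Holding the partition at `replicas` blocks all automatic
// replacement; lowering it to `ord` releases exactly pod `ord`, because
// every higher ordinal is already on the new revision.
func DrainRollout(c Cluster, replicas int32) {
	c.SetPartition(replicas)
	for ord := replicas - 1; ord >= 0; ord-- {
		c.DisableWorker(ord)
		c.WaitForDrain(ord)
		c.SetPartition(ord)
		c.WaitForReady(ord)
		c.EnableWorker(ord)
	}
}

// logCluster records each call so the sequencing is visible.
type logCluster struct{ calls []string }

func (l *logCluster) SetPartition(p int32)    { l.calls = append(l.calls, fmt.Sprintf("partition=%d", p)) }
func (l *logCluster) DisableWorker(ord int32) { l.calls = append(l.calls, fmt.Sprintf("disable mm-%d", ord)) }
func (l *logCluster) WaitForDrain(ord int32)  { l.calls = append(l.calls, fmt.Sprintf("drain mm-%d", ord)) }
func (l *logCluster) WaitForReady(ord int32)  { l.calls = append(l.calls, fmt.Sprintf("ready mm-%d", ord)) }
func (l *logCluster) EnableWorker(ord int32)  { l.calls = append(l.calls, fmt.Sprintf("enable mm-%d", ord)) }

func main() {
	c := &logCluster{}
	DrainRollout(c, 3)
	for _, step := range c.calls {
		fmt.Println(step)
	}
}
```

Note how the partition is only lowered after the worker has drained, so Kubernetes can never replace a pod that still has running ingestion tasks.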

Example

spec:
  rollingDeploy: true
  middleManagerDrainStrategy:
    drainTimeout: 1h
    podReadyTimeout: 30m

Introduce an opt-in CRD strategy that drains MiddleManager StatefulSet pods before rolling them, with status visibility, timeout handling, and upstream Druid API auth plumbing.
@AdheipSingh
Member

Instead of using RollingUpdate with partition manipulation, why not just switch the MiddleManager StatefulSet to the OnDelete update strategy? With OnDelete, Kubernetes never auto-replaces pods; the operator explicitly deletes each pod when it is ready.

1. Operator detects CurrentRevision != UpdateRevision
2. Pick highest-ordinal outdated pod (mm-2)
3. Disable worker + trigger handoff + wait for drain
4. Delete the pod: sdk.Delete(ctx, &pod)
5. StatefulSet controller auto-recreates mm-2 with new revision
6. Wait for new mm-2 to be Ready
7. Enable worker
8. Repeat for mm-1, mm-0
9. Done

What are your thoughts? I'm thinking from the perspective of reducing the number of hops in the state machine (and also reducing calls to the Kubernetes API).
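
The OnDelete loop above hinges on step 2, picking the highest-ordinal pod that is still on the old revision. A minimal sketch of that selection (the function name and the map shape are assumptions; the revision comparison follows the StatefulSet controller-revision-hash convention):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PickHighestOutdated returns the highest-ordinal pod whose
// controller-revision-hash differs from the StatefulSet's updateRevision,
// or "" when every pod is already on the new revision. `revisions` maps
// pod name (e.g. "mm-2") to its controller-revision-hash label value.
func PickHighestOutdated(revisions map[string]string, updateRevision string) string {
	best, bestOrd := "", -1
	for name, rev := range revisions {
		if rev == updateRevision {
			continue
		}
		// StatefulSet pods are named <name>-<ordinal>.
		idx := strings.LastIndex(name, "-")
		if idx < 0 {
			continue
		}
		ord, err := strconv.Atoi(name[idx+1:])
		if err != nil {
			continue
		}
		if ord > bestOrd {
			best, bestOrd = name, ord
		}
	}
	return best
}

func main() {
	pods := map[string]string{"mm-0": "old", "mm-1": "old", "mm-2": "new"}
	fmt.Println(PickHighestOutdated(pods, "new")) // mm-1
}
```

Deleting the returned pod and letting the StatefulSet controller recreate it replaces the partition bookkeeping entirely, which is the API-call reduction the comment is after.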

@AdheipSingh
Member

Also, thinking in simpler terms, shouldn't the MiddleManager process support graceful termination?
On the Kubernetes side we could just increase terminationGracePeriodSeconds?
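
A sketch of that simpler approach, assuming the MiddleManager actually handles SIGTERM by restoring or handing off tasks (the Druid properties in the comments are from Druid's documented task-restore settings, not from this PR; the grace period value is illustrative):

```yaml
# Pod template fragment (generic pod spec, not this PR's CRD fields):
spec:
  # k8s default is 30s; give long-running ingestion tasks time to stop cleanly
  terminationGracePeriodSeconds: 3600
# Only useful if the Druid side cooperates, e.g. in runtime.properties:
#   druid.indexer.task.restoreTasksOnRestart=true
#   druid.indexer.task.gracefulShutdownTimeout=PT5M
```

The trade-off versus the operator-driven drain is that a grace period only bounds how long Kubernetes waits; it does not prevent the pod from being terminated while tasks are still running.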
