Fix service discovery bug in kubernetes-extensions #19139

Draft
capistrant wants to merge 12 commits into apache:master from capistrant:k8s-discovery-bug

Conversation

@capistrant
Contributor

Description

Bug Report

The k8s service discovery does not remove discovered nodes whose pods still exist with service announcement labels even though the underlying services are unhealthy.

For example, if a broker container is killed but the pod that manages it remains in the namespace with announcement labels, all Druid services keep this broker in their discovered services cache. Queries are then routed to a broker that cannot possibly execute them. If the pod remains in this announced but unhealthy state for any meaningful period of time, cluster functionality can be severely compromised.

The desired behavior in the above example is for the broker to be removed from the discovered services caches, at least until the pod's underlying container is restarted and the pod is healthy again.

Fix Details

My proposed fix makes the discovery logic use a pod's readiness flag. If a pod is not ready, its underlying services will not be added to any service discovery cache they are absent from, and they will be removed from any cache they are already in. These services can be added back once a MODIFIED or ADDED event arrives while the pod is ready again.
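
As a rough illustration of the rule described above, here is a minimal, self-contained sketch. It is not the code in this PR: the class name `ReadinessAwareCacheSketch`, the `onPodEvent` method, and the string node ids are all hypothetical, and the `podReady` boolean stands in for whatever readiness signal the real watcher reads from the pod (in Kubernetes, typically the pod's `Ready` condition). Handling of DELETED events is included only as an assumption for completeness.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch only: none of these names come from the PR or from Druid.
class ReadinessAwareCacheSketch {

    // ADDED and MODIFIED are the events named in the description.
    // DELETED is an assumption added so the sketch is complete.
    enum EventType { ADDED, MODIFIED, DELETED }

    // Ids of the nodes currently considered discovered.
    private final Set<String> discovered = new ConcurrentSkipListSet<>();

    // Apply one watch event, using pod readiness to gate cache membership.
    void onPodEvent(EventType type, String nodeId, boolean podReady) {
        switch (type) {
            case ADDED:
            case MODIFIED:
                if (podReady) {
                    // Ready pod: add it, or keep it if already present.
                    discovered.add(nodeId);
                } else {
                    // Not ready: never add, and evict if previously cached.
                    discovered.remove(nodeId);
                }
                break;
            case DELETED:
                // Pod is gone, so drop it regardless of the readiness value.
                discovered.remove(nodeId);
                break;
        }
    }

    // Read-only, sorted snapshot of the currently discovered node ids.
    Set<String> discoveredNodes() {
        return Collections.unmodifiableSet(new TreeSet<>(discovered));
    }

    public static void main(String[] args) {
        ReadinessAwareCacheSketch cache = new ReadinessAwareCacheSketch();
        cache.onPodEvent(EventType.ADDED, "broker-0", true);     // healthy broker announced
        cache.onPodEvent(EventType.MODIFIED, "broker-0", false); // container killed, pod remains
        System.out.println("after unready: " + cache.discoveredNodes());
        cache.onPodEvent(EventType.MODIFIED, "broker-0", true);  // container restarted
        System.out.println("after ready again: " + cache.discoveredNodes());
    }
}
```

The point of the sketch is that a pod's mere presence with announcement labels is no longer sufficient for membership: the broker from the bug report would be evicted on the first MODIFIED event that reports it as not ready, and restored on a later event that reports it as ready.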

Fix Risks

The biggest risk I see is that this new reliance on the readiness probe introduces an expectation that the probe is accurate and stable. I try to call out in the documentation that this needs to be considered when defining a pod's readiness probe, as a way to mitigate unexpected changes for users. This could also be included in a release note to tip off users of the extension.

Release note

TBD


Key changed/added classes in this PR
  • DefaultK8sApiClient
  • BaseNodeRoleWatcher
  • WatchResult

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@capistrant
Contributor Author

marking this as draft while I evaluate a competing approach that uses pod phase instead of readiness

@capistrant capistrant marked this pull request as draft March 11, 2026 22:32