feat(k8s): add k8s-health-check reusable workflow #103
Open
chrisamti wants to merge 8 commits into
Validates key cluster components after a Terraform apply to gate prod deploys on dev health. Checks Karpenter, Datadog (operator, cluster-agent, node-agent) and Lacework rollout status, plus Datadog operator reconciliation error logs.
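The rollout and log checks described above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual steps: the function names, the `<namespace> <kind>/<name>` target format, and the assumption that operator error lines contain the literal `ERROR` are all hypothetical.

```shell
# Sketch only: function names and the target format are assumptions,
# not taken from the actual workflow.

# Wait for each "<namespace> <kind>/<name>" target to finish rolling out.
# Returns non-zero if any rollout fails or times out.
check_rollouts() {
  local timeout="$1"; shift
  local failed=0 target ns resource
  for target in "$@"; do
    ns="${target%% *}"        # text before the first space
    resource="${target#* }"   # text after the first space
    if ! kubectl rollout status "$resource" -n "$ns" --timeout="$timeout"; then
      # "::error::" is the GitHub Actions workflow command for error annotations.
      echo "::error::rollout not healthy: $resource in namespace $ns" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Print the number of log lines containing ERROR within the given window.
# Assumes the operator writes the literal string ERROR for error-level entries.
count_operator_errors() {
  local ns="$1" deployment="$2" since="$3" logs
  logs="$(kubectl logs "deployment/$deployment" -n "$ns" --since="$since")" || return 1
  printf '%s\n' "$logs" | grep -c 'ERROR' || true
}
```

A step might then call, for example, `check_rollouts 120s "karpenter deployment/karpenter"` and fail the job when `count_operator_errors datadog datadog-operator 3m` prints a non-zero count; the namespaces and deployment names here are placeholders, not the cluster's real ones.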
…v for test
- Fail with a clear error if no cluster is found in the region instead of passing 'None' to aws eks update-kubeconfig
- Switch test environment from sandbox (no cluster) to platform-dev
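The 'None' guard could look something like the sketch below. The function name `resolve_cluster` and the choice to take the first cluster in the region are assumptions for illustration; the commit does not show how the workflow actually selects the cluster.

```shell
# Sketch only: resolve_cluster and "first cluster in the region" are
# hypothetical; the real workflow may select the cluster differently.
resolve_cluster() {
  local region="$1" cluster
  # With --output text, the AWS CLI prints a null query result as the
  # literal string "None", which is what the guard below catches.
  cluster="$(aws eks list-clusters --region "$region" \
    --query 'clusters[0]' --output text)" || return 1
  if [ -z "$cluster" ] || [ "$cluster" = "None" ]; then
    echo "::error::no EKS cluster found in region $region" >&2
    return 1
  fi
  printf '%s\n' "$cluster"
}
```

A caller would then run `aws eks update-kubeconfig --region "$AWS_REGION" --name "$cluster"` only after `cluster="$(resolve_cluster "$AWS_REGION")"` succeeds, so the job stops with a readable message instead of a confusing kubeconfig failure.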
The workflow requires access to a specific EKS cluster in the platform-dev AWS account. This environment is not available in this repo and would require additional OIDC trust configuration. The workflow is effectively tested via the disco-infra-terraform pipeline, which already has the correct access.
Adds a direct check on the DatadogAgent custom resource status conditions, catching Error/Degraded states (e.g. immutable field errors) at the source rather than relying solely on operator log parsing.
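One way to picture this check is the sketch below. It assumes that unhealthy states show up as conditions whose type contains `Error` or `Degraded` with status `True`; the exact condition types depend on the Datadog operator version, and all function names here are hypothetical.

```shell
# Sketch only: the (Error|Degraded) type pattern and the function names are
# assumptions; real condition types depend on the operator version.

# Print one "<type>=<status>" line per status condition of a DatadogAgent.
list_agent_conditions() {
  local ns="$1" name="$2"
  kubectl get datadogagent "$name" -n "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Read "<type>=<status>" lines on stdin; print only those that signal a problem.
unhealthy_conditions() {
  grep -E '^[^=]*(Error|Degraded)[^=]*=True$' || true
}

# Fail with an error annotation if any unhealthy condition is active.
check_datadog_agent() {
  local ns="$1" name="$2" conditions bad
  conditions="$(list_agent_conditions "$ns" "$name")" || return 1
  bad="$(printf '%s\n' "$conditions" | unhealthy_conditions)"
  if [ -n "$bad" ]; then
    echo "::error::DatadogAgent $name reports: $(printf '%s' "$bad" | tr '\n' ' ')" >&2
    return 1
  fi
}
```

Reading the conditions from the resource itself means a failure such as an immutable field error is reported even if the matching operator log line falls outside the scanned time window.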
@codex[agent] Please review
Summary
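- Adds `k8s-health-check.yaml`, a reusable workflow that validates key cluster components are healthy after a Terraform apply
- Gates prod on dev health by running between `tf-apply` (dev) and `tf-apply` (prod)
- Checks the DatadogAgent resource for `Error`/`Degraded` conditions on the custom resource (catches immutable field errors, reconciliation failures, etc.)
- Scans Datadog operator logs for `ERROR` level entries from the last 3 minutes

Usage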
Testing
No integration test is included in this repo. This workflow is an integration test by nature: it runs `kubectl rollout status` and inspects live CRD status against real cluster resources. The sandbox account has no EKS cluster with the required components (Karpenter, Datadog), so it cannot be tested here.

The workflow is tested end-to-end via the `disco-infra-terraform` pipeline (see companion PR), which has access to the real `platform-dev` cluster.

Test plan
- `disco-infra-terraform` PR pipeline against `platform-dev`