6 changes: 3 additions & 3 deletions examples/auto-mode/README.md
@@ -71,7 +71,7 @@ Terraform applies two **`kubernetes_manifest`** resources after `module.eks` (se
| NodePool | Role |
| --- | --- |
| `gpu` | Accelerated / NVIDIA (**g5/g6/g6e**), on-demand, GPU taint for isolation. |
-| `batch-spot` | Spot-first batch-style general compute with on-demand fallback; caps and disruption in YAML. |
+| `spot` | Spot-first batch-style general compute with on-demand fallback; caps and disruption in YAML. |

Both set `nodeClassRef` to Auto Mode **`NodeClass`** **`default`**. Remove or edit the `.tf` / YAML files if you do not want these pools.

@@ -90,15 +90,15 @@ kubectl get nodes -w
# kubectl create deployment demo --image=nginx --replicas=5
```

-You should see nodes appear for the **built-in** pools when pending pods need capacity. **GPU** and **batch** shapes appear when workloads match the **Karpenter `NodePool`** requirements and tolerations.
+You should see nodes appear for the **built-in** pools when pending pods need capacity. **GPU** and **spot** NodePool shapes appear when workloads match the **Karpenter `NodePool`** requirements and tolerations.

## Optional: Headlamp OIDC and SAML

Headlamp in your GitOps repo consumes **Cognito** as OIDC issuer; this Terraform creates Cognito, wires **EKS `aws_eks_identity_provider_config`** (`username_claim = sub`, `groups_claim = cognito:groups`), and writes **`headlamp/oidc`** to Secrets Manager for the in-cluster sync.

**Typical SAML flow:**

-1. **First apply** without `headlamp_saml_metadata_url`. Use IdP ACS / audience from:
+1. **First apply** without `headlamp_saml_metadata_url` (and without legacy `headlamp_idc_saml_metadata_url`). Use IdP ACS / audience from:

```bash
terraform output -raw headlamp_cognito_saml_acs_url
@@ -1,5 +1,5 @@
-resource "kubernetes_manifest" "karpenter_batch_spot_nodepool" {
-  manifest = yamldecode(file("${path.module}/manifests/karpenter-nodepool-batch-spot.yaml"))
+resource "kubernetes_manifest" "karpenter_gpu_nodepool" {
+  manifest = yamldecode(file("${path.module}/manifests/karpenter-nodepool-gpu.yaml"))

  depends_on = [module.eks]
}
5 changes: 5 additions & 0 deletions examples/auto-mode/karpenter-nodepool-spot.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
+resource "kubernetes_manifest" "karpenter_spot_nodepool" {
+  manifest = yamldecode(file("${path.module}/manifests/karpenter-nodepool-spot.yaml"))
+
+  depends_on = [module.eks]
+}
5 changes: 0 additions & 5 deletions examples/auto-mode/karpenter-nodepool.tf

This file was deleted.

7 changes: 0 additions & 7 deletions examples/auto-mode/main.tf
@@ -25,13 +25,6 @@ provider "aws" {
  region = var.aws_region
}

-# Required while state still has resources created in us-east-1 (e.g. ACM for CloudFront). After `terraform apply`
-# removes them, this block can stay (harmless) or you may remove it if no resource uses `provider = aws.us_east_1`.
-provider "aws" {
-  alias = "us_east_1"
-  region = "us-east-1"
-}
-
data "aws_caller_identity" "current" {}

locals {
@@ -1,6 +1,6 @@
# Karpenter NodePool: interruptible batch capacity on general compute (C + M categories), spot-first
# with on-demand fallback (separate from GPU accelerated pool).
-# Applied by karpenter-nodepool-batch-spot.tf only — do not merge into karpenter-nodepool-accelerated.yaml.
+# Applied by karpenter-nodepool-spot.tf only — do not merge into karpenter-nodepool-accelerated.yaml.
#
# Spot-first with on-demand fallback in ONE pool: requirements allow both capacity types; Karpenter
# prefers spot and falls back to on-demand when spot cannot satisfy provisioning (see Karpenter
@@ -11,7 +11,7 @@
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
-name: batch-spot
+name: spot
spec:
# Prefer reclaiming underutilized capacity when budgets allow.
disruption:
@@ -46,7 +46,7 @@ spec:
- c
- m
taints:
-- key: devops.k8sforge/batch-spot
+- key: devops.k8sforge/spot
effect: NoSchedule
# 336h == 14 days; aligns with EKS Auto Mode default node lifetime (AWS “Create a Node Pool” → Disruption).
expireAfter: 336h
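
Review note: because this change renames both the NodePool (`batch-spot` to `spot`) and its taint key (`devops.k8sforge/batch-spot` to `devops.k8sforge/spot`), any workload that tolerated the old key would likely stop scheduling onto this pool until its manifest is updated. The following is a minimal sketch of a workload targeting the renamed pool, not a manifest from this repository. The Deployment name `spot-demo`, its labels, the replica count, and the `nginx` image are hypothetical; the sketch also assumes the pool's nodes carry the standard Karpenter `karpenter.sh/nodepool` label.

```yaml
# Hypothetical workload for illustration only; not part of this change.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-demo            # assumed name
spec:
  replicas: 2                # assumed count
  selector:
    matchLabels:
      app: spot-demo
  template:
    metadata:
      labels:
        app: spot-demo
    spec:
      # Pin pods to the renamed NodePool (assumes the standard Karpenter node label).
      nodeSelector:
        karpenter.sh/nodepool: spot
      # Tolerate the renamed taint; the old key was devops.k8sforge/batch-spot.
      tolerations:
        - key: devops.k8sforge/spot
          operator: Exists
          effect: NoSchedule
      containers:
        - name: web
          image: nginx
```

Using `operator: Exists` avoids depending on a taint value, which the visible hunk does not show. The `nodeSelector` is optional: the toleration alone permits scheduling onto the pool, while the selector restricts the pods to it.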