@@ -0,0 +1,198 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: azurenodeclasses.karpenter.azure.com
spec:
  group: karpenter.azure.com
  names:
    categories:
    - karpenter
    kind: AzureNodeClass
    listKind: AzureNodeClassList
    plural: azurenodeclasses
    shortNames:
    - aznc
    - azncs
    singular: azurenodeclass
  scope: Cluster
  versions:
  - additionalPrinterColumns:
    - jsonPath: .status.conditions[?(@.type=='Ready')].status
      name: Ready
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    - jsonPath: .spec.imageID
      name: ImageID
      priority: 1
      type: string
    name: v1alpha1
    schema:
      openAPIV3Schema:
        description: |-
          AzureNodeClass is the Schema for the AzureNodeClass API.
          AzureNodeClass is a more generic node class for provisioning Azure VMs
          that are not necessarily managed by AKS. It supports custom images,
          custom bootstrap data (userData), and per-NodeClass identity configuration.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: |-
              spec is the top level specification for the Azure Karpenter Provider.
              This will contain configuration necessary to launch instances in Azure.
            properties:
              imageID:
                description: |-
                  imageID is the ARM resource ID of the image that instances use.
                  This can be a Compute Gallery image, Shared Image Gallery image, Community Gallery image,
                  or any valid Azure image resource ID.
                  When set, imageFamily-based image resolution is bypassed entirely.
                  The user is responsible for ensuring the image is compatible with the selected instance types.
                  Examples:
                  /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/galleries/{gallery}/images/{image}/versions/{version}
                  /CommunityGalleries/{gallery}/images/{image}/versions/{version}
                maxLength: 1024
                pattern: (?i)^(\/subscriptions\/[^\/]+\/resourceGroups\/[^\/]+\/providers\/Microsoft\.Compute\/.*|\/CommunityGalleries\/[^\/]+\/images\/[^\/]+\/versions\/[^\/]+)$
                type: string
              managedIdentities:
                description: |-
                  managedIdentities is a list of user-assigned managed identity resource IDs
                  to attach to provisioned VMs. These are merged with any global identities
                  configured via the --node-identities flag.
                items:
                  type: string
                maxItems: 10
                type: array
              osDiskSizeGB:
                description: osDiskSizeGB is the size of the OS disk in GB.
                format: int32
                maximum: 4096
                minimum: 30
                type: integer
              security:
                description: security is a collection of security related karpenter
                  fields.
                properties:
                  encryptionAtHost:
                    description: |-
                      encryptionAtHost specifies whether host-level encryption is enabled for provisioned nodes.
                      For more information, see:
                      https://learn.microsoft.com/en-us/azure/virtual-machines/disk-encryption#encryption-at-host---end-to-end-encryption-for-your-vm-data
                    type: boolean
                type: object
              tags:
                additionalProperties:
                  type: string
                description: tags to be applied on Azure resources like instances.
                type: object
                x-kubernetes-validations:
                - message: tags keys must be less than 512 characters
                  rule: self.all(k, size(k) <= 512)
                - message: tags keys must not contain '<', '>', '%', '&', or '?'
                  rule: self.all(k, !k.matches('[<>%&?]'))
                - message: tags keys must not contain '\'
                  rule: self.all(k, !k.contains('\\'))
                - message: tags values must be less than 256 characters
                  rule: self.all(k, size(self[k]) <= 256)
              userData:
                description: |-
                  userData is the base64-encoded custom data that will be passed to the VM at creation time.
                  The caller must pre-encode their cloud-init or bootstrap script to base64, as the Azure API
                  expects this field to contain a base64-encoded string.
                  The user is fully responsible for providing valid bootstrap/cloud-init data.
                  When this field is set, no Karpenter-managed bootstrapping is performed.
                maxLength: 87380
                type: string
              vnetSubnetID:
                description: |-
                  vnetSubnetID is the subnet used by nics provisioned with this nodeclass.
                  If not specified, we will use the default --vnet-subnet-id specified in karpenter's options config.
                pattern: (?i)^\/subscriptions\/[^\/]+\/resourceGroups\/[a-zA-Z0-9_\-().]{0,89}[a-zA-Z0-9_\-()]\/providers\/Microsoft\.Network\/virtualNetworks\/[^\/]+\/subnets\/[^\/]+$
                type: string
            type: object
          status:
            description: status contains the resolved state of the AzureNodeClass.
            properties:
              conditions:
                description: conditions contains signals for health and readiness
                items:
                  description: Condition aliases the upstream type and adds additional
                    helper methods
                  properties:
                    lastTransitionTime:
                      description: |-
                        lastTransitionTime is the last time the condition transitioned from one status to another.
                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: |-
                        message is a human readable message indicating details about the transition.
                        This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: |-
                        observedGeneration represents the .metadata.generation that the condition was set based upon.
                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                        with respect to the current state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: |-
                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
                        Producers of specific condition types may define expected values and meanings for this field,
                        and whether the values are considered a guaranteed API.
                        The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
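
The `imageID` constraints in the schema above can be tried out locally before applying a manifest. The following is a minimal sketch, assuming Go's `regexp` package accepts the pattern unchanged; `validImageID` is a hypothetical helper and not part of this PR, and the sample IDs are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// imageIDPattern is the `pattern` on spec.imageID in the CRD, copied verbatim.
var imageIDPattern = regexp.MustCompile(`(?i)^(\/subscriptions\/[^\/]+\/resourceGroups\/[^\/]+\/providers\/Microsoft\.Compute\/.*|\/CommunityGalleries\/[^\/]+\/images\/[^\/]+\/versions\/[^\/]+)$`)

// validImageID is a hypothetical helper that mirrors the schema's maxLength
// (1024) and pattern checks. It does not verify that the image exists.
func validImageID(id string) bool {
	return utf8.RuneCountInString(id) <= 1024 && imageIDPattern.MatchString(id)
}

func main() {
	// Made-up IDs: two that follow the documented shapes, one that does not.
	candidates := []string{
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/galleries/g/images/img/versions/1.0.0",
		"/CommunityGalleries/g/images/img/versions/1.0.0",
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/v",
	}
	for _, id := range candidates {
		fmt.Printf("%-5t %s\n", validImageID(id), id)
	}
}
```

Because of the `(?i)` flag, differences in letter case (for example `/communitygalleries/...`) do not cause a rejection.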
136 changes: 136 additions & 0 deletions designs/0007-azurevm-provision-mode.md
@@ -0,0 +1,136 @@
# AzureVM Provision Mode

**Author:** @comtalyst

**Last updated:** March 7, 2026

**Status:** Proposed

## Overview

AzureVM provision mode enables Karpenter to provision standalone Azure Virtual Machines that are **not** part of an AKS cluster. This opens Karpenter as a general-purpose VM autoscaler for any Kubernetes distribution running on Azure (e.g., self-managed k8s, Rancher, OpenShift, Talos).

In the existing AKS modes (`AKSMachineAPI`, `AKSScriptless`, `BootstrappingClient`), Karpenter relies heavily on AKS-specific infrastructure: image family resolution via the AKS VHD build system, node bootstrapping via AKS's cloud-init/CSE pipeline, AKS billing extensions, and AKS load balancer backend pool management. AzureVM mode bypasses all of these, giving the user full control over image selection and node bootstrapping.

### Goals

* Allow Karpenter to provision VMs outside of AKS clusters
* Support user-provided VM images (Compute Gallery, SIG, or any ARM image resource)
* Support user-provided bootstrap data (cloud-init / custom scripts via `userData`)
* Support per-NodeClass subscription, resource group, and location overrides for multi-subscription deployments
* Support per-NodeClass managed identity assignment
* Support optional data disk attachment
* Maintain backward compatibility — existing AKS modes are unaffected

### Non-Goals

* Windows VM support (Linux only for now)
* Automatic image updates or OS patching
* Karpenter-managed node bootstrapping (the user provides their own)
* AKS billing extension or AKS identifying extension in AzureVM mode
* Node auto-join to AKS clusters (use AKS modes for that)

## Architecture

### New CRD: AzureNodeClass (karpenter.azure.com/v1alpha1)

A new CRD `AzureNodeClass` is introduced alongside the existing `AKSNodeClass`. It contains fields relevant to generic Azure VM provisioning:

```yaml
apiVersion: karpenter.azure.com/v1alpha1
kind: AzureNodeClass
metadata:
  name: my-nodeclass
spec:
  imageID: "/subscriptions/.../Microsoft.Compute/galleries/.../versions/1.0.0"
  # Shown decoded for readability; the CRD expects userData to be base64-encoded.
  userData: "#!/bin/bash\nkubeadm join ..."
  vnetSubnetID: "/subscriptions/.../subnets/worker-subnet"
  osDiskSizeGB: 128
  dataDiskSizeGB: 256
  subscriptionID: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
  resourceGroup: "my-custom-rg"
  location: "westus2"
  managedIdentities:
    - "/subscriptions/.../userAssignedIdentities/my-identity"
  tags:
    environment: production
  security:
    encryptionAtHost: true

Issue: The design doc YAML example includes fields that don't exist in this PR's CRD: subscriptionID, resourceGroup, location, and dataDiskSizeGB. These are added in PR #1497 later in the chain. This creates a confusing artifact where the design doc advertises an API surface that doesn't match the CRD in this commit. Consider either:

  1. Stripping these fields from the design doc and adding them in PR #1497 (feat: add multi-subscription support and dataDiskSizeGB), or
  2. Adding a note like "Fields marked with are added in a subsequent PR" to set expectations.

```
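
The CRD schema in this PR describes `userData` as a base64-encoded string with `maxLength: 87380`, so a raw bootstrap script has to be encoded before it goes into the manifest. The following is a minimal sketch of that step. `encodeUserData` is a hypothetical helper, not part of the PR, and it assumes standard padded base64. The last line shows that 87380 is exactly the encoded length of 65,535 raw bytes.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// maxUserDataLen mirrors maxLength on spec.userData in the CRD.
const maxUserDataLen = 87380

// encodeUserData is a hypothetical helper: it base64-encodes a bootstrap
// script and rejects results that would exceed the CRD's maxLength.
func encodeUserData(script []byte) (string, error) {
	encoded := base64.StdEncoding.EncodeToString(script)
	if len(encoded) > maxUserDataLen {
		return "", fmt.Errorf("encoded userData is %d characters, limit is %d", len(encoded), maxUserDataLen)
	}
	return encoded, nil
}

func main() {
	// Placeholder script; a real one would contain the node join logic.
	encoded, err := encodeUserData([]byte("#!/bin/bash\necho bootstrapping\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)

	// Padded base64 turns every 3 raw bytes into 4 characters.
	fmt.Println(base64.StdEncoding.EncodedLen(65535)) // 87380
}
```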

### Adapter Pattern

Internally, `AzureNodeClass` is converted to `AKSNodeClass` via an adapter function (`AKSNodeClassFromAzureNodeClass`). The VM provider always operates on `AKSNodeClass`; AzureVM-specific values are carried in fields tagged `json:"-"`, which do not appear in the AKSNodeClass CRD schema:

```
AzureNodeClass → adapter → AKSNodeClass (with hidden fields) → VM Provider
```

This avoids duplicating the entire VM provider and keeps the code path unified.
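
The effect of the `json:"-"` tag can be seen in a small, self-contained sketch. The types and the `adapt` function below are hypothetical, heavily trimmed stand-ins, not the real `AKSNodeClass` types or the real `AKSNodeClassFromAzureNodeClass` signature. The point is only that the hidden fields travel with the in-memory object while staying out of its serialized form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// azureSpec is a hypothetical, trimmed stand-in for the AzureNodeClass spec.
type azureSpec struct {
	ImageID      *string `json:"imageID,omitempty"`
	UserData     *string `json:"userData,omitempty"`
	OSDiskSizeGB *int32  `json:"osDiskSizeGB,omitempty"`
}

// aksSpec is a hypothetical, trimmed stand-in for the AKSNodeClass spec.
type aksSpec struct {
	OSDiskSizeGB *int32 `json:"osDiskSizeGB,omitempty"`
	// AzureVM-only values: json:"-" keeps them out of the serialized object.
	ImageID  *string `json:"-"`
	UserData *string `json:"-"`
}

// adapt copies shared fields and carries the AzureVM-only ones as hidden fields.
func adapt(in azureSpec) aksSpec {
	return aksSpec{OSDiskSizeGB: in.OSDiskSizeGB, ImageID: in.ImageID, UserData: in.UserData}
}

// serialized returns the JSON form of the adapted spec.
func serialized(s aksSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	image := "/CommunityGalleries/g/images/img/versions/1.0.0" // made-up ID
	data := "IyEvYmluL2Jhc2gK"                                 // base64 of a shebang line
	size := int32(128)

	out := adapt(azureSpec{ImageID: &image, UserData: &data, OSDiskSizeGB: &size})
	fmt.Println(serialized(out)) // {"osDiskSizeGB":128}
	fmt.Println(*out.ImageID)    // still available to in-process consumers
}
```

One consequence worth keeping in mind is that such fields are lost on any round trip through JSON, so this approach only works if the adapted object stays in memory.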

### Provider Behavior by Mode

| Behavior | AKS Modes | AzureVM Mode |
|---|---|---|
| Image resolution | AKS VHD image families | User-provided `imageID` |
| Node bootstrap | AKS cloud-init / CSE | User-provided `userData` |
| LB backend pools | Configured from AKS LB | Skipped |
| NSG lookup | AKS-managed NSG | Skipped |
| Billing extension | Installed | Skipped |
| Identifying extension | Installed | Skipped |
| CSE extension | Installed (bootstrappingclient) | Skipped |
| K8s version validation | Required | Skipped |
| Data disks | Not supported | Optional via `dataDiskSizeGB` |
| Multi-subscription | Not supported | Optional via `subscriptionID` |

## Decisions

### Decision 1: Separate CRD vs. extending AKSNodeClass

#### Option A: Add all fields to AKSNodeClass
Pro: Single CRD. Con: Pollutes the AKSNodeClass with non-AKS fields; confusing UX for AKS users.

#### Option B: New AzureNodeClass CRD with adapter pattern
Pro: Clean separation of concerns; each CRD has only the fields relevant to its use case. Con: Slight code complexity from the adapter.

#### Conclusion: Option B
The adapter pattern keeps the AKSNodeClass API clean and focused on AKS, while AzureNodeClass serves the standalone VM use case. The adapter is a simple mapping function, not a complex abstraction layer.

### Decision 2: Multi-subscription client management

#### Option A: Create new SDK clients per-request
Pro: Simple. Con: Expensive — Azure SDK client creation involves HTTP transport setup.

#### Option B: Lazy, cached per-subscription client pool (AZClientManager)
Pro: Clients are created once per subscription and reused. Thread-safe via double-checked locking. Con: Slight memory overhead for cached clients.

#### Conclusion: Option B
`AZClientManager` provides `GetClients(subscriptionID)`, which returns cached `SubscriptionClients` (containing VirtualMachinesClient, VirtualMachineExtensionsClient, NetworkInterfacesClient, SubnetsClient). Requests for the default subscription return the existing AZClient's clients directly.
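
A lazy, cached pool with double-checked locking could look roughly like the sketch below. Everything here is hypothetical and simplified: `subscriptionClients` is a placeholder for the real client bundle, the factory function stands in for Azure SDK client construction, and treating an empty subscription ID as the default is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// subscriptionClients is a hypothetical placeholder for the per-subscription
// bundle of Azure SDK clients.
type subscriptionClients struct{ subscriptionID string }

// clientManager is a simplified sketch of a per-subscription client cache.
type clientManager struct {
	mu             sync.RWMutex
	defaultSubID   string
	defaultClients *subscriptionClients
	cache          map[string]*subscriptionClients
	newClients     func(subscriptionID string) (*subscriptionClients, error)
}

func newClientManager(defaultSubID string, factory func(string) (*subscriptionClients, error)) *clientManager {
	return &clientManager{
		defaultSubID:   defaultSubID,
		defaultClients: &subscriptionClients{subscriptionID: defaultSubID},
		cache:          map[string]*subscriptionClients{},
		newClients:     factory,
	}
}

// GetClients returns the default clients directly, or a cached bundle that is
// created at most once per subscription.
func (m *clientManager) GetClients(subscriptionID string) (*subscriptionClients, error) {
	if subscriptionID == "" || subscriptionID == m.defaultSubID {
		return m.defaultClients, nil
	}
	// First check: cheap read lock for the common cache-hit case.
	m.mu.RLock()
	cached, ok := m.cache[subscriptionID]
	m.mu.RUnlock()
	if ok {
		return cached, nil
	}
	// Second check under the write lock: another goroutine may have won the race.
	m.mu.Lock()
	defer m.mu.Unlock()
	if cached, ok := m.cache[subscriptionID]; ok {
		return cached, nil
	}
	created, err := m.newClients(subscriptionID)
	if err != nil {
		return nil, err
	}
	m.cache[subscriptionID] = created
	return created, nil
}

func main() {
	builds := 0 // only written inside the factory, which runs under the write lock
	m := newClientManager("default-sub", func(id string) (*subscriptionClients, error) {
		builds++
		return &subscriptionClients{subscriptionID: id}, nil
	})

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := m.GetClients("other-sub"); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("clients built:", builds) // clients built: 1
}
```

Failed constructions are not cached in this sketch, so a later call retries. Whether the real manager behaves the same way is not stated in the design doc.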

### Decision 3: Data disk configuration

Data disks are configured as Premium_LRS managed disks attached at LUN 0 with auto-delete on VM termination. This is a simple, opinionated default suitable for container runtime storage. Future iterations may support multiple disks, custom storage account types, or per-disk configuration.
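
As a rough illustration of this default, the sketch below builds the single-disk configuration from an optional size. `dataDisk` is a hypothetical, simplified stand-in for the Azure SDK's data disk type, and `dataDisksFor` is not the PR's `configureDataDisk`. The `"Delete"` delete option and the treatment of non-positive sizes as "no disk" are assumptions.

```go
package main

import "fmt"

// dataDisk is a hypothetical, simplified stand-in for the Azure SDK's data
// disk type, with only the properties named in the design doc.
type dataDisk struct {
	LUN                int32
	SizeGB             int32
	StorageAccountType string
	DeleteOption       string
}

// dataDisksFor returns no disks when the size is unset, otherwise a single
// Premium_LRS disk at LUN 0 that is deleted together with the VM.
func dataDisksFor(sizeGB *int32) []dataDisk {
	if sizeGB == nil || *sizeGB <= 0 {
		return nil
	}
	return []dataDisk{{
		LUN:                0,
		SizeGB:             *sizeGB,
		StorageAccountType: "Premium_LRS",
		DeleteOption:       "Delete", // assumed value for auto-delete on VM termination
	}}
}

func main() {
	size := int32(256)
	fmt.Printf("%+v\n", dataDisksFor(&size))
	fmt.Println(len(dataDisksFor(nil))) // 0
}
```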

## PR Chain

The feature is delivered as a chain of incremental PRs:

1. **PR 1487 — AzureNodeClass CRD** (`dd8cb731`): Defines the new CRD and adapter
2. **PR 1488 — AzureVM provision mode** (`ad6c5a2d`): Adds `--provision-mode=azurevm` flag with relaxed validation
3. **PR 1489 — Azure VM provider** (`3bfa8942`): Core VM provider changes for AzureVM mode
4. **PR 1497 — Multi-subscription + data disk** (`d0963558`): Per-NodeClass overrides, AZClientManager, data disk

## Testing

* Unit tests for all new helper functions (configureStorageProfile, configureOSProfile, buildVMIdentity, configureDataDisk, resolveEffectiveClients)
* Unit tests for AZClientManager (default subscription, lazy creation)
* Unit tests for AKSNodeClassFromAzureNodeClass adapter (all field mappings)
* Unit tests for GetManagedExtensionNames (AzureVM mode returns no extensions)
* E2E tests planned with self-managed k8s cluster using custom images

## Production Readiness

* **RBAC**: The controller's managed identity / service principal must have the Virtual Machine Contributor and Network Contributor roles in any target subscription
* **Quotas**: Standard Azure VM quotas apply per-subscription
* **Observability**: Existing Karpenter metrics (vm_create_start, vm_create_failure) apply. Error codes are extracted via `ErrorCodeForMetrics`
* **Upgrade path**: AzureNodeClass is v1alpha1; field changes before GA are expected
3 changes: 3 additions & 0 deletions pkg/apis/apis.go
@@ -30,6 +30,8 @@ var (
	//CompatibilityGroup = "compatibility." + Group
	//go:embed crds/karpenter.azure.com_aksnodeclasses.yaml
	AKSNodeClassCRD []byte
	//go:embed crds/karpenter.azure.com_azurenodeclasses.yaml
	AzureNodeClassCRD []byte
	//go:embed crds/karpenter.sh_nodepools.yaml
	NodePoolCRD []byte
	//go:embed crds/karpenter.sh_nodeclaims.yaml
@@ -38,6 +40,7 @@ var (
	NodeOverlayCRD []byte
	CRDs = []*apiextensionsv1.CustomResourceDefinition{
		object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AKSNodeClassCRD),
		object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AzureNodeClassCRD),

Question: Adding AzureNodeClassCRD to the global CRDs slice means both AKSNodeClass and AzureNodeClass CRDs will be installed in every cluster, regardless of provision mode. Is this intentional? In AKS mode, the AzureNodeClass CRD is unused; in AzureVM mode, the AKSNodeClass CRD is unused. This is harmless (extra CRD definition on the API server) but could confuse users who see both CRDs via kubectl get crd. Later PR #1489 makes controller registration mode-aware — consider whether CRD installation should also be conditional.

		object.Unmarshal[apiextensionsv1.CustomResourceDefinition](NodePoolCRD),
		object.Unmarshal[apiextensionsv1.CustomResourceDefinition](NodeClaimCRD),
		object.Unmarshal[apiextensionsv1.CustomResourceDefinition](NodeOverlayCRD),