feat: add AzureNodeClass CRD at karpenter.azure.com/v1alpha1 #1487

Open
comtalyst wants to merge 3 commits into comtalyst/vm-path-refactor-3-helpers from comtalyst/azure-nodeclass-crd

Conversation

@comtalyst
Collaborator

@comtalyst comtalyst commented Mar 5, 2026

Fixes #

Description

Add AzureNodeClass CRD in karpenter.azure.com/v1alpha1 for non-AKS Azure VM provisioning. Defines a simpler, AKS-independent node class with imageID, userData, managedIdentities, vnetSubnetID, osDiskSizeGB, tags, and security fields.

Includes status controller interfaces (StatusConditions, GetConditions, SetConditions), scheme registration, and deep copy generation. Adds CRD YAML to charts and api packages.

How was this change tested?

  • go build ./... passes
  • make verify passes (lint, codegen, validation)
  • Existing tests unaffected — no behavior changes to AKSNodeClass path
  • CRD can be applied to cluster: kubectl apply -f charts/karpenter-crd/templates/karpenter.azure.com_azurenodeclasses.yaml

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


@comtalyst comtalyst force-pushed the comtalyst/azure-nodeclass-crd branch from 3cb9739 to 86ae031 Compare March 5, 2026 21:08
@comtalyst comtalyst changed the title from "GHOST-HAND-03052026: PR 1" to "feat: add AzureNodeClass CRD at karpenter.azure.com/v1alpha1" Mar 6, 2026
@comtalyst comtalyst force-pushed the comtalyst/test-reunification branch 2 times, most recently from ab524b1 to 30f0bee Compare March 6, 2026 05:19
@comtalyst comtalyst force-pushed the comtalyst/azure-nodeclass-crd branch from 86ae031 to dd8cb73 Compare March 7, 2026 10:04
@comtalyst comtalyst force-pushed the comtalyst/test-reunification branch from 3c495a6 to f26017e Compare March 7, 2026 11:28
@comtalyst comtalyst force-pushed the comtalyst/azure-nodeclass-crd branch from dd8cb73 to 65dda61 Compare March 7, 2026 11:30
@comtalyst comtalyst changed the base branch from comtalyst/test-reunification to comtalyst/vm-path-refactor-3-helpers March 7, 2026 11:31
@comtalyst comtalyst force-pushed the comtalyst/vm-path-refactor-3-helpers branch from 1b09f65 to f574895 Compare March 7, 2026 12:28
@comtalyst comtalyst force-pushed the comtalyst/azure-nodeclass-crd branch from 65dda61 to 7ba3f53 Compare March 7, 2026 12:28
comtalyst and others added 2 commits March 8, 2026 01:32
Introduce a new AzureNodeClass CRD that provides a generic Azure VM
node class, independent of AKS-specific concerns. This CRD supports
custom images (imageID), verbatim user bootstrap data (userData),
per-NodeClass managed identities, and standard Azure VM configuration.

AzureNodeClass is intended for non-AKS control plane scenarios where
the user manages their own bootstrap process, while AKSNodeClass
continues to serve AKS-managed clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add design/0007-azurevm-provision-mode.md covering the architectural
decisions for the new AzureVM provision mode, including CRD design,
validation relaxation, provider modularization, and the PR chain
dependency graph.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@comtalyst comtalyst force-pushed the comtalyst/vm-path-refactor-3-helpers branch from f574895 to 84835be Compare March 8, 2026 09:32
@comtalyst comtalyst force-pushed the comtalyst/azure-nodeclass-crd branch from ed6b34d to c38d143 Compare March 8, 2026 09:32
- Fix imageID regex to accept Community Gallery images (/CommunityGalleries/...)
  in addition to standard ARM resource IDs
- Remove suspicious Microsoft.AzureHybridBenefit from imageID pattern
- Add maxLength=1024 on imageID
- Add maxLength=87380 on userData (Azure CustomData limit)
- Fix userData comment: Azure SDK does NOT auto-base64-encode, caller must
  provide pre-encoded data
- Add comprehensive CEL validation test suite (33 tests covering imageID,
  vnetSubnetID, osDiskSizeGB, tags, managedIdentities, security, userData)
- Regenerate CRD YAML and chart template

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// expects this field to contain a base64-encoded string.
// The user is fully responsible for providing valid bootstrap/cloud-init data.
// When this field is set, no Karpenter-managed bootstrapping is performed.
// +kubebuilder:validation:MaxLength=87380
Collaborator Author

Suggestion: The imageID regex uses Microsoft\.Compute\/.* which is overly permissive — it accepts any resource under Microsoft.Compute, including non-image resources like /Microsoft.Compute/disks/someDisk or even just /Microsoft.Compute/ (trailing slash, empty path). Consider tightening to .+ instead of .* to at least require something after the trailing slash, or better yet, enumerate the supported resource types:

(?i)^(\/subscriptions\/[^\/]+\/resourceGroups\/[^\/]+\/providers\/Microsoft\.Compute\/(galleries|images)\/.+|\/CommunityGalleries\/[^\/]+\/images\/[^\/]+\/versions\/[^\/]+)$

At minimum, change .* to .+ to prevent accepting a bare Microsoft.Compute/ path.
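
To see what the tightened pattern accepts and rejects, it can be exercised with Go's regexp package. This is only a sketch: the pattern is copied verbatim from the suggestion above, `isSupportedImageID` is a hypothetical helper, and the resource IDs are made-up placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageIDPattern is the tightened pattern proposed in the review comment,
// copied verbatim. It is a suggestion, not necessarily what the PR ships.
var imageIDPattern = regexp.MustCompile(`(?i)^(\/subscriptions\/[^\/]+\/resourceGroups\/[^\/]+\/providers\/Microsoft\.Compute\/(galleries|images)\/.+|\/CommunityGalleries\/[^\/]+\/images\/[^\/]+\/versions\/[^\/]+)$`)

// isSupportedImageID (hypothetical helper) reports whether id matches the
// proposed pattern.
func isSupportedImageID(id string) bool {
	return imageIDPattern.MatchString(id)
}

func main() {
	// Placeholder IDs for illustration only.
	ids := []string{
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/galleries/g/images/i/versions/1.0.0",
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/images/myImage",
		"/CommunityGalleries/g/images/i/versions/latest",
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/disks/someDisk",
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/",
	}
	for _, id := range ids {
		fmt.Printf("%-5t %s\n", isSupportedImageID(id), id)
	}
}
```

The last two IDs are the cases called out above: a non-image resource under Microsoft.Compute and a bare provider path with nothing after the trailing slash.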

// +kubebuilder:validation:XValidation:message="tags keys must not contain '\\'",rule="self.all(k, !k.contains('\\\\'))"
// +kubebuilder:validation:XValidation:message="tags values must be less than 256 characters",rule="self.all(k, size(self[k]) <= 256)"
// +optional
Tags map[string]string `json:"tags,omitempty" hash:"ignore"`
Collaborator Author

Suggestion: managedIdentities has a MaxItems=10 constraint but no per-item validation (e.g., pattern or format). This means users can pass arbitrary strings like "not-a-resource-id" and get a cryptic Azure API error at VM creation time instead of a CRD validation error. Consider adding a +kubebuilder:validation:items:Pattern or +kubebuilder:validation:items:MinLength=1 to catch obviously invalid entries at admission time.
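
A rough sketch of what per-item validation could look like is below. Everything here is an assumption for illustration: `exampleSpec` is a stand-in rather than the PR's spec type, and `identityIDPattern` is a hypothetical pattern based on the usual shape of a user-assigned identity resource ID. The `invalidIdentities` helper only mirrors the pattern in Go so the accepted and rejected inputs are easy to check.

```go
package main

import (
	"fmt"
	"regexp"
)

// identityIDPattern is a hypothetical pattern for user-assigned managed
// identity resource IDs. It is not taken from the PR.
const identityIDPattern = `(?i)^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.ManagedIdentity/userAssignedIdentities/[^/]+$`

// exampleSpec is an illustrative stand-in for the node class spec, showing
// where the per-item marker suggested in the review could be placed.
type exampleSpec struct {
	// +kubebuilder:validation:MaxItems=10
	// +kubebuilder:validation:items:Pattern=`(?i)^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.ManagedIdentity/userAssignedIdentities/[^/]+$`
	// +optional
	ManagedIdentities []string `json:"managedIdentities,omitempty"`
}

var identityIDRegexp = regexp.MustCompile(identityIDPattern)

// invalidIdentities returns the entries that the hypothetical pattern would
// reject, in their original order.
func invalidIdentities(ids []string) []string {
	var bad []string
	for _, id := range ids {
		if !identityIDRegexp.MatchString(id) {
			bad = append(bad, id)
		}
	}
	return bad
}

func main() {
	spec := exampleSpec{ManagedIdentities: []string{
		// Placeholder ID for illustration only.
		"/subscriptions/sub/resourceGroups/rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id1",
		"not-a-resource-id",
	}}
	fmt.Println(invalidIdentities(spec.ManagedIdentities))
}
```

With a pattern like this in the CRD, the "not-a-resource-id" case would be rejected at admission instead of surfacing as an Azure API error at VM creation time.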

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var _ = AfterEach(func() {
Collaborator Author

Issue: There are two AfterEach blocks registered — one in crd_validation_cel_test.go (manual deletion loop) and one in suite_test.go (ExpectCleanedUp). Both clean up resources after each test, which is redundant. The ExpectCleanedUp in suite_test.go should be sufficient — remove the manual AfterEach in this file to avoid double-cleanup confusion.


import (
"github.com/Azure/karpenter-provider-azure/pkg/apis"
corev1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Collaborator Author

Nit: The alias corev1 for k8s.io/apimachinery/pkg/apis/meta/v1 is misleading — corev1 conventionally refers to k8s.io/api/core/v1. This is consistent with the existing v1beta1/doc.go so not a blocker, but worth noting for a future cleanup. The standard alias for k8s.io/apimachinery/pkg/apis/meta/v1 is metav1.

tags:
environment: production
security:
encryptionAtHost: true
Collaborator Author

Issue: The design doc YAML example includes fields that don't exist in this PR's CRD: subscriptionID, resourceGroup, location, and dataDiskSizeGB. These are added in PR #1497 later in the chain. This creates a confusing artifact where the design doc advertises an API surface that doesn't match the CRD in this commit. Consider either:

  1. Stripping these fields from the design doc and adding them in PR #1497 (feat: add multi-subscription support and dataDiskSizeGB), or
  2. Adding a note like "Fields marked with are added in a subsequent PR" to set expectations.

var ctx context.Context
var env *coretest.Environment
var azureEnv *test.Environment

Collaborator Author

Issue: azureEnv is initialized in BeforeSuite but never referenced in any test. This is dead code — the CEL validation tests only use env.Client (the core test environment). Either remove the azureEnv declaration and initialization, or add a comment explaining it's needed for side-effects (if test.NewEnvironment registers something necessary).

"k8s.io/apimachinery/pkg/runtime/schema"
)

const Group = "karpenter.azure.com"
Collaborator Author

Issue: const Group = "karpenter.azure.com" is defined in both v1alpha1/register.go (this PR) and v1beta1/register.go (existing). This creates two independent constants with the same value. If the group name ever changes, one could be updated while the other is missed. Consider using the existing apis.Group constant (already imported in doc.go) instead of redeclaring it here. Note: this is consistent with the existing v1beta1 pattern, so not a blocker if you want to keep parity.

NodeOverlayCRD []byte
CRDs = []*apiextensionsv1.CustomResourceDefinition{
object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AKSNodeClassCRD),
object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AzureNodeClassCRD),
Collaborator Author

Question: Adding AzureNodeClassCRD to the global CRDs slice means both AKSNodeClass and AzureNodeClass CRDs will be installed in every cluster, regardless of provision mode. Is this intentional? In AKS mode, the AzureNodeClass CRD is unused; in AzureVM mode, the AKSNodeClass CRD is unused. This is harmless (extra CRD definition on the API server) but could confuse users who see both CRDs via kubectl get crd. Later PR #1489 makes controller registration mode-aware — consider whether CRD installation should also be conditional.
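
If conditional installation were pursued, the selection could look roughly like the sketch below. All of it is hypothetical: the mode names `aks` and `azurevm` and the `crdNamesForMode` function are assumptions for illustration, and the sketch works on CRD names rather than the parsed CRD objects in the slice above.

```go
package main

import "fmt"

// Hypothetical provision mode names; the real values are not shown in this PR.
const (
	modeAKS     = "aks"
	modeAzureVM = "azurevm"
)

// crdNamesForMode (hypothetical) returns the node class CRD that a given
// provision mode actually uses, instead of installing both unconditionally.
func crdNamesForMode(mode string) ([]string, error) {
	switch mode {
	case modeAKS:
		return []string{"aksnodeclasses.karpenter.azure.com"}, nil
	case modeAzureVM:
		return []string{"azurenodeclasses.karpenter.azure.com"}, nil
	default:
		return nil, fmt.Errorf("unknown provision mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{modeAKS, modeAzureVM, "other"} {
		names, err := crdNamesForMode(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(mode, "->", names)
	}
}
```

The trade-off is that switching modes on an existing cluster would then require installing the other CRD, which the unconditional slice avoids.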
