feat: add AzureNodeClass CRD at karpenter.azure.com/v1alpha1#1487
feat: add AzureNodeClass CRD at karpenter.azure.com/v1alpha1#1487comtalyst wants to merge 3 commits intocomtalyst/vm-path-refactor-3-helpersfrom
Conversation
3cb9739 to
86ae031
Compare
ab524b1 to
30f0bee
Compare
86ae031 to
dd8cb73
Compare
3c495a6 to
f26017e
Compare
dd8cb73 to
65dda61
Compare
1b09f65 to
f574895
Compare
65dda61 to
7ba3f53
Compare
Introduce a new AzureNodeClass CRD that provides a generic Azure VM node class, independent of AKS-specific concerns. This CRD supports custom images (imageID), verbatim user bootstrap data (userData), per-NodeClass managed identities, and standard Azure VM configuration. AzureNodeClass is intended for non-AKS control plane scenarios where the user manages their own bootstrap process, while AKSNodeClass continues to serve AKS-managed clusters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add design/0007-azurevm-provision-mode.md covering the architectural decisions for the new AzureVM provision mode, including CRD design, validation relaxation, provider modularization, and the PR chain dependency graph. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f574895 to
84835be
Compare
ed6b34d to
c38d143
Compare
- Fix imageID regex to accept Community Gallery images (/CommunityGalleries/...) in addition to standard ARM resource IDs - Remove suspicious Microsoft.AzureHybridBenefit from imageID pattern - Add maxLength=1024 on imageID - Add maxLength=87380 on userData (Azure CustomData limit) - Fix userData comment: Azure SDK does NOT auto-base64-encode, caller must provide pre-encoded data - Add comprehensive CEL validation test suite (33 tests covering imageID, vnetSubnetID, osDiskSizeGB, tags, managedIdentities, security, userData) - Regenerate CRD YAML and chart template Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| // expects this field to contain a base64-encoded string. | ||
| // The user is fully responsible for providing valid bootstrap/cloud-init data. | ||
| // When this field is set, no Karpenter-managed bootstrapping is performed. | ||
| // +kubebuilder:validation:MaxLength=87380 |
There was a problem hiding this comment.
Suggestion: The imageID regex uses Microsoft\.Compute\/.* which is overly permissive — it accepts any resource under Microsoft.Compute, including non-image resources like /Microsoft.Compute/disks/someDisk or even just /Microsoft.Compute/ (trailing slash, empty path). Consider tightening to .+ instead of .* to at least require something after the trailing slash, or better yet, enumerate the supported resource types:
(?i)^(\/subscriptions\/[^\/]+\/resourceGroups\/[^\/]+\/providers\/Microsoft\.Compute\/(galleries|images)\/.+|\/CommunityGalleries\/[^\/]+\/images\/[^\/]+\/versions\/[^\/]+)$
At minimum, change .* to .+ to prevent accepting a bare Microsoft.Compute/ path.
| // +kubebuilder:validation:XValidation:message="tags keys must not contain '\\'",rule="self.all(k, !k.contains('\\\\'))" | ||
| // +kubebuilder:validation:XValidation:message="tags values must be less than 256 characters",rule="self.all(k, size(self[k]) <= 256)" | ||
| // +optional | ||
| Tags map[string]string `json:"tags,omitempty" hash:"ignore"` |
There was a problem hiding this comment.
Suggestion: managedIdentities has a MaxItems=10 constraint but no per-item validation (e.g., pattern or format). This means users can pass arbitrary strings like "not-a-resource-id" and get a cryptic Azure API error at VM creation time instead of a CRD validation error. Consider adding a +kubebuilder:validation:items:Pattern or +kubebuilder:validation:items:MinLength=1 to catch obviously invalid entries at admission time.
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| ) | ||
|
|
||
| var _ = AfterEach(func() { |
There was a problem hiding this comment.
Issue: There are two AfterEach blocks registered — one in crd_validation_cel_test.go (manual deletion loop) and one in suite_test.go (ExpectCleanedUp). Both clean up resources after each test, which is redundant. The ExpectCleanedUp in suite_test.go should be sufficient — remove the manual AfterEach in this file to avoid double-cleanup confusion.
|
|
||
| import ( | ||
| "github.com/Azure/karpenter-provider-azure/pkg/apis" | ||
| corev1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
There was a problem hiding this comment.
Nit: The alias corev1 for k8s.io/apimachinery/pkg/apis/meta/v1 is misleading — corev1 conventionally refers to k8s.io/api/core/v1. This is consistent with the existing v1beta1/doc.go so not a blocker, but worth noting for a future cleanup. The standard alias for k8s.io/apimachinery/pkg/apis/meta/v1 is metav1.
| tags: | ||
| environment: production | ||
| security: | ||
| encryptionAtHost: true |
There was a problem hiding this comment.
Issue: The design doc YAML example includes fields that don't exist in this PR's CRD: subscriptionID, resourceGroup, location, and dataDiskSizeGB. These are added in PR #1497 later in the chain. This creates a confusing artifact where the design doc advertises an API surface that doesn't match the CRD in this commit. Consider either:
- Stripping these fields from the design doc and adding them in PR feat: add multi-subscription support and dataDiskSizeGB #1497, or
- Adding a note like "Fields marked with
†are added in a subsequent PR" to set expectations.
| var ctx context.Context | ||
| var env *coretest.Environment | ||
| var azureEnv *test.Environment | ||
|
|
There was a problem hiding this comment.
Issue: azureEnv is initialized in BeforeSuite but never referenced in any test. This is dead code — the CEL validation tests only use env.Client (the core test environment). Either remove the azureEnv declaration and initialization, or add a comment explaining it's needed for side-effects (if test.NewEnvironment registers something necessary).
| "k8s.io/apimachinery/pkg/runtime/schema" | ||
| ) | ||
|
|
||
| const Group = "karpenter.azure.com" |
There was a problem hiding this comment.
Issue: const Group = "karpenter.azure.com" is defined in both v1alpha1/register.go (this PR) and v1beta1/register.go (existing). This creates two independent constants with the same value. If the group name ever changes, one could be updated while the other is missed. Consider using the existing apis.Group constant (already imported in doc.go) instead of redeclaring it here. Note: this is consistent with the existing v1beta1 pattern, so not a blocker if you want to keep parity.
| NodeOverlayCRD []byte | ||
| CRDs = []*apiextensionsv1.CustomResourceDefinition{ | ||
| object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AKSNodeClassCRD), | ||
| object.Unmarshal[apiextensionsv1.CustomResourceDefinition](AzureNodeClassCRD), |
There was a problem hiding this comment.
Question: Adding AzureNodeClassCRD to the global CRDs slice means both AKSNodeClass and AzureNodeClass CRDs will be installed in every cluster, regardless of provision mode. Is this intentional? In AKS mode, the AzureNodeClass CRD is unused; in AzureVM mode, the AKSNodeClass CRD is unused. This is harmless (extra CRD definition on the API server) but could confuse users who see both CRDs via kubectl get crd. Later PR #1489 makes controller registration mode-aware — consider whether CRD installation should also be conditional.
Fixes #
Description
Add
AzureNodeClassCRD inkarpenter.azure.com/v1alpha1for non-AKS Azure VM provisioning. Defines a simpler, AKS-independent node class withimageID,userData,managedIdentities,vnetSubnetID,osDiskSizeGB,tags, andsecurityfields.Includes status controller interfaces (
StatusConditions,GetConditions,SetConditions), scheme registration, and deep copy generation. Adds CRD YAML to charts and api packages.How was this change tested?
go build ./...passesmake verifypasses (lint, codegen, validation)kubectl apply -f charts/karpenter-crd/templates/karpenter.azure.com_azurenodeclasses.yamlDoes this change impact docs?
Release Note