test(e2e): add timing test for createVMExtensionLinuxAKSNode#8064
test(e2e): add timing test for createVMExtensionLinuxAKSNode#8064
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the e2e VMSS model generation to dynamically resolve the latest Compute.AKS.Linux.AKSNode VM extension version via the Azure API and memoize the result to avoid repeated lookups.
Changes:
- Add an e2e cache key + cached wrapper for
GetLatestVMExtensionImageVersion. - Update
createVMExtensionLinuxAKSNodeto accept acontext.Contextand use the cached Azure lookup instead of a hardcoded version. - Optimize
GetLatestVMExtensionImageVersionto request only the top result usingTop+Orderby.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| e2e/test_helpers.go | Switch AKSNode extension creation to use cached latest-version lookup; update one call site for new signature. |
| e2e/scenario_test.go | Update scenario to pass a context into createVMExtensionLinuxAKSNode. |
| e2e/scenario_gpu_managed_experience_test.go | Update multiple GPU scenarios to pass a context into createVMExtensionLinuxAKSNode. |
| e2e/config/azure.go | Change extension version lookup to a server-side “top 1, order by name desc” query. |
| e2e/cache.go | Add cached wrapper + cache-key struct for VM extension latest-version lookups. |
|
|
||
| // Enable the AKS VM extension for GPU nodes | ||
| extension, err := createVMExtensionLinuxAKSNode(vmss.Location) | ||
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) |
There was a problem hiding this comment.
Using context.TODO() here bypasses the suite’s shared cancellation/timeout contexts (e.g., testCtx/newTestCtx), so the Azure API call used to resolve the extension version won’t be canceled during graceful shutdown. Prefer using the package-level testCtx (or a derived context with timeout) instead of context.TODO(); this pattern appears multiple times in this file and should be updated consistently.
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) | |
| extension, err := createVMExtensionLinuxAKSNode(testCtx, vmss.Location) |
| func createVMExtensionLinuxAKSNode(ctx context.Context, location *string) (*armcompute.VirtualMachineScaleSetExtension, error) { | ||
| region := config.Config.DefaultLocation | ||
| if location != nil { | ||
| region = *location | ||
| } | ||
|
|
||
| extensionName := "Compute.AKS.Linux.AKSNode" | ||
| publisher := "Microsoft.AKS" | ||
| extensionVersion := "1.406" | ||
| // NOTE (@surajssd): If this is gonna be called multiple times, then find a way to cache the latest version. | ||
| // extensionVersion, err := config.Azure.GetLatestVMExtensionImageVersion(context.TODO(), region, extensionName, publisher) | ||
| // if err != nil { | ||
| // return nil, fmt.Errorf("getting latest VM extension image version: %v", err) | ||
| // } | ||
| extensionVersion, err := CachedGetLatestVMExtensionImageVersion(ctx, GetLatestExtensionVersionRequest{ | ||
| Location: region, | ||
| ExtType: extensionName, | ||
| Publisher: publisher, | ||
| }) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("getting latest VM extension image version: %w", err) | ||
| } |
There was a problem hiding this comment.
PR description/title mention adding a timing test that does a cold Azure call then a warm cached call and logs durations/versions, but this change set only switches the extension version resolution to use CachedGetLatestVMExtensionImageVersion and doesn’t add any timing assertions/logging. Please either add the intended timing test/logging (or an assertion that the second call is served from cache) or update the PR description/title to match the actual changes.
| // CachedGetLatestVMExtensionImageVersion caches the result of querying the Azure API | ||
| // for the latest VM extension image version. | ||
| var CachedGetLatestVMExtensionImageVersion = cachedFunc( | ||
| func(ctx context.Context, req GetLatestExtensionVersionRequest) (string, error) { | ||
| return config.Azure.GetLatestVMExtensionImageVersion(ctx, req.Location, req.ExtType, req.Publisher) | ||
| }, | ||
| ) |
There was a problem hiding this comment.
cachedFunc caches the first error permanently for a given key. For VM extension version lookups this can turn a transient Azure API failure into a persistent failure for the rest of the test run (and it also prevents a later retry from succeeding). Consider not caching errors for this specific cached function (e.g., evict the entry on error) so warm calls only come from cache after a successful fetch.
| // CachedGetLatestVMExtensionImageVersion caches the result of querying the Azure API | |
| // for the latest VM extension image version. | |
| var CachedGetLatestVMExtensionImageVersion = cachedFunc( | |
| func(ctx context.Context, req GetLatestExtensionVersionRequest) (string, error) { | |
| return config.Azure.GetLatestVMExtensionImageVersion(ctx, req.Location, req.ExtType, req.Publisher) | |
| }, | |
| ) | |
| // vmExtensionVersionCache provides a thread-safe cache for VM extension versions. | |
| // Only successful lookups are cached; errors are never memoized so that transient | |
| // Azure API failures can be retried on subsequent calls. | |
| type vmExtensionVersionCache struct { | |
| mu sync.Mutex | |
| data map[GetLatestExtensionVersionRequest]string | |
| } | |
| func newVMExtensionVersionCache() *vmExtensionVersionCache { | |
| return &vmExtensionVersionCache{ | |
| data: make(map[GetLatestExtensionVersionRequest]string), | |
| } | |
| } | |
| func (c *vmExtensionVersionCache) Get(ctx context.Context, req GetLatestExtensionVersionRequest) (string, error) { | |
| // Fast path: check if we already have a cached value. | |
| c.mu.Lock() | |
| if v, ok := c.data[req]; ok { | |
| c.mu.Unlock() | |
| return v, nil | |
| } | |
| c.mu.Unlock() | |
| // Not cached; fetch from Azure. | |
| v, err := config.Azure.GetLatestVMExtensionImageVersion(ctx, req.Location, req.ExtType, req.Publisher) | |
| if err != nil { | |
| // Do not cache errors to avoid turning transient failures into | |
| // persistent ones for the lifetime of the test run. | |
| return "", err | |
| } | |
| // Cache the successful result. | |
| c.mu.Lock() | |
| c.data[req] = v | |
| c.mu.Unlock() | |
| return v, nil | |
| } | |
| var vmExtVersionCache = newVMExtensionVersionCache() | |
| // CachedGetLatestVMExtensionImageVersion caches the result of querying the Azure API | |
| // for the latest VM extension image version. Errors are never cached so that | |
| // transient Azure failures can be retried. | |
| var CachedGetLatestVMExtensionImageVersion = func(ctx context.Context, req GetLatestExtensionVersionRequest) (string, error) { | |
| return vmExtVersionCache.Get(ctx, req) | |
| } |
| vmss.SKU.Name = to.Ptr(vmSize) | ||
|
|
||
| extension, err := createVMExtensionLinuxAKSNode(vmss.Location) | ||
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) |
There was a problem hiding this comment.
Using context.TODO() here bypasses the suite’s shared cancellation/timeout contexts (e.g., testCtx/newTestCtx), so this Azure API call won’t be canceled during graceful shutdown and can hang longer than intended. Prefer using the package-level testCtx (or a derived context with timeout) instead of context.TODO(), and apply consistently to the other createVMExtensionLinuxAKSNode call sites.
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) | |
| extension, err := createVMExtensionLinuxAKSNode(testCtx, vmss.Location) |
| }, | ||
| VMConfigMutator: func(vmss *armcompute.VirtualMachineScaleSet) { | ||
| extension, err := createVMExtensionLinuxAKSNode(vmss.Location) | ||
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) |
There was a problem hiding this comment.
Using context.TODO() here bypasses the suite’s shared cancellation/timeout contexts (e.g., testCtx/newTestCtx), so the Azure API call used to resolve the extension version won’t be canceled during graceful shutdown. Prefer using the package-level testCtx (or a derived context with timeout) instead of context.TODO(), and apply the same change to the other createVMExtensionLinuxAKSNode call sites.
| extension, err := createVMExtensionLinuxAKSNode(context.TODO(), vmss.Location) | |
| extension, err := createVMExtensionLinuxAKSNode(testCtx, vmss.Location) |
Replace hardcoded VM extension version "1.406" with a cached Azure API call that fetches the latest version at runtime. Add a caching layer via CachedGetLatestVMExtensionImageVersion to avoid redundant API calls across tests. Update all callers to pass context for proper propagation. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
8982d50 to
a9e9293
Compare
What this PR does / why we need it:
Validate the Azure API call timing and caching behavior of the dynamically fetched VM extension version. The test makes a cold call that hits the Azure API, then a warm call that should be served from cache, logging both durations and the returned extension version for manual inspection.