Conversation
Pull request overview
Adds an Admin API endpoint to retrieve Azure VM serial console logs for HCP clusters, enabling SRE debugging via boot diagnostics/serial console output.
Changes:
- Introduces a `GET /admin/v1/hcp{resourceId}/serialconsole?vmName=...` handler that calls the Azure Compute boot diagnostics APIs and streams the serial console log blob back to the caller.
- Adds a vmName validation helper plus unit tests, and refactors shared Cluster Service error handling for correct 404 behavior.
- Adds E2E coverage and supporting test framework helpers for discovering a VM and toggling boot diagnostics.
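For context on the validation change above, a helper along these lines could look as follows. The regex here is an assumption loosely based on common Azure VM naming constraints (1–64 characters, alphanumeric start, no trailing hyphen or period); the actual pattern added in internal/validation/validators.go may differ:

```go
package main

import (
	"fmt"
	"regexp"
)

// vmNameRE is an assumed approximation of Azure VM naming rules, not the
// exact regex from this PR: 1-64 characters, starting with an alphanumeric,
// then alphanumerics, hyphens, underscores, or periods, and not ending with
// a period or hyphen.
var vmNameRE = regexp.MustCompile(`^[a-zA-Z0-9](?:[a-zA-Z0-9._-]{0,62}[a-zA-Z0-9_])?$`)

// IsValidVMName reports whether name looks like a plausible Azure VM name.
func IsValidVMName(name string) bool {
	return vmNameRE.MatchString(name)
}

func main() {
	fmt.Println(IsValidVMName("aro-node-0")) // typical node pool VM name
	fmt.Println(IsValidVMName(""))           // empty names are rejected
	fmt.Println(IsValidVMName("bad-name-"))  // trailing hyphen rejected
}
```

Rejecting bad names in the handler keeps obviously malformed `vmName` query parameters from ever reaching the ARM compute API.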
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| admin/server/server/admin.go | Registers the new `/serialconsole` HCP admin route. |
| admin/server/handlers/hcp/serialconsole.go | Implements the serial console retrieval handler (boot diagnostics lookup + blob download/stream). |
| admin/server/handlers/hcp/serialconsole_test.go | Adds handler-level unit tests for validation and dependency error paths. |
| admin/server/handlers/hcp/helpers.go | Adds a reusable `ClusterServiceError` helper to correctly translate CS 404s. |
| admin/server/handlers/hcp/helpers_test.go | Unit tests for `ClusterServiceError`. |
| admin/server/handlers/hcp/breakglass/create.go | Switches to the shared `ClusterServiceError` helper (removes the local duplicate). |
| internal/validation/validators.go | Adds an Azure VM name regex + exported validation helper. |
| internal/validation/validate_vmname.go | Adds tests for VM name validation (currently added as a non-`*_test.go` file). |
| test/util/framework/admin_api_helpers.go | Adds test helpers for calling the new endpoint and manipulating/listing VMs via ARM compute. |
| test/e2e/admin_api.go | Adds E2E tests for the serial console happy path and the "boot diagnostics disabled" conflict case. |
| test/testdata/zz_fixture_TestMainListSuitesForEachSuite_rp_api_compat_all_parallel_01rp_api_compat_all_parallel_development.txt | Updates the suite listing fixture to include the new SRE serial console test. |
| test/testdata/zz_fixture_TestMainListSuitesForEachSuite_dev_cd_check_paralleldev_cd_check_parallel.txt | Updates the suite listing fixture to include the new SRE serial console test. |
| admin/server/go.mod | Adds the Azure compute SDK dependency (and adjusts mock dependency classification). |
| admin/server/go.sum | Updates sums for new/updated dependencies. |
| test-integration/go.mod | Adds armcompute/v5 as an indirect dependency (appears unused within test-integration/). |
| test-integration/go.sum | Updates sums for new/updated dependencies. |
| admin/README.md | Documents the new admin endpoint in the API table. |
Force-pushed a0834b8 to bbe2bd9.
Force-pushed bbe2bd9 to 64f0ced.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Force-pushed 64f0ced to 207848e.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.
/test e2e-parallel

```go
// download blob content with timeout to avoid stuck handlers on slow blob endpoints
httpClient := &http.Client{
	Timeout: 30 * time.Second,
```
Why use a client-side timeout instead of plumbing contexts?
Removed the client-side timeout
@bennerv should we still do a context.WithTimeout to be explicit about how long we want to wait here? Otherwise we just get whatever the request context defines.
Haven't added the context.WithTimeout yet; I can wait for the decision. Or should we handle it in the middleware at some point, if we want a default request timeout?
My vote:
- We 100% should add middleware with some kind of timeout (5 minutes?) as a stop-gap
- As needed for optimization, individual handlers can/should restrict that timeout even further to exit earlier so SREs aren't kept waiting on reasonably-known failure states for a particular task.
Force-pushed 77f6ef2 to c06f4c5.
Force-pushed c06f4c5 to 2302601.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.
Force-pushed 2302601 to 47d7e02.
Force-pushed 47d7e02 to 87f6c6a.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated no new comments.
Force-pushed ccadc3f to 7d8cf3a.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.
Force-pushed 7d8cf3a to ff7df37.
Force-pushed ff7df37 to c715df0.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
```go
// specific CloudError. This prevents ReportError from misinterpreting it as
// "HCP resource not found" (the HCP was already found in the database).
// Non-OCM errors are wrapped for ReportError to handle as internal errors.
func ClusterServiceError(err error, what string) error {
```
Initially in this PR the helper was used by the serial console logs implementation, but since that no longer uses it, it could be moved back to breakglass. I decided to keep it here for any similar future use.
Please comment if anyone feels strongly against moving the function here in this PR.
SudoBrendan
left a comment
I think that overall, this /lgtm - if you want to address any of these (or do a follow-up) that's just as well. Thanks Raj!
```go
http.StatusConflict,
arm.CloudErrorCodeConflict,
"",
"Diagnostics might be disabled for VM %s", vmName,
```
nit: as an SRE working a 2am incident - if I saw this error message raw I might be confused and need to review the code to understand the error. Adding a bit of clarity to what this error actually means (and how to resolve it?) for on-call could be helpful if that's even possible. I think this is especially true considering we never expect diagnostics to be off - the oddest cases likely deserve the clearest messaging.
```go
limitedReader := io.LimitReader(blobResp.Body, maxBytes)
_, err = io.Copy(writer, limitedReader)
```
nit: should we write a warning or something if data is truncated at 100MB? Note: this keeps the beginning of the content and drops the end. Should we instead return the end of the log?
Good suggestion, I will address in the next PR where I can just handle the streaming of the logs.
```go
writer.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
writer.Header().Set("Pragma", "no-cache")
writer.Header().Set("Expires", "0")
writer.WriteHeader(http.StatusOK)
```
bug: once you write a header, it can NEVER be overwritten. The errors at L183 are silently ignored and go to logs, but SREs are getting 200/OK (and potentially partial data) because the status is committed here.
I think the best way for us to handle this is to buffer the full content and only respond once we have the data in hand. If we're capping at 100MB, I think this is fine?
Yes, that's true and that's the idea when streaming the logs.
I guess I meant: streaming can fail, and there's no great way to handle those failures and notify on-call about them. Considering we're only dealing with 100MB, it might be best to eliminate streaming altogether and only respond once we've got the full data, so we can safely return 200. That has memory implications for admin-api, but it seems reasonable, and I don't think we expect to call this endpoint often (e.g. I don't see us invoking this on every ICM; it's a critical gap we need to fill in rare situations).
Ultimately it's not a huge deal either way, since the scenarios are slim, but considering that silent failures here could lead to incorrect assumptions during incident triage, it seems important to be more correct than resource-aware.
```go
Expect(err.Error()).To(ContainSubstring("400"))
})

It("should return 409 when boot diagnostics is disabled on a VM",
```
could we save a cluster create and NP create cycle by adding the "fetch VM -> disable -> 409" to the previous suite? I guess that conflates positive/negative ... but this feels heavy for a single HTTP test. Could it be an integration test?
I did consider doing it in a single suite, but separated it because the disabling case requires an external action on the VM (turning off boot diagnostics), which might not work in environments with a deny assignment in place. The previous suite works in any environment.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rajdeepc2792, SudoBrendan. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
https://issues.redhat.com/browse/ARO-17482
What
Introduces an Admin API serial console endpoint that gives SREs access to VM serial console logs for debugging boot issues, kernel panics, and other low-level problems in an HCP cluster's node pool VMs.
Why
Special notes for your reviewer
Testing