feat: skip cloud-init ready report and add standalone report_ready script#8056
awesomenix wants to merge 2 commits into main
Pull request overview
Adds a VHD-baked mechanism to suppress cloud-init’s built-in “ready” report and replaces it with an explicit, CSE-invoked readiness/failure report to Azure wireserver.
Changes:
- Add a standalone report_ready.py script that writes Hyper-V KVP provisioning status and POSTs health to wireserver.
- Bake report_ready.py into multiple VHD build variants and copy it to /opt/azure/containers/.
- Add skipCloudInitReadyReport to write cloud-init config, and invoke the reporting script from cse_start.sh.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/vhd-image-builder-mariner.json | Adds report_ready.py to files uploaded into the build VM. |
| vhdbuilder/packer/vhd-image-builder-mariner-cvm.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-mariner-arm64.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-flatcar.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-flatcar-arm64.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-cvm.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-base.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-arm64-gen2.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/vhd-image-builder-acl.json | Same: stage report_ready.py into build VM. |
| vhdbuilder/packer/packer_source.sh | Copies report_ready.py into /opt/azure/containers/ with execute perms. |
| vhdbuilder/packer/install-dependencies.sh | Invokes skipCloudInitReadyReport during VHD build. |
| vhdbuilder/packer/imagecustomizer/azlosguard/azlosguard.yml | Ensures OSGuard imagecustomizer also places report_ready.py into /opt/azure/containers/. |
| parts/linux/cloud-init/artifacts/report_ready.py | New standalone readiness/failure reporting implementation. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Calls report_ready.py on provisioning success/failure if present on the VHD. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Adds skipCloudInitReadyReport() helper that writes cloud-init config to skip built-in ready report. |
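To make the "writes Hyper-V KVP provisioning status" part of the overview more concrete, here is a minimal sketch of how a KVP record could be packed and appended to a pool file. It is not taken from report_ready.py: the function names are hypothetical, and the fixed 512-byte key / 2048-byte value layout and the guest pool path mentioned in the comment are assumptions based on the usual Hyper-V KVP conventions.

```python
import struct
from typing import Tuple

# Assumed fixed-width layout of one Hyper-V KVP record (key, then value).
KVP_KEY_SIZE = 512
KVP_VALUE_SIZE = 2048
KVP_RECORD_SIZE = KVP_KEY_SIZE + KVP_VALUE_SIZE
KVP_FORMAT = f"{KVP_KEY_SIZE}s{KVP_VALUE_SIZE}s"


def encode_kvp_record(key: str, value: str) -> bytes:
    """Pack a key/value pair into one fixed-size, NUL-padded record."""
    key_bytes = key.encode("utf-8")
    value_bytes = value.encode("utf-8")
    # Keep at least one trailing NUL so a reader can find the end of each field.
    if len(key_bytes) >= KVP_KEY_SIZE or len(value_bytes) >= KVP_VALUE_SIZE:
        raise ValueError("key or value too long for a KVP record")
    # The 's' format pads short byte strings with NUL bytes.
    return struct.pack(KVP_FORMAT, key_bytes, value_bytes)


def decode_kvp_record(record: bytes) -> Tuple[str, str]:
    """Inverse of encode_kvp_record, useful when inspecting a pool file."""
    if len(record) != KVP_RECORD_SIZE:
        raise ValueError("unexpected KVP record size")
    key_bytes, value_bytes = struct.unpack(KVP_FORMAT, record)
    key = key_bytes.split(b"\x00", 1)[0].decode("utf-8")
    value = value_bytes.split(b"\x00", 1)[0].decode("utf-8")
    return key, value


def append_kvp_record(pool_path: str, key: str, value: str) -> None:
    """Append one record to a pool file.

    On a real guest the pool is assumed to live at
    /var/lib/hyperv/.kvp_pool_1; pass a temp file when experimenting.
    """
    with open(pool_path, "ab") as pool:
        pool.write(encode_kvp_record(key, value))
```

A real implementation would also need to coordinate with other writers of the pool file (for example via file locking), which this sketch leaves out.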
    addMarinerNvidiaRepo
    updateDnfWithNvidiaPkg
    overrideNetworkConfig || exit 1
    skipCloudInitReadyReport || exit 1
skipCloudInitReadyReport is already invoked unconditionally earlier in this script, so calling it again in the Mariner/AzureLinux block is redundant and makes the flow harder to reason about. Consider removing this second call to keep the configuration applied in a single place.
    skipCloudInitReadyReport || exit 1
    if [ -x /opt/azure/containers/report_ready.py ]; then
        if [ "$EXIT_CODE" -eq 0 ]; then
            python3 /opt/azure/containers/report_ready.py -v || echo "WARNING: Failed to report ready to Azure fabric"
        else
            python3 /opt/azure/containers/report_ready.py -v --failure --description "ExitCode: ${EXIT_CODE}, ${message_string}" || echo "WARNING: Failed to report failure to Azure fabric"
        fi
This report_ready.py invocation runs synchronously before log upload/exit, and can block provisioning for up to ~100s on wireserver timeouts/retries (GET/POST timeouts are 30s with multiple retries). If this is intended to be best-effort (as suggested by || echo "WARNING"), consider running it in the background on success and/or moving it after upload_logs (especially on failure) or passing tighter retry/timeout settings to avoid delaying provisioning and log upload.
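One way to picture the "tighter retry/timeout settings" option is an overall deadline around the reporting call, so that a slow wireserver cannot hold up provisioning beyond a fixed budget. The sketch below is only an illustration in Python: run_best_effort is a hypothetical helper, not something in this PR, and the deadline value in the usage note is arbitrary. In cse_start.sh itself the coreutils timeout command would play the same role.

```python
import subprocess
import sys
from typing import Sequence


def run_best_effort(cmd: Sequence[str], deadline_seconds: float) -> bool:
    """Run cmd and return True only if it exits 0 within the deadline.

    Failures are logged as warnings and never raised, mirroring the
    `|| echo "WARNING: ..."` pattern in the shell snippet.
    """
    try:
        # subprocess.run kills the child if the timeout expires.
        completed = subprocess.run(list(cmd), timeout=deadline_seconds, check=False)
    except subprocess.TimeoutExpired:
        print(f"WARNING: {cmd[0]} exceeded {deadline_seconds}s, giving up", file=sys.stderr)
        return False
    except OSError as exc:
        # Covers a missing or non-executable binary.
        print(f"WARNING: could not start {cmd[0]}: {exc}", file=sys.stderr)
        return False
    if completed.returncode != 0:
        print(f"WARNING: {cmd[0]} exited with {completed.returncode}", file=sys.stderr)
    return completed.returncode == 0
```

For example, `run_best_effort(["python3", "/opt/azure/containers/report_ready.py", "-v"], deadline_seconds=20)` would cap the report at 20 seconds regardless of the script's internal retries.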
This change updates cloud-init/CSE scripts under parts/, which are covered by snapshot-style golden tests in pkg/agent/testdata/* (e.g., baker_test.go reads ./testdata/<folder>/CustomData). Please run make generate (or regenerate the testdata via the repo’s standard workflow) and include the updated golden files in this PR; otherwise CI is likely to fail due to mismatched expected CustomData/CSE outputs.
Bake experimental_skip_ready_report into VHD via cloud.cfg.d to skip cloud-init's built-in health ready report to Azure fabric. Add a standalone Python script (report_ready.py) that can be invoked from CSE to report ready at the appropriate time during node provisioning. Depends on canonical/cloud-init#6771.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I tried importing the cloud-init library directly, and Copilot suggested this:
The problems:
- … more. It works on the VM (cloud-init's deps are installed), but it's a large dependency surface.
- … construct one or monkey-patch it.
- … instantiated_handler_registry.registered_items["telemetry"], which is only populated during cloud-init's normal boot. Outside of cloud-init's lifecycle, this returns None and silently skips KVP reporting.
- … cloud-init versions.
A middle ground: you could import just the KVP handler class directly and the low-level wireserver pieces, bypassing the high-level functions:
    from cloudinit.reporting.handlers import HyperVKvpReportingHandler

    # report_string is the provisioning report text, built by the caller.
    handler = HyperVKvpReportingHandler()
    handler.write_key("PROVISIONING_REPORT", report_string)
But the wireserver reporting (GoalState + health POST) is deeply entangled with url_helper, distro objects, and telemetry decorators, which makes it hard to call standalone.
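For context on what a standalone replacement has to reproduce, here is a rough sketch of the two data-handling steps behind "GoalState + health POST": extracting identifiers from the GoalState XML and building the health document. This is not the code in report_ready.py. The element names, the "Ready"/"NotReady" states, the "ProvisioningFailed" sub-status, and the 512-character description cap are assumptions modeled on what cloud-init's Azure helper is understood to send, and the function names are hypothetical. The HTTP GET and POST against wireserver are deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumed cap on the failure description length.
MAX_DESCRIPTION_LEN = 512


def parse_goal_state(goal_state_xml: str) -> Dict[str, str]:
    """Pull the identifiers the health report needs out of a GoalState document."""
    root = ET.fromstring(goal_state_xml)

    def text_of(path: str) -> str:
        node = root.find(path)
        if node is None or node.text is None:
            raise ValueError(f"missing {path} in GoalState")
        return node.text.strip()

    return {
        "incarnation": text_of("./Incarnation"),
        "container_id": text_of("./Container/ContainerId"),
        "instance_id": text_of("./Container/RoleInstanceList/RoleInstance/InstanceId"),
    }


def build_health_report(
    goal: Dict[str, str], ready: bool = True, description: Optional[str] = None
) -> str:
    """Build the health XML body; ElementTree escapes text content for us."""
    health = ET.Element("Health")
    ET.SubElement(health, "GoalStateIncarnation").text = goal["incarnation"]
    container = ET.SubElement(health, "Container")
    ET.SubElement(container, "ContainerId").text = goal["container_id"]
    role_list = ET.SubElement(container, "RoleInstanceList")
    role = ET.SubElement(role_list, "Role")
    ET.SubElement(role, "InstanceId").text = goal["instance_id"]
    role_health = ET.SubElement(role, "Health")
    ET.SubElement(role_health, "State").text = "Ready" if ready else "NotReady"
    if not ready:
        # Failure reports carry a sub-status and a truncated description.
        details = ET.SubElement(role_health, "Details")
        ET.SubElement(details, "SubStatus").text = "ProvisioningFailed"
        ET.SubElement(details, "Description").text = (description or "")[:MAX_DESCRIPTION_LEN]
    return ET.tostring(health, encoding="unicode")
```

Keeping these two steps free of network code means they can be unit-tested without a wireserver, while the request, retry, and timeout handling stays in one small separate layer.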