infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL #17567
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17567
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17567"
joperezr
left a comment
Assuming we have done a dry run and this works.
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
While you're here, we also want to upload symbols for aspire-managed.exe.
aspire-managed.exe already ships with embedded PDBs; the repo defaults to Verified against
Arguably we should consider changing this for the bundle, i.e. optimize for distribution/layout size rather than ease of debugging. For NuGet packages I think embedded debug symbols are generally the right trade-off, but we don't expect end users to debug the managed host or other parts of the bundle (e.g. dashboard, DCP), and even if they do, they can download the symbols from the symbol store.
Stage the native AOT debug-info payload produced by each per-RID
clipack project for later upload to MSDL/SymWeb:
* Windows (.pdb): copied loose into native-symbols-staging/<rid>/
  and picked up via FilesToPublishToSymbolServer
  (eng/Publishing.props loose-PDB glob).
* Linux (.dbg) / macOS (.dwarf): packed into
  Aspire.Cli.<rid>.<version>.symbols.nupkg by the new
  eng/clipack/Aspire.Cli.NativeSymbols.proj helper (NuGet
  TfmSpecificDebugSymbolsFile +
  AllowedOutputExtensionsInSymbolsPackageBuildOutputFolder
  extension allowlist).
Common.projitems wires both routes via _PackNativeAotSymbolsWindows /
_PackNativeAotSymbolsUnix, each AfterTargets="PackDotnetTool" and keyed
on the RID family so neither carries inner conditions for the other.
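To make the RID-family split concrete, here is a minimal Python sketch of the dispatch the two targets encode. It is hypothetical, not the MSBuild in Common.projitems: `symbol_route`, `SymbolRoute` and the route labels are illustrative names, and the file names come from the list above (where the macOS `.dwarf` physically sits after ILC runs is not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolRoute:
    symbol_file: str  # debug-info file staged for this RID (names per the commit text)
    route: str        # "loose-pdb" (Windows) or "symbols-nupkg" (Linux/macOS)


# Hypothetical mapping: each RID family owns exactly one entry, mirroring
# the two disjoint MSBuild targets keyed on the RID family.
_ROUTES = {
    "win": SymbolRoute("aspire.pdb", "loose-pdb"),
    "linux": SymbolRoute("aspire.dbg", "symbols-nupkg"),
    "osx": SymbolRoute("aspire.dwarf", "symbols-nupkg"),
}


def symbol_route(rid: str) -> SymbolRoute:
    """Return the staged symbol file and publishing route for a CLI RID."""
    family = rid.split("-", 1)[0]  # "linux-musl-x64" -> "linux"
    try:
        return _ROUTES[family]
    except KeyError:
        raise ValueError(f"unrecognized RID family in {rid!r}") from None


for rid in ("win-arm64", "linux-musl-x64", "osx-arm64"):
    print(rid, symbol_route(rid))
```

Keying on the first RID segment means `linux-musl-x64` lands in the Linux branch without a separate case, and an unknown family fails loudly instead of silently skipping symbols.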
The helper proj's _ValidatePackedSymbols target asserts post-Pack that
exactly one .symbols.nupkg landed and contains the expected
tools/<tfm>/<rid>/aspire.<dbg|dwarf> entry.
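A rough Python equivalent of that post-Pack assertion, useful for checking a package by hand since a `.nupkg` is a zip archive. This is a sketch under assumptions: `validate_symbols_nupkg` is a hypothetical name, the package is assumed to sit in `out_dir`, and the entry path follows the `tools/<tfm>/<rid>/aspire.<dbg|dwarf>` shape described above.

```python
import zipfile
from pathlib import Path


def validate_symbols_nupkg(out_dir: Path, rid: str, tfm: str) -> Path:
    """Check that exactly one symbols package exists for `rid` and that it
    contains the expected native debug-info entry. Returns the package path."""
    packages = sorted(out_dir.glob(f"Aspire.Cli.{rid}.*.symbols.nupkg"))
    if len(packages) != 1:
        raise RuntimeError(
            f"expected exactly one symbols package for {rid}, found {len(packages)}"
        )
    # macOS ships flat DWARF; Linux ships the ELF .dbg companion file.
    ext = "dwarf" if rid.startswith("osx-") else "dbg"
    expected = f"tools/{tfm}/{rid}/aspire.{ext}"
    with zipfile.ZipFile(packages[0]) as package:
        if expected not in package.namelist():
            raise RuntimeError(f"{packages[0].name} is missing {expected}")
    return packages[0]
```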
See docs/ci/cli-native-symbols.md for the operating doctrine.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull the per-RID native_symbols_<rid> artifacts produced by the clipack
build, route them through arcade's publish step, and ship the contents
to MSDL/SymWeb:
* build_sign_native.yml publishes a native_symbols_<rid> artifact
per RID from native-symbols-staging/<rid>/ at the end of each
per-RID native build.
* download_native_symbols.yml (new) pulls those artifacts on the
publish agent, splits them: Windows loose aspire.pdb → kept under
artifacts/native-symbols/ for FilesToPublishToSymbolServer; Linux
/ macOS .symbols.nupkg → moved into packages/<config>/Shipping so
arcade's _ExistingSymbolPackage filter classifies it as a Symbols
asset and routes it to SymbolUploadHelper.
* azure-pipelines{,-unofficial}.yml invoke download_native_symbols
before the existing publish step.
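The split the new template performs can be sketched in Python as follows. This is illustrative only (the real step is Azure Pipelines YAML); `stage_native_symbols` is a hypothetical name, and it assumes each downloaded `native_symbols_<rid>` artifact is a directory directly under `download_root`.

```python
import shutil
from pathlib import Path


def stage_native_symbols(download_root: Path, shipping_dir: Path) -> dict:
    """Split downloaded native_symbols_<rid> artifacts into the two arcade routes.

    Windows loose PDBs stay where they were downloaded; Linux/macOS
    .symbols.nupkg files are moved into the Shipping package folder.
    """
    shipping_dir.mkdir(parents=True, exist_ok=True)
    staged = {"loose_pdb": [], "symbols_nupkg": []}
    for artifact in sorted(download_root.glob("native_symbols_*")):
        for pdb in sorted(artifact.glob("*.pdb")):
            staged["loose_pdb"].append(pdb)  # left in place for the loose-PDB glob
        for nupkg in sorted(artifact.glob("*.symbols.nupkg")):
            target = shipping_dir / nupkg.name
            shutil.move(str(nupkg), str(target))
            staged["symbols_nupkg"].append(target)
    return staged
```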
See docs/ci/cli-native-symbols.md for the routing rationale.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extend _PublishBlobItems with per-RID symbol coverage checks alongside
the existing archive/tool-package/npm-package checks. Derives the
expected RID set from _ExpectedCliRids (same source of truth as the
other coverage checks) and asserts:
* one loose aspire.pdb under artifacts/native-symbols/<config>/
native_symbols_<rid>/ for every win-* RID
* one Aspire.Cli.<rid>.<version>.symbols.nupkg in Shipping/ for every
linux-*/osx-* RID
Plus a defensive Error if any per-item RID-extraction regex returns
an empty ExtractedRid (covers all five extraction sites — archive,
CLI tool pkg, CLI npm pkg, and the two new symbol sites). Without
this check, an unmatched filename still fails the build via
_MissingXRids but with a confusing "all RIDs missing" message
instead of the precise "we couldn't parse these filenames."
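A Python sketch of the same gate, including the "couldn't parse this filename" error. Assumptions, labeled: the two regexes and `check_symbol_coverage` are hypothetical stand-ins for the MSBuild item transforms in eng/Publishing.props, and the inputs are plain strings rather than MSBuild items.

```python
import re
from collections import Counter

# Hypothetical extraction regexes; the real ones live in eng/Publishing.props.
PDB_RID = re.compile(r"native_symbols_(?P<rid>[a-z0-9-]+)[/\\]aspire\.pdb$")
NUPKG_RID = re.compile(
    r"Aspire\.Cli\.(?P<rid>[a-z]+(?:-[a-z0-9]+)+)\.\d[\w.-]*\.symbols\.nupkg$"
)


def _extract(pattern, items, errors):
    rids = Counter()
    for item in items:
        match = pattern.search(item)
        if match is None:
            # Report unparseable names precisely instead of "all RIDs missing".
            errors.append(f"could not extract a RID from {item!r}")
        else:
            rids[match["rid"]] += 1
    return rids


def check_symbol_coverage(expected_rids, pdb_paths, nupkg_names):
    """Return a list of coverage errors (empty means the gate passes)."""
    errors = []
    pdbs = _extract(PDB_RID, pdb_paths, errors)
    nupkgs = _extract(NUPKG_RID, nupkg_names, errors)
    for rid in sorted(expected_rids):
        is_windows = rid.startswith("win-")
        found = pdbs if is_windows else nupkgs
        kind = "aspire.pdb" if is_windows else "symbols.nupkg"
        if found[rid] != 1:  # Counter returns 0 for RIDs never seen
            errors.append(f"expected one {kind} for {rid}, found {found[rid]}")
    return errors
```

Reporting unparseable names separately is what turns a confusing "every RID is missing" failure into one that names the offending file.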
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A one-shot local-validation tool that reproduces the full symbol
round-trip without uploading anything to MSDL. Three checks per RID,
ordered loosest to strictest:
A. Identifier symmetry — binary intrinsic ID (PDB GUID+Age,
ELF BuildID, Mach-O LC_UUID) matches the symbol file's ID.
B. dotnet-symbol round-trip — a real dotnet-symbol invocation
against a local HTTP symstore (rooted at the SSQP-keyed
directory) downloads a byte-identical copy of the symbol file.
C. Resolver-readable content — platform symbolicator (atos /
addr2line / llvm-symbolizer) can actually resolve the binary's
entry-point VA using the file Check B downloaded.
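For Linux, Check A boils down to comparing GNU build-id notes. The Python sketch below shows that comparison only; it is not the PowerShell in validate-cli-symbols.ps1, it does not cover PDB GUID+Age or Mach-O LC_UUID, and it assumes the raw note-section bytes were already extracted from each file (for example with binutils `objcopy --dump-section`). `gnu_build_id` and `ids_match` are hypothetical names.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type of the GNU build-id note


def gnu_build_id(notes: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the GNU build-id (lowercase hex) from an ELF note payload.

    Each note entry is namesz, descsz, type (32-bit words), followed by the
    name and the descriptor, each padded to a 4-byte boundary.
    """
    header = struct.Struct("<III" if little_endian else ">III")
    offset = 0
    while offset + header.size <= len(notes):
        namesz, descsz, note_type = header.unpack_from(notes, offset)
        offset += header.size
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def ids_match(binary_notes: bytes, debug_notes: bytes) -> bool:
    """Check A for ELF: the binary and its .dbg must carry the same build-id."""
    binary_id = gnu_build_id(binary_notes)
    return binary_id is not None and binary_id == gnu_build_id(debug_notes)
```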
Catches what pipeline success cannot: that the right symbol file was
paired with the right binary, that its bytes survived packaging
intact, and that those bytes can actually resolve a stack frame.
MSDL will happily accept mismatched, malformed, or unresolvable
bytes; the first symptom is the next crash-triage attempt months
later, with already-shipped builds unrecoverable.
Manual-only by design; not wired into CI. Run before any change to
this pipeline, or when an arcade SDK / .NET SDK / Xcode bump touches
symbol handling. See docs/ci/cli-native-symbols.md for the operating
doctrine and the relationship to the production pipeline.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Operating doctrine for the symbol-publish path: where ILC writes symbols per platform, why macOS ships the flat DWARF instead of the .dSYM bundle, the two arcade publishing routes (loose-PDB for Windows vs .symbols.nupkg for Linux/macOS), the per-RID coverage gate, SSQP key forms, the upstream contracts the code depends on (dotnet/runtime, dotnet/arcade, dotnet/symstore), and the local validation tool contract. Aimed at: someone touching any part of this pipeline after a dotnet/arcade SDK bump, a .NET SDK bump, an Xcode update on the macOS build agent, or a "dotnet-symbol returns nothing" report.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
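As a quick reference for the SSQP key forms, here is a Python sketch of the symbol-file keys for the three platforms. The formats follow the dotnet/symstore SSQP key conventions as we understand them (lowercase throughout, PDB age in hex, ELF and Mach-O symbol files normalized to `_.debug` / `_.dwarf`); treat them as assumptions to check against that spec. The function names are hypothetical. For a PDB GUID read from a CodeView record, build the `uuid.UUID` with `bytes_le=` so the GUID's mixed-endian layout is handled.

```python
import uuid


def pdb_key(pdb_name: str, signature: uuid.UUID, age: int) -> str:
    """Windows PDB: <pdbname>/<guid><age>/<pdbname>, GUID as 32 hex digits."""
    name = pdb_name.lower()
    return f"{name}/{signature.hex}{age:x}/{name}"


def elf_debug_key(build_id_hex: str) -> str:
    """ELF debug file (.dbg): keyed by build-id, file name normalized to _.debug."""
    return f"_.debug/elf-buildid-sym-{build_id_hex.lower()}/_.debug"


def mach_dwarf_key(uuid_hex: str) -> str:
    """Mach-O flat DWARF: keyed by LC_UUID bytes, file name normalized to _.dwarf."""
    return f"_.dwarf/mach-uuid-sym-{uuid_hex.replace('-', '').lower()}/_.dwarf"
```

In a local store rooted at some directory, the file for a given key lives at `<root>/<key>`, which is what makes a plain static file server sufficient for the round-trip check.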
_PackNativeAotSymbolsWindows / _PackNativeAotSymbolsUnix were firing in flows that invoke PackDotnetTool without first running PublishToDisk, notably eng/AfterSigning.targets' _PackCliDotnetToolAfterPack, which runs on every build-packages.yml job (Linux 8-core) regardless of TargetRids. That target invokes Targets="PackDotnetTool" on Aspire.Cli.<rid>.csproj, which extracts the native binary from the already-packed archive but never re-runs ILC, so artifacts/bin/Aspire.Cli/Release/<tfm>/<rid>/native/aspire.dbg doesn't exist on the runner. The symbol-pack AfterTargets then tripped the "Expected native AOT debug-info payload not found" Error.

Add Exists($(_NativeOutputDir)aspire[.exe]) to each target's Condition so the symbol staging silently no-ops when the publish binary isn't present (publish wasn't run on this runner, so there's nothing to stage). When the publish binary IS present, the Errors inside each target still catch the real "ILC ran but the symbol file isn't where we expect" failure mode.

CI repro: https://github.com/microsoft/aspire/actions/runs/26997197364
Tests / Build packages / Build packages job:
`error : Expected native AOT debug-info payload not found at '/home/runner/work/aspire/aspire/artifacts/bin/Aspire.Cli/Release/net10.0/linux-x64/native/aspire.dbg' for linux-x64.`

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
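The resulting condition-then-error shape can be sketched in Python. `stage_symbols_if_published` is a hypothetical helper, not the MSBuild targets; the binary and symbol file names are passed in rather than derived, so the sketch makes no claim about per-platform paths.

```python
from pathlib import Path
from typing import Optional


def stage_symbols_if_published(native_dir: Path, binary_name: str,
                               symbol_name: str) -> Optional[Path]:
    """Mirror the fixed target Condition: no-op without a publish output,
    hard error when the binary exists but its symbol file does not."""
    if not (native_dir / binary_name).exists():
        # PackDotnetTool ran without PublishToDisk (ILC never ran here),
        # so there is nothing to stage.
        return None
    symbol = native_dir / symbol_name
    if not symbol.exists():
        raise FileNotFoundError(
            f"Expected native AOT debug-info payload not found at '{symbol}'"
        )
    return symbol
```

Gating on the binary rather than on the symbol file keeps the real failure mode visible: a missing symbol next to a freshly published binary still fails the build.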
Pull request overview
This PR wires Aspire CLI NativeAOT native debug symbols into the internal publishing pipeline so they get uploaded to MSDL/SymWeb and can be retrieved by dotnet-symbol for crash triage across Windows (PDB), Linux (DBG), and macOS (DWARF).
Changes:
- Adds NativeAOT symbol staging during per-RID CLI packing and publishes those artifacts from build_sign_native.
- Downloads/stages the symbol artifacts during the publish pipeline and adds publishing + per-RID coverage validation in eng/Publishing.props.
- Adds a local validation script (validate-cli-symbols.ps1) and operating doctrine documentation (docs/ci/cli-native-symbols.md).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| eng/scripts/validate-cli-symbols.ps1 | New local end-to-end validator for symbol ID symmetry, dotnet-symbol round-trip, and basic symbol resolution. |
| eng/Publishing.props | Adds Windows loose-PDB symbol publishing and per-RID gating for expected symbol artifacts. |
| eng/pipelines/templates/download_native_symbols.yml | New template to download native symbol artifacts and stage symbol packages into Shipping for arcade publishing. |
| eng/pipelines/templates/build_sign_native.yml | Publishes per-RID native symbol artifacts (native_symbols_<rid>) from the build/sign jobs. |
| eng/pipelines/azure-pipelines.yml | Integrates symbol download/staging into the publish flow and excludes .symbols.nupkg from the “native-cli-packages” download set. |
| eng/pipelines/azure-pipelines-unofficial.yml | Same as official pipeline: integrates symbol download/staging and excludes .symbols.nupkg from “native-cli-packages”. |
| eng/clipack/Common.projitems | Adds post-pack symbol staging targets: loose .pdb (Windows) and .symbols.nupkg production (Linux/macOS). |
| eng/clipack/Aspire.Cli.NativeSymbols.proj | New helper project that packs .dbg/.dwarf into Aspire.Cli.<rid>.<ver>.symbols.nupkg via NuGet pack hooks and validates package contents. |
| docs/ci/cli-native-symbols.md | New documentation describing the symbol publishing architecture, SSQP keys, validation script, and maintenance guidance. |
The section-header comment said the round-trip went "via local file:// store", but the implementation has always used a loopback HttpListener because dotnet-symbol's server-path parser only accepts http(s). The on-screen header on the next line already says "via local symstore" and the description block at the top of the file already says "local HTTP server"; only this one comment was stale and contradicted the code below it. Update it to match, with a one-line note explaining why file:// isn't an option.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
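Because dotnet-symbol only takes http(s) server paths, the local store has to be served over loopback. The PR's script does this with .NET's HttpListener; the Python sketch below does the same with the standard library, serving an SSQP-keyed directory so that a request for `<root>/<key>` returns the file. `start_local_symstore` is a hypothetical name, and the exact dotnet-symbol invocation is left out on purpose (its `--server-path` option takes the URL; check `dotnet-symbol --help` for the rest).

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class _QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_local_symstore(root: Path):
    """Serve an SSQP-keyed directory at http://127.0.0.1:<port>/.

    Returns (server, base_url); call server.shutdown() and
    server.server_close() when done.
    """
    handler = functools.partial(_QuietHandler, directory=str(root))
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/"
```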
adamint
left a comment
Reviewed the build/signing/publishing changes with multiple passes and ran a practical validation pass over the pipeline/MSBuild wiring. I did not find any blocking issues.
…ilename globs match nested paths

Three automated PRs in the last 14d fired more tests than they needed to:
- #17672, #17263 [Automated] Update ATS API Surface Area
- #17534 Move repository skills to .agents

They each touched only api/*.txt or only README/skill markdown, but were firing trigger_all or selective:integrations, burning the full ~25min CI critical path. Audit-replay over 161 merged PRs found them, plus a latent C# glob bug where bare-filename patterns silently ignored nested matches.

Root causes (two independent bugs):
1. The category-trigger rescue logic in RescueCategoryTriggerFiles built a synthetic union of all category triggerPaths without consulting per-category excludePaths. An ignored file that textually matched some category's glob but was excluded from that category got rescued back to active and then hit fallback_unmatched (worse than staying ignored). For src/Aspire.Hosting.Foundry/api/*.txt this meant ATS-only PRs fired integrations even after `**/api/*.txt` was added to ignorePaths.
2. The four glob analyzers (CriticalFileDetector, IgnorePathFilter, CategoryMapper.CompiledCategory, ProjectMappingResolver.CompiledMapping) handed user-facing patterns directly to FileSystemGlobbing.Matcher. The Matcher anchors bare-filename patterns at the repo root, so `Directory.Build.props` matched only the root file, not `src/Directory.Build.props` or `tests/Directory.Build.props`. The Python audit-replay evaluator (eval_rules.py) already documented and applied a "prepend **/ to bare-filename patterns" rule; the C# analyzers did not, so the two evaluators silently disagreed on ~5 patterns across ignorePaths, triggerAllPaths, and sourceToTestMappings.

The fix:
- Rescue now passes config.Categories directly to CategoryMapper so CompiledCategory.Matches honors per-category excludes. A file is rescued only when at least one category would actually fire on it.
- New PatternNormalization.NormalizeGlob prepends `**/` to any pattern without a path separator. Every glob entry point applies it: the four analyzers above plus ProjectMappingResolver's regex compiler.
- Rules: integrations.excludePaths gains `tests/Aspire.Acquisition.Tests/**`, `tests/Infrastructure.Tests/**`, `**/*.md`, `**/api/*.txt`. The same `**/api/*.txt` exclude is added to every category so an ignored ATS file can't be rescued back by any category. ignorePaths gains `**/api/*.txt`. Acquisition mapping's source list gains the missing self-mapping `tests/Aspire.Acquisition.Tests/**` (Templates and Infrastructure mappings already had this; Acquisition was an oversight exposed only after the new exclude was added).

Verification:
- Audit replay over 161 merged PRs: 4 outcomes change (#17263, #17534, #17549, #17672 all move to `skip`); zero regressions; zero fallback_unmatched.
- New AuditFixtureTests xUnit [Theory] replays 28 hand-validated PRs against the live audit rules. Each row is a separate test, so any future rule edit that changes a row's outcome shows up as a visible CI failure. Coverage includes templates (#16447), CLI native build (#17567), extension multi-category (#17881/17698/17772), Hosting-core trigger_all (#17879), polyglot (#17948), and the regression canaries for previous fallback_unmatched cases.
- Per-component regression tests pin both bugs: two new tests in EndToEndEvaluationTests for rescue+excludes; two more for bare-filename matching at nested paths.
- Three pre-existing analyzer tests had asserted the buggy bare-filename behavior as expected (e.g. `*.md` not matching `docs/guide.md`). Updated with comments explaining the user-intent rule.
- Full TestSelector namespace: 290 tests, all pass. No collateral damage on the wider Infrastructure.Tests suite (5 pre-existing baseline failures unchanged).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
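The bare-filename rule and the exclude-aware rescue can be illustrated with a small, self-contained Python matcher. This is a hypothetical sketch, not Microsoft.Extensions.FileSystemGlobbing or the C# TestSelector code: it supports only `**`, `*` and `?`, `normalize_glob` mirrors the rule described for PatternNormalization.NormalizeGlob, and `should_rescue` mirrors the fixed rescue check over a simplified `{category: (triggerPaths, excludePaths)}` map.

```python
from __future__ import annotations

import re


def normalize_glob(pattern: str) -> str:
    """Prepend **/ to a bare-filename pattern so it matches at any depth."""
    return pattern if "/" in pattern else "**/" + pattern


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a minimal glob (**, *, ?) into an anchored regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    return glob_to_regex(normalize_glob(pattern)).match(path) is not None


def should_rescue(path: str, categories: dict[str, tuple[list[str], list[str]]]) -> bool:
    """Rescue an ignored file only if some category would actually fire on it,
    i.e. it matches that category's triggers and none of its excludes."""
    return any(
        any(matches(t, path) for t in triggers)
        and not any(matches(e, path) for e in excludes)
        for triggers, excludes in categories.values()
    )
```

With normalization in place, `Directory.Build.props` matches both the root file and `src/Directory.Build.props`, while a pattern that already contains a separator, such as `src/*.cs`, keeps its anchored meaning.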
Aspire CLI ships as a NativeAOT binary, but its native debug symbols never reached MSDL/SymWeb. `dotnet symbol` against a shipped `aspire` returned nothing on any of Windows / Linux / macOS, so customer crash reports and our own triage couldn't symbolicate CLI stack traces.

Root cause

ILC emits a per-platform symbol artifact next to the binary on every NativeAOT build (`aspire.pdb` / `aspire.dbg` / `aspire.dSYM`), but nothing in our pipeline routed those into arcade's publish step. The infrastructure was wired up; it just had no file to upload.

The fix

Two arcade publishing routes, one per platform family:
- Windows `.pdb`: loose-file path via `FilesToPublishToSymbolServer` (arcade's loose-PDB path is `.pdb`/`.dll`-only).
- Linux `.dbg` / macOS `.dwarf`: packed into `Aspire.Cli.<rid>.<version>.symbols.nupkg` via NuGet's `TfmSpecificDebugSymbolsFile` hook, routed through arcade's `_ExistingSymbolPackage` → `SymbolUploadHelper`.

A per-RID coverage gate in `eng/Publishing.props` asserts one symbol artifact per expected RID at publish time. `eng/scripts/validate-cli-symbols.ps1` reproduces the full round-trip locally (identifier symmetry → `dotnet-symbol` download → resolver-readable content); it is manual, not in CI.

See `docs/ci/cli-native-symbols.md` for the operating doctrine: ILC output paths, the macOS flat-DWARF vs `.dSYM` tradeoff, SSQP key forms per platform, and the upstream contracts (dotnet/runtime, dotnet/arcade, dotnet/symstore) this depends on.

Call-outs
- `azure-pipelines-public.yml` does not run `build_sign_native`. Verification: internal AzDO build 2992798, which exercises the full `build_sign_native` matrix across all 7 RIDs (Win x64/arm64, Linux x64/arm64/musl-x64, macOS x64/arm64), producing `native_symbols_<rid>` artifacts for arcade's Publish Assets stage.
- macOS ships the flat DWARF, not the `.dSYM` bundle. Server-mediated `dotnet-symbol` symbolication (the primary CLI crash-triage flow) only needs the flat form; Apple-native automatic symbolication via Spotlight is the open work tracked by dotnet/runtime#88286.
- `AutoGenerateSymbolPackages` stays `false`. That property is about managed-PDB → `.symbols.nupkg` wrapping for NuGet packages, independent of the native symbol publishing here.