infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL #17567

Merged
adamint merged 7 commits into microsoft:main from radical:radical/cli-aot-pdb-publish on Jun 5, 2026

Conversation

@radical

@radical radical commented May 28, 2026

Member

Aspire CLI ships as a NativeAOT binary but its native debug symbols never reached MSDL/SymWeb. dotnet symbol against a shipped aspire returned nothing on any of Windows / Linux / macOS, so customer crash reports and our own triage couldn't symbolicate CLI stack traces.

$ dotnet symbol --symbols aspire.exe -o ./syms
Downloading from https://msdl.microsoft.com/download/symbols/
ERROR: Not Found

Root cause

ILC emits a per-platform symbol artifact next to the binary on every NativeAOT build (aspire.pdb / aspire.dbg / aspire.dSYM), but nothing in our pipeline routed those into arcade's publish step. The infrastructure was wired up; it just had no file to upload.

The fix

Two arcade publishing routes, one per platform family:

  • Windows .pdb — loose-file path via FilesToPublishToSymbolServer (arcade's loose-PDB path is .pdb/.dll-only).
  • Linux .dbg / macOS .dwarf — packed into Aspire.Cli.<rid>.<version>.symbols.nupkg via NuGet's TfmSpecificDebugSymbolsFile hook, routed through arcade's _ExistingSymbolPackageSymbolUploadHelper.
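
The split between the two routes can be sketched as a small lookup keyed on the RID family. This is an illustration only: the function name and the route labels are hypothetical, and the real routing lives in MSBuild targets rather than Python.

```python
from typing import Tuple

def symbol_route(rid: str) -> Tuple[str, str]:
    """Return (symbol file name, publishing route) for a runtime identifier.

    The route labels are hypothetical names for the two arcade paths
    described above; they are not identifiers from the repository.
    """
    if rid.startswith("win-"):
        # Windows: loose .pdb picked up via FilesToPublishToSymbolServer.
        return ("aspire.pdb", "loose-file")
    if rid.startswith("linux-"):
        # Linux (including musl): .dbg packed into a .symbols.nupkg.
        return ("aspire.dbg", "symbols-nupkg")
    if rid.startswith("osx-"):
        # macOS: flat DWARF packed into a .symbols.nupkg.
        return ("aspire.dwarf", "symbols-nupkg")
    raise ValueError(f"unrecognized RID family: {rid!r}")
```

Keying on the RID prefix mirrors the description above, where each platform family gets exactly one route and neither route needs to know about the other.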

Per-RID coverage gate in eng/Publishing.props asserts one symbol artifact per expected RID at publish time. eng/scripts/validate-cli-symbols.ps1 reproduces the full round-trip locally (identifier symmetry → dotnet-symbol download → resolver-readable content) — manual, not in CI.

See docs/ci/cli-native-symbols.md for the operating doctrine — ILC output paths, the macOS flat-DWARF vs .dSYM tradeoff, SSQP key forms per platform, and the upstream contracts (dotnet/runtime, dotnet/arcade, dotnet/symstore) this depends on.

Call-outs

  • Cannot be PR-validated through GitHub Actions. azure-pipelines-public.yml does not run build_sign_native. Verification: internal AzDO build 2992798, which exercises the full build_sign_native matrix across all 7 RIDs (Win x64/arm64, Linux x64/arm64/musl-x64, macOS x64/arm64) producing native_symbols_<rid> artifacts for arcade's Publish Assets stage.
  • macOS ships the flat DWARF, not the .dSYM bundle. Server-mediated dotnet-symbol symbolication (the primary CLI crash-triage flow) only needs the flat form; Apple-native automatic symbolication via Spotlight is the open work tracked by dotnet/runtime#88286.
  • AutoGenerateSymbolPackages stays false. That property is about managed-PDB → .symbols.nupkg wrapping for NuGet packages, independent of the native symbol publishing here.

@github-actions

github-actions Bot commented May 28, 2026

Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17567

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17567"

@joperezr joperezr left a comment

Member

Assuming we have done a dry run and this works.

@radical radical changed the title infra: publish Aspire CLI native AOT pdbs (win-x64, win-arm64) to MSDL infra: publish Aspire CLI native AOT symbols (Windows + Linux) to MSDL May 28, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch 2 times, most recently from 6cc16ec to b432f9c Compare May 28, 2026 20:16
@radical radical changed the title infra: publish Aspire CLI native AOT symbols (Windows + Linux) to MSDL infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL May 28, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch from b432f9c to a9ceac7 Compare May 28, 2026 21:43
@github-actions

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@radical radical force-pushed the radical/cli-aot-pdb-publish branch from 4a45433 to 9917ea1 Compare May 29, 2026 01:31
@davidfowl davidfowl added this to the 13.4 milestone May 29, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch 6 times, most recently from 9626257 to db9e79e Compare May 31, 2026 18:25
@davidfowl

Contributor

While you're here, we also want to upload symbols for aspire-managed.exe

@radical radical force-pushed the radical/cli-aot-pdb-publish branch from cb08c3d to c2224b6 Compare June 1, 2026 00:49
@radical

radical commented Jun 1, 2026

Member Author

aspire-managed.exe already ships with embedded PDBs — the repo defaults to DebugType=embedded (Directory.Build.props:23), so each .dll carries its portable PDB inside the PE debug directory (entry type 17). That's the alternative to MSDL upload, not a precursor; debuggers find the symbols directly in the binary without a server round-trip.

Verified against ~/.aspire/versions/13.4.0_9c260c29a6.../: 4 Aspire-owned assemblies have both an embedded PDB (debug-dir type 17) and a CodeView record (type 2) — aspire-managed.dll, Aspire.Dashboard.dll, Aspire.Hosting.RemoteHost.dll, Aspire.TypeSystem.dll. Runtime BCL .dlls use DebugType=portable + MSDL (CodeView-only), which is the route this PR sets up for the NativeAOT CLI binary (no embedded option exists for native AOT).

@radical radical modified the milestones: 13.4, 13.4.x Jun 1, 2026
@DamianEdwards

DamianEdwards commented Jun 1, 2026

Member

> aspire-managed.exe already ships with embedded PDBs — the repo defaults to DebugType=embedded (Directory.Build.props:23), so each .dll carries its portable PDB inside the PE debug directory (entry type 17). That's the alternative to MSDL upload, not a precursor; debuggers find the symbols directly in the binary without a server round-trip.

Arguably, we should consider changing this for the bundle, i.e. optimizing for distribution/layout size rather than ease of debugging. For NuGet packages I think embedded debug symbols are generally the right trade-off, but we don't expect end users to debug the managed host or other parts of the bundle (e.g. dashboard, DCP), and even if they do, they can download the symbols from the symbol store.

@radical radical force-pushed the radical/cli-aot-pdb-publish branch 2 times, most recently from 3843c16 to 185322f Compare June 5, 2026 00:07
@radical radical changed the base branch from release/13.4 to main June 5, 2026 00:13
@radical radical removed this from the 13.4.x milestone Jun 5, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch from 8563994 to fe046a6 Compare June 5, 2026 03:52
radical and others added 5 commits June 5, 2026 01:24
Stage the native AOT debug-info payload produced by each per-RID
clipack project for later upload to MSDL/SymWeb:

  * Windows (.pdb)        copied loose into native-symbols-staging/<rid>/
                          and picked up via FilesToPublishToSymbolServer
                          (eng/Publishing.props loose-PDB glob).

  * Linux (.dbg) /        packed into Aspire.Cli.<rid>.<version>.symbols.nupkg
    macOS (.dwarf)        by the new eng/clipack/Aspire.Cli.NativeSymbols.proj
                          helper (NuGet TfmSpecificDebugSymbolsFile +
                          AllowedOutputExtensionsInSymbolsPackageBuildOutputFolder
                          extension allowlist).

Common.projitems wires both routes via _PackNativeAotSymbolsWindows /
_PackNativeAotSymbolsUnix, each AfterTargets="PackDotnetTool" and keyed
on the RID family so neither carries inner conditions for the other.
The helper proj's _ValidatePackedSymbols target asserts post-Pack that
exactly one .symbols.nupkg landed and contains the expected
tools/<tfm>/<rid>/aspire.<dbg|dwarf> entry.
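
A post-pack assertion of this kind can be sketched in a few lines. This is an approximation of what the commit message describes, not the actual _ValidatePackedSymbols target: the function name and the exact error wording are hypothetical, and the check relies only on a .nupkg being a zip archive.

```python
import zipfile
from pathlib import Path

def validate_packed_symbols(out_dir: Path, tfm: str, rid: str) -> Path:
    """Assert exactly one .symbols.nupkg exists and holds the expected entry."""
    packages = sorted(out_dir.glob("*.symbols.nupkg"))
    if len(packages) != 1:
        raise RuntimeError(
            f"expected exactly one .symbols.nupkg in {out_dir}, found {len(packages)}"
        )
    # macOS ships the flat DWARF; Linux ships the .dbg file.
    ext = "dwarf" if rid.startswith("osx-") else "dbg"
    expected = f"tools/{tfm}/{rid}/aspire.{ext}"
    with zipfile.ZipFile(packages[0]) as pkg:
        if expected not in pkg.namelist():
            raise RuntimeError(f"{packages[0].name} is missing entry {expected}")
    return packages[0]
```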

See docs/ci/cli-native-symbols.md for the operating doctrine.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pull the per-RID native_symbols_<rid> artifacts produced by the clipack
build, route them through arcade's publish step, and ship the contents
to MSDL/SymWeb:

  * build_sign_native.yml publishes a native_symbols_<rid> artifact
    per RID from native-symbols-staging/<rid>/ at the end of each
    per-RID native build.

  * download_native_symbols.yml (new) pulls those artifacts on the
    publish agent, splits them: Windows loose aspire.pdb → kept under
    artifacts/native-symbols/ for FilesToPublishToSymbolServer; Linux
    / macOS .symbols.nupkg → moved into packages/<config>/Shipping so
    arcade's _ExistingSymbolPackage filter classifies it as a Symbols
    asset and routes it to SymbolUploadHelper.

  * azure-pipelines{,-unofficial}.yml invoke download_native_symbols
    before the existing publish step.

See docs/ci/cli-native-symbols.md for the routing rationale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Extend _PublishBlobItems with per-RID symbol coverage checks alongside
the existing archive/tool-package/npm-package checks. Derives the
expected RID set from _ExpectedCliRids (same source of truth as the
other coverage checks) and asserts:

  * one loose aspire.pdb under artifacts/native-symbols/<config>/
    native_symbols_<rid>/ for every win-* RID
  * one Aspire.Cli.<rid>.<version>.symbols.nupkg in Shipping/ for every
    linux-*/osx-* RID

Plus a defensive Error if any per-item RID-extraction regex returns
an empty ExtractedRid (covers all five extraction sites — archive,
CLI tool pkg, CLI npm pkg, and the two new symbol sites). Without
this check, an unmatched filename still fails the build via
_MissingXRids but with a confusing "all RIDs missing" message
instead of the precise "we couldn't parse these filenames."
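
The shape of the gate, including the separate "could not parse" error, can be sketched as follows. The regexes and function names are hypothetical stand-ins for the extraction logic in eng/Publishing.props, and the sketch checks only that at least one artifact exists per RID rather than exactly one.

```python
import re

# Hypothetical patterns; the real extraction regexes live in eng/Publishing.props.
PDB_RE = re.compile(r"native_symbols_(?P<rid>[^/\\]+)[/\\]aspire\.pdb$")
NUPKG_RE = re.compile(
    r"^Aspire\.Cli\.(?P<rid>[a-z]+(?:-[a-z0-9]+)+)\.\d.*\.symbols\.nupkg$"
)

def _extract(items, pattern, errors):
    """Collect RIDs from item names, recording any name that fails to parse."""
    rids = set()
    for item in items:
        match = pattern.search(item)
        if match is None:
            # Precise failure: the filename did not yield a RID at all.
            errors.append(f"could not parse a RID from {item!r}")
        else:
            rids.add(match.group("rid"))
    return rids

def check_symbol_coverage(expected_rids, pdb_paths, nupkg_names):
    """Return a list of error strings; an empty list means full coverage."""
    errors = []
    pdb_rids = _extract(pdb_paths, PDB_RE, errors)
    nupkg_rids = _extract(nupkg_names, NUPKG_RE, errors)
    missing_pdb = sorted(
        r for r in expected_rids if r.startswith("win-") and r not in pdb_rids
    )
    missing_pkg = sorted(
        r for r in expected_rids if not r.startswith("win-") and r not in nupkg_rids
    )
    if missing_pdb:
        errors.append("missing aspire.pdb for: " + ", ".join(missing_pdb))
    if missing_pkg:
        errors.append("missing .symbols.nupkg for: " + ", ".join(missing_pkg))
    return errors
```

Reporting the parse failure separately is what keeps an unmatched filename from surfacing only as a misleading "all RIDs missing" message.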

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

A one-shot local-validation tool that reproduces the full symbol
round-trip without uploading anything to MSDL. Three checks per RID,
ordered loosest to strictest:

  A. Identifier symmetry — binary intrinsic ID (PDB GUID+Age,
     ELF BuildID, Mach-O LC_UUID) matches the symbol file's ID.
  B. dotnet-symbol round-trip — a real dotnet-symbol invocation
     against a local HTTP symstore (rooted at the SSQP-keyed
     directory) downloads a byte-identical copy of the symbol file.
  C. Resolver-readable content — platform symbolicator (atos /
     addr2line / llvm-symbolizer) can actually resolve the binary's
     entry-point VA using the file Check B downloaded.
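
The identifier from Check A is also what keys the symbol-store directory in Check B. A minimal sketch of the symbol-file key forms follows, based on my reading of the SSQP key conventions in dotnet/symstore; treat the exact formats as assumptions and verify them against docs/ci/cli-native-symbols.md and the upstream spec. The function names are hypothetical.

```python
def pdb_key(pdb_name: str, guid: str, age: int) -> str:
    """Windows PDB: <name>/<guid without dashes><age in hex>/<name> (assumed form)."""
    signature = guid.replace("-", "").lower()
    name = pdb_name.lower()
    return f"{name}/{signature}{age:x}/{name}"

def elf_symbol_key(build_id: str) -> str:
    """ELF debug file keyed on the GNU build ID (assumed form)."""
    return f"_.debug/elf-buildid-sym-{build_id.lower()}/_.debug"

def macho_symbol_key(uuid: str) -> str:
    """Mach-O DWARF file keyed on LC_UUID (assumed form)."""
    return f"_.dwarf/mach-uuid-sym-{uuid.replace('-', '').lower()}/_.dwarf"
```

If the binary's ID and the symbol file's ID differ, the two sides compute different keys, which is why a mismatch in Check A shows up later as a "Not Found" from the symbol server.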

Catches what pipeline success cannot: that the right symbol file was
paired with the right binary, that its bytes survived packaging
intact, and that those bytes can actually resolve a stack frame.
MSDL will happily accept mismatched, malformed, or unresolvable
bytes; the first symptom is the next crash-triage attempt months
later, with already-shipped builds unrecoverable.

Manual-only by design; not wired into CI. Run before any change to
this pipeline, or when an arcade SDK / .NET SDK / Xcode bump touches
symbol handling. See docs/ci/cli-native-symbols.md for the operating
doctrine and the relationship to the production pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Operating doctrine for the symbol-publish path: where ILC writes
symbols per platform, why macOS ships the flat DWARF instead of the
.dSYM bundle, the two arcade publishing routes (loose-PDB for Windows
vs .symbols.nupkg for Linux/macOS), per-RID coverage gate, SSQP key
forms, upstream contracts the code depends on (dotnet/runtime,
dotnet/arcade, dotnet/symstore), and the local validation tool
contract.

Aimed at: someone touching any part of this pipeline after a
dotnet/arcade SDK bump, a .NET SDK bump, an Xcode update on the
macOS build agent, or a "dotnet-symbol returns nothing" report.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the radical/cli-aot-pdb-publish branch from e8ed842 to b0e6ea3 Compare June 5, 2026 05:25
_PackNativeAotSymbolsWindows / _PackNativeAotSymbolsUnix were firing in
flows that invoke PackDotnetTool without first running PublishToDisk —
notably eng/AfterSigning.targets' _PackCliDotnetToolAfterPack, which
runs on every build-packages.yml job (Linux 8-core) regardless of
TargetRids: that target invokes Targets="PackDotnetTool" on
Aspire.Cli.<rid>.csproj, which extracts the native binary from the
already-packed archive but never re-runs ILC, so artifacts/bin/
Aspire.Cli/Release/<tfm>/<rid>/native/aspire.dbg doesn't exist on the
runner. The symbol-pack AfterTargets then tripped the "Expected native
AOT debug-info payload not found" Error.

Add Exists($(_NativeOutputDir)aspire[.exe]) to each target's Condition
so the symbol staging silently no-ops when the publish binary isn't
present (publish wasn't run on this runner — there's nothing to stage).
When the publish binary IS present, the Errors inside each target
still catch the real "ILC ran but the symbol file isn't where we
expect" failure mode.
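
The resulting two-level behavior (silent no-op when publish did not run, hard error when it did but the symbols are missing) can be sketched like this. The function name is hypothetical and the sketch models only the control flow, not the MSBuild Condition syntax.

```python
from pathlib import Path
from typing import Optional

def stage_native_symbols(native_dir: Path, rid: str) -> Optional[Path]:
    """Return the symbol file to stage, or None when there is nothing to stage."""
    is_windows = rid.startswith("win-")
    binary = native_dir / ("aspire.exe" if is_windows else "aspire")
    if not binary.exists():
        # Publish did not run on this runner: skip silently.
        return None
    if is_windows:
        ext = "pdb"
    elif rid.startswith("osx-"):
        ext = "dwarf"
    else:
        ext = "dbg"
    symbols = native_dir / f"aspire.{ext}"
    if not symbols.exists():
        # ILC ran (the binary is here) but the symbol file is not where expected.
        raise FileNotFoundError(
            f"Expected native AOT debug-info payload not found at '{symbols}' for {rid}."
        )
    return symbols
```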

CI repro:
https://github.com/microsoft/aspire/actions/runs/26997197364 Tests /
Build packages / Build packages job:
`error : Expected native AOT debug-info payload not found at
'/home/runner/work/aspire/aspire/artifacts/bin/Aspire.Cli/Release/
net10.0/linux-x64/native/aspire.dbg' for linux-x64.`

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical marked this pull request as ready for review June 5, 2026 06:38
Copilot AI review requested due to automatic review settings June 5, 2026 06:38

Copilot AI left a comment

Contributor

Pull request overview

This PR wires Aspire CLI NativeAOT native debug symbols into the internal publishing pipeline so they get uploaded to MSDL/SymWeb and can be retrieved by dotnet-symbol for crash triage across Windows (PDB), Linux (DBG), and macOS (DWARF).

Changes:

  • Adds NativeAOT symbol staging during per-RID CLI packing and publishes those artifacts from build_sign_native.
  • Downloads/stages the symbol artifacts during the publish pipeline and adds publishing + per-RID coverage validation in eng/Publishing.props.
  • Adds a local validation script (validate-cli-symbols.ps1) and operating doctrine documentation (docs/ci/cli-native-symbols.md).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| eng/scripts/validate-cli-symbols.ps1 | New local end-to-end validator for symbol ID symmetry, dotnet-symbol round-trip, and basic symbol resolution. |
| eng/Publishing.props | Adds Windows loose-PDB symbol publishing and per-RID gating for expected symbol artifacts. |
| eng/pipelines/templates/download_native_symbols.yml | New template to download native symbol artifacts and stage symbol packages into Shipping for arcade publishing. |
| eng/pipelines/templates/build_sign_native.yml | Publishes per-RID native symbol artifacts (native_symbols_<rid>) from the build/sign jobs. |
| eng/pipelines/azure-pipelines.yml | Integrates symbol download/staging into the publish flow and excludes .symbols.nupkg from the “native-cli-packages” download set. |
| eng/pipelines/azure-pipelines-unofficial.yml | Same as official pipeline: integrates symbol download/staging and excludes .symbols.nupkg from “native-cli-packages”. |
| eng/clipack/Common.projitems | Adds post-pack symbol staging targets: loose .pdb (Windows) and .symbols.nupkg production (Linux/macOS). |
| eng/clipack/Aspire.Cli.NativeSymbols.proj | New helper project that packs .dbg/.dwarf into Aspire.Cli.<rid>.<ver>.symbols.nupkg via NuGet pack hooks and validates package contents. |
| docs/ci/cli-native-symbols.md | New documentation describing the symbol publishing architecture, SSQP keys, validation script, and maintenance guidance. |

Comment thread eng/scripts/validate-cli-symbols.ps1 Outdated
The section-header comment said the round-trip went "via local file://
store", but the implementation has always used a loopback HttpListener
because dotnet-symbol's server-path parser only accepts http(s). The
on-screen header on the next line already says "via local symstore"
and the description block at the top of the file already says
"local HTTP server" — only this one comment was stale and contradicted
the code below it. Update it to match, with a one-line note explaining
why file:// isn't an option.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@adamint adamint left a comment

Member

Reviewed the build/signing/publishing changes with multiple passes and ran a practical validation pass over the pipeline/MSBuild wiring. I did not find any blocking issues.

@adamint adamint merged commit 72aff17 into microsoft:main Jun 5, 2026
622 of 627 checks passed
@microsoft-github-policy-service microsoft-github-policy-service Bot added this to the 13.5 milestone Jun 5, 2026
@radical radical deleted the radical/cli-aot-pdb-publish branch June 5, 2026 19:32
radical added a commit that referenced this pull request Jun 9, 2026
…ilename globs match nested paths

Three automated PRs in the last 14d fired more tests than they needed to:

  #17672, #17263  [Automated] Update ATS API Surface Area
  #17534          Move repository skills to .agents

They each touched only api/*.txt or only README/skill markdown, but were
firing trigger_all or selective:integrations — burning the full ~25min
CI critical path. Audit-replay over 161 merged PRs found them, plus a
latent C# glob bug where bare-filename patterns silently ignored
nested matches.

Root causes — two independent bugs:

1. The category-trigger rescue logic in RescueCategoryTriggerFiles built
   a synthetic union of all category triggerPaths without consulting
   per-category excludePaths. An ignored file that textually matched some
   category's glob but was excluded from that category got rescued back
   to active and then hit fallback_unmatched (worse than staying
   ignored). For src/Aspire.Hosting.Foundry/api/*.txt this meant ATS-only
   PRs fired integrations even after `**/api/*.txt` was added to
   ignorePaths.

2. The four glob analyzers (CriticalFileDetector, IgnorePathFilter,
   CategoryMapper.CompiledCategory, ProjectMappingResolver.CompiledMapping)
   handed user-facing patterns directly to FileSystemGlobbing.Matcher.
   The Matcher anchors bare-filename patterns at the repo root, so
   `Directory.Build.props` matched only the root file, not
   `src/Directory.Build.props` or `tests/Directory.Build.props`. The
   Python audit-replay evaluator (eval_rules.py) already documented and
   applied a "prepend **/ to bare-filename patterns" rule; the C#
   analyzers did not, so the two evaluators silently disagreed on
   ~5 patterns across ignorePaths, triggerAllPaths, and sourceToTestMappings.
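
A rescue check that honors per-category excludes, the behavior the first root cause was missing, might look like the sketch below. It is not the C# implementation: the function names are hypothetical, the example trigger pattern is invented, and Python's fnmatchcase is only a loose stand-in for real glob semantics (its `*` also matches `/`).

```python
from fnmatch import fnmatchcase

def category_fires(path: str, category: dict) -> bool:
    """A category fires only if a trigger matches and no exclude matches."""
    triggered = any(fnmatchcase(path, p) for p in category["triggerPaths"])
    excluded = any(fnmatchcase(path, p) for p in category.get("excludePaths", []))
    return triggered and not excluded

def should_rescue(path: str, categories: list) -> bool:
    """Rescue an ignored file only when at least one category would fire on it."""
    return any(category_fires(path, c) for c in categories)

# Hypothetical rule shaped like the integrations category described above.
integrations = {
    "triggerPaths": ["src/Aspire.Hosting.*/**"],
    "excludePaths": ["**/api/*.txt"],
}
```

The buggy variant corresponds to evaluating only the union of triggerPaths, which would rescue an api/*.txt file that no category actually accepts.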

The fix:

- Rescue now passes config.Categories directly to CategoryMapper so
  CompiledCategory.Matches honors per-category excludes. A file is
  rescued only when at least one category would actually fire on it.
- New PatternNormalization.NormalizeGlob prepends `**/` to any pattern
  without a path separator. Every glob entry point applies it: the four
  analyzers above plus ProjectMappingResolver's regex compiler.
- Rules: integrations.excludePaths gains `tests/Aspire.Acquisition.Tests/**`,
  `tests/Infrastructure.Tests/**`, `**/*.md`, `**/api/*.txt`. The same
  `**/api/*.txt` exclude is added to every category so an ignored ATS
  file can't be rescued back by any category. ignorePaths gains
  `**/api/*.txt`. Acquisition mapping's source list gains the missing
  self-mapping `tests/Aspire.Acquisition.Tests/**` (Templates and
  Infrastructure mappings already had this; Acquisition was an oversight
  exposed only after the new exclude was added).
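
The normalization rule itself is small. A sketch, assuming that both `/` and `\` count as path separators (the commit message does not say which separators the C# helper checks), with a hypothetical Python name:

```python
def normalize_glob(pattern: str) -> str:
    """Prepend **/ to bare-filename patterns so they match at any depth."""
    if "/" in pattern or "\\" in pattern:
        # Already path-qualified: leave the pattern as the author wrote it.
        return pattern
    return "**/" + pattern
```

Applying one helper at every glob entry point is what keeps the C# analyzers and the Python audit-replay evaluator in agreement on the same rule file.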

Verification:

- Audit replay over 161 merged PRs: 4 outcomes change (#17263, #17534,
  #17549, #17672 all move to `skip`); zero regressions; zero
  fallback_unmatched.
- New AuditFixtureTests xUnit [Theory] replays 28 hand-validated PRs
  against the live audit rules. Each row is a separate test, so any
  future rule edit that changes a row's outcome shows up as a visible
  CI failure. Coverage includes templates (#16447), CLI native build
  (#17567), extension multi-category (#17881/17698/17772), Hosting-core
  trigger_all (#17879), polyglot (#17948), and the regression canaries
  for previous fallback_unmatched cases.
- Per-component regression tests pin both bugs: two new tests in
  EndToEndEvaluationTests for rescue+excludes; two more for
  bare-filename matching at nested paths.
- Three pre-existing analyzer tests had asserted the buggy bare-filename
  behavior as expected (e.g. `*.md` not matching `docs/guide.md`).
  Updated with comments explaining the user-intent rule.
- Full TestSelector namespace: 290 tests, all pass. No collateral damage
  on the wider Infrastructure.Tests suite (5 pre-existing baseline
  failures unchanged).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

6 participants