AArch64 DWARF unwind + macOS os_signpost integration#176

Open
angerman wants to merge 11 commits into stable-ghc-9.14 from feat/aarch64-dwarf-signpost


@angerman

Summary

Two independent, complementary features that make GHC-compiled Haskell code visible to standard profiling tools on macOS:

Part 1: AArch64 DWARF Unwind Support (GHC #19913)

The AArch64 NCG was silently discarding CmmUnwind nodes (return nilOL), producing no DWARF unwind information. This made lldb bt, Instruments, and Samply unable to unwind through Haskell frames on Apple Silicon.

  • Add UNWIND pseudo-instruction to AArch64 Instr data type (mirrors X86)
  • Implement CmmUnwind → UNWIND conversion in stmtToInstrs
  • Add addSpUnwindings to emit UNWIND after DELTA (tracks SP changes)
  • Add extractUnwindPoints and wire into NcgImpl (was const [])
  • Add UNWIND pretty-printing to Ppr.hs

Part 2: macOS os_signpost Integration

The RTS had no os_signpost support, making GC pauses, thread events, and user events invisible in Apple Instruments.

  • New rts/Signpost.{h,c} with os_signpost API wrappers
  • GC interval signposts (begin/end pairs, per-capability tracking)
  • Thread lifecycle signposts (create/run/stop)
  • User event forwarding (traceEvent#/traceMarker#)
  • Near-zero overhead when Instruments is not attached (a single os_signpost_enabled() check)
  • Empty macros on non-Darwin (zero overhead)
  • Events appear in Instruments "Points of Interest" lane by default
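
A minimal sketch of how the os_signpost_enabled() gate and the empty non-Darwin macros could sit together in one header. The names `demoSignpostInit`, `demoSignpostUserEvent`, `demoRun` and the subsystem string `org.example.rts` are hypothetical; the actual contents of rts/Signpost.h are not shown in this description and may differ.

```c
#include <assert.h>

#if defined(__APPLE__)
#include <os/log.h>
#include <os/signpost.h>

/* Hypothetical log handle; the real RTS names are not shown here. */
static os_log_t demo_log;

static inline void demoSignpostInit(void)
{
    /* The Points of Interest category is shown by Instruments by default. */
    demo_log = os_log_create("org.example.rts",
                             OS_LOG_CATEGORY_POINTS_OF_INTEREST);
}

static inline void demoSignpostUserEvent(const char *msg)
{
    /* Cheap gate: skip the emit entirely unless a tool is listening. */
    if (!os_signpost_enabled(demo_log)) return;
    os_signpost_event_emit(demo_log, OS_SIGNPOST_ID_EXCLUSIVE,
                           "UserEvent", "%{public}s", msg);
}

#define DEMO_SIGNPOSTS_COMPILED_IN 1
#else
/* Non-Darwin: the calls disappear at preprocessing time. */
#define demoSignpostInit()          ((void)0)
#define demoSignpostUserEvent(msg)  ((void)0)
#define DEMO_SIGNPOSTS_COMPILED_IN 0
#endif

/* Call sites look the same on every platform. */
static inline int demoRun(void)
{
    demoSignpostInit();
    demoSignpostUserEvent("hello from the RTS");
    return DEMO_SIGNPOSTS_COMPILED_IN;
}
```

Using macros for the non-Darwin case means call sites need no `#if` of their own, at the cost of the arguments not being evaluated or type-checked there.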

Files Changed

Compiler (AArch64 DWARF):

  • compiler/GHC/CmmToAsm/AArch64/Instr.hs — UNWIND constructor + pattern matches
  • compiler/GHC/CmmToAsm/AArch64/CodeGen.hs — CmmUnwind handler, addSpUnwindings, extractUnwindPoints
  • compiler/GHC/CmmToAsm/AArch64/Ppr.hs — UNWIND pretty-printing
  • compiler/GHC/CmmToAsm/AArch64.hs — Wire extractUnwindPoints into NcgImpl

RTS (os_signpost):

  • rts/Signpost.h — Header with Darwin functions / non-Darwin empty macros
  • rts/Signpost.c — os_signpost implementation
  • rts/RtsStartup.c — initSignposts/freeSignposts lifecycle
  • rts/Stats.c — GC begin/end signpost calls
  • rts/Trace.h — Thread event signpost calls
  • rts/Trace.c — User event signpost calls
  • rts/rts.cabal — Add Signpost.c to build

Test plan

  • Build GHC with changes on AArch64 (Apple Silicon)
  • Compile test program with -g and verify dwarfdump --debug-frame shows FDE entries
  • Run under lldb and verify bt shows Haskell frames
  • Build GHC on macOS and run allocation-heavy program under Instruments
  • Verify GC intervals appear in "Points of Interest" lane
  • Verify traceEvent/traceMarker events appear as signpost events
  • Verify non-Darwin builds compile without warnings (empty macros)

The AArch64 native code generator was silently discarding CmmUnwind
nodes (`return nilOL`), making DWARF-based profiling and debugging
impossible on Apple Silicon and ARM64 Linux.

This commit adds full DWARF unwind support, mirroring the existing
X86 implementation:

  - Add UNWIND pseudo-instruction to AArch64.Instr
  - Convert CmmUnwind nodes to UNWIND instructions in CodeGen
  - Emit UNWIND after DELTA via addSpUnwindings for SP tracking
  - Wire extractUnwindPoints into the AArch64 NcgImpl record
  - Pretty-print UNWIND as a label + comment in Ppr

With this change, `ghc -g` on AArch64 produces .debug_frame entries,
enabling `lldb` backtraces, `dwarfdump --debug-frame`, and sampler-
based profilers (Instruments, Samply) to unwind through Haskell code.

GHC-compiled programs are invisible to Apple Instruments because the
RTS emits no os_signpost events. This makes it hard to correlate GC
pauses and thread scheduling with system-level activity on macOS.

Add a new Signpost.c/Signpost.h module that bridges RTS events to the
os_signpost API, using OS_LOG_CATEGORY_POINTS_OF_INTEREST so events
appear in Instruments by default without a custom .instrpkg:

  - GC intervals: begin/end pairs tracked per-capability with unique
    signpost IDs, emitting generation, bytes copied, and slop
  - Thread lifecycle: create/run/stop as point events with cap and tid
  - User events: traceEvent#/traceMarker# forwarded as signposts

All functions gate on os_signpost_enabled() so the overhead when
Instruments is not attached is near zero (a single branch on the
log handle's signpost-enabled flag).

On non-Darwin platforms, all functions compile to empty macros.

Integration points:
  - Stats.c: GC begin/end with full statistics
  - Trace.h/Trace.c: thread and user event forwarding
  - RtsStartup.c: init after initScheduler, free before endTracing
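
The per-capability begin/end pairing described above can be modelled portably as follows. This is only a sketch under stated assumptions: the IDs come from a plain counter instead of os_signpost_id_generate(), the capacity is a fixed hypothetical `DEMO_MAX_CAPS`, and the places where the Darwin build would call os_signpost_interval_begin/os_signpost_interval_end are marked in comments only.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_CAPS 4                      /* hypothetical fixed capacity */

static uint64_t demo_gc_ids[DEMO_MAX_CAPS];  /* 0 means "no interval open" */
static uint64_t demo_next_id = 1;

/* Open a GC interval on capability `cap`. Returns its ID, or 0 if `cap`
   is out of range or an interval is already open on it. */
uint64_t demoGcBegin(uint32_t cap)
{
    if (cap >= DEMO_MAX_CAPS || demo_gc_ids[cap] != 0) return 0;
    demo_gc_ids[cap] = demo_next_id++;
    /* Darwin: os_signpost_interval_begin(...) with this ID would go here. */
    return demo_gc_ids[cap];
}

/* Close the interval on `cap`. Returns the ID that was open, or 0 if
   there was none or `cap` is out of range. */
uint64_t demoGcEnd(uint32_t cap)
{
    if (cap >= DEMO_MAX_CAPS) return 0;
    uint64_t id = demo_gc_ids[cap];
    demo_gc_ids[cap] = 0;
    /* Darwin: os_signpost_interval_end(...) with the same ID would go here. */
    return id;
}
```

Storing the ID per capability is what lets the end call match its begin call when several capabilities are collecting at once.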
angerman force-pushed the feat/aarch64-dwarf-signpost branch from b203c9e to eeb6c01 on March 12, 2026 05:27

The AArch64 NCG was explicitly excluded from DWARF debug info
generation despite having full UNWIND pseudo-instruction support.
This meant all computed unwind data was silently discarded.

Changes:

- Remove ArchAArch64 exclusion from ncgDwarfEnabled, enabling
  .debug_frame output on AArch64 ELF (Linux)

- Define REG_MachSp as r31 in arm64.h so the C stack pointer maps
  to DWARF register 31 (SP) instead of 0 (x0). Without this, all
  addSpUnwindings output incorrectly described x0 changes.

- Add DW_CFA_same_value for AArch64 SP (register 31) in the CIE
  initial instructions. This prevents the DWARF unwinder from
  incorrectly setting SP = CFA (which is the STG Sp on x20).

- Fix UNWIND Note reference to point to GHC.CmmToAsm (where the
  Note actually lives) instead of X86/Instr.hs.
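
As a worked example of the CIE change, the instruction "DW_CFA_same_value for register 31" is just two bytes: the opcode followed by the register number as unsigned LEB128. The helper names below are hypothetical and the code is a standalone illustration in C, not the Haskell that GHC's DWARF generator uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_same_value 0x08   /* opcode value from the DWARF standard */
#define AARCH64_DWARF_SP  31     /* DWARF register number of SP on AArch64 */

/* Write `value` as unsigned LEB128 into `out`; returns bytes written. */
size_t demoPutUleb128(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0) byte = (uint8_t)(byte | 0x80);  /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Encode "DW_CFA_same_value reg" into `out`; returns bytes written. */
size_t demoPutSameValue(uint8_t *out, uint64_t reg)
{
    out[0] = DW_CFA_same_value;
    return 1 + demoPutUleb128(out + 1, reg);
}
```

For register 31 this yields the bytes 0x08 0x1f, which tells the unwinder to leave SP untouched instead of deriving it from the CFA.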
Add basic block structure validation to the AArch64 code generator,
mirroring the existing X86 implementation. This catches NCG bugs
where non-control-flow instructions appear after block-terminating
jumps, which would violate the basic block invariant.

BL (Branch and Link) is exempted from the block-end check since
it is a call that returns to the caller, not a block terminator.
Only active when debugIsOn (debug builds).
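
The invariant being checked can be sketched like this. The instruction kinds are a heavily reduced, hypothetical stand-in for the AArch64 Instr type, and the check is written in C for illustration only; the real validation lives in the Haskell code generator and may treat more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced instruction kinds. */
typedef enum { I_OTHER, I_B, I_BCOND, I_BL } DemoInstr;

/* True for instructions that end a basic block. BL is a call that
   returns to the following instruction, so it is deliberately excluded. */
static bool demoIsBlockEnd(DemoInstr i)
{
    return i == I_B || i == I_BCOND;
}

/* A block is well formed if, once a terminating jump has been seen,
   only further jumps follow it. */
bool demoBlockIsValid(const DemoInstr *instrs, size_t n)
{
    bool seen_jump = false;
    for (size_t k = 0; k < n; k++) {
        if (demoIsBlockEnd(instrs[k])) {
            seen_jump = true;
        } else if (seen_jump) {
            return false;   /* ordinary instruction after a terminator */
        }
    }
    return true;
}
```

In this sketch a BL in the middle of a block is accepted, while any non-jump instruction after a B or conditional branch is rejected.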
DWARF generation was gated on osElfTarget, excluding all MachO
targets despite the DWARF assembly output code already handling
MachO section directives (__DWARF,__debug_*), darwin-specific
alignment (.align as log2), and section offsets.

Changes:

- Replace osElfTarget gate with osDwarfTarget that accepts both
  ELF and MachO, enabling -g on macOS/darwin

- Make DWARF section labels (dwarfInfoLabel, etc.) platform-aware:
  use "L" prefix on darwin (MachO convention) instead of hardcoded
  ".L" (ELF convention), via asmTempLabelPrefix

With this change, ghc -g on AArch64-darwin produces .debug_frame
entries visible to dsymutil, lldb, and Instruments.

Add DWARF Call Frame Information directives to the AArch64 StgRun
function so debuggers and profilers can unwind through the Haskell↔C
boundary. Without CFI, tools like lldb, gdb, and perf cannot produce
backtraces that cross from Haskell into C code.

The CFA is anchored at x29+16 (the frame pointer saved by the first
stp), and every register StgRun saves (x19-x28, x16-x17, x29-x30,
d8-d15) is annotated with its save location on the C stack.

Enable ENABLE_UNWINDING on AArch64-darwin in addition to Linux.
The original restriction (#15207) was about x86_64 GCC/Clang
assembler incompatibilities that do not apply to AArch64 where
both Linux and darwin use Clang-compatible assemblers.

asmTempLabelPrefix is not exported from GHC.Cmm.CLabel. Use a local
dwarfLocalLabel helper that implements the same logic: "L" on darwin,
".L" on ELF targets.
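
The prefix rule itself is small enough to show directly. The function below is a hypothetical C rendering of that logic (the helper in the PR is Haskell, and its exact signature is not shown here); `section_info` in the usage note is an example label name only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Write the assembler-local form of `name` into `buf`:
   "L" + name on darwin (MachO), ".L" + name on ELF targets.
   Returns what snprintf returns (the untruncated length). */
int demoDwarfLocalLabel(bool is_darwin, const char *name,
                        char *buf, size_t buflen)
{
    const char *prefix = is_darwin ? "L" : ".L";
    return snprintf(buf, buflen, "%s%s", prefix, name);
}
```

Called with `section_info`, this gives `Lsection_info` for darwin and `.Lsection_info` for ELF.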
The assembler was reporting 'local symbol LcXX_proc_end not defined'
because AArch64/Ppr.hs never emitted _proc_end labels that DWARF
.debug_info and .debug_frame reference for procedure address ranges.

Add pprProcEndLabel and pprBlockEndLabel helpers (matching the X86
pattern) and emit them:
- At the end of each basic block (since blocks may become standalone
  top-level blocks after branch-chain elimination)
- At the end of each procedure in pprNatCmmDecl (both with and
  without info tables)

This fixes 14 test failures on aarch64-darwin with DWARF enabled.

The MachO assembler cannot handle relocations against local symbols
(L-prefixed labels on darwin) in DWARF debug sections, producing:

    error: unsupported relocation of local symbol 'Lc134_die'.
    Must have non-local symbol earlier in section.

This is a fundamental MachO assembler limitation that requires either
non-local DWARF labels or section anchor symbols to resolve.

Revert to ELF-only DWARF debug sections for now. The MachO-related
infrastructure (section directives, local label prefix support, CFI
directives in StgRun) is kept in place for future MachO DWARF work.

CFI directives (.cfi_*) in StgCRun.c remain enabled on AArch64-darwin
as they produce .eh_frame entries that the system tools handle fine.

On MachO, the assembler cannot create relocations against temporary
symbols (L-prefixed) in DWARF debug sections unless there is a
non-temporary symbol earlier in the section to serve as the relocation
base. Without such an anchor, the assembler fails with:

  error: unsupported relocation of local symbol 'Lfoo'.
  Must have non-local symbol earlier in section.

This was preventing DWARF debug info (-g) from working on macOS/darwin.

The fix emits a linker-private anchor symbol (l_ prefix) at the start
of each DWARF section (.debug_info, .debug_abbrev, .debug_line,
.debug_frame, .debug_aranges). The l_ prefix gives us:
  - A symbol table entry (assembler can create relocations against it)
  - Local binding (no duplicate symbol errors across compilation units)

Also fixes the label ordering in .debug_info where the section label
was emitted BEFORE the section directive (placing it in the wrong
section).

With these anchors in place, DWARF is re-enabled on MachO targets.
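
To make the required ordering concrete, here is a sketch that renders the start of one MachO DWARF section as text: section directive first, then the l_-prefixed anchor, then the L-prefixed temporary label. The anchor and label names (`l_demo_anchor_*`, `Lsection_*`) are hypothetical; the names GHC actually emits are not given in this description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render the opening lines of the MachO DWARF section __debug_<sect>.
   Returns what snprintf returns (the untruncated length). */
int demoEmitSectionStart(const char *sect, char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
        "\t.section __DWARF,__debug_%s,regular,debug\n"
        "l_demo_anchor_%s:\n"   /* hypothetical linker-private anchor */
        "Lsection_%s:\n",       /* hypothetical temporary section label */
        sect, sect, sect);
}
```

Because the anchor precedes every temporary label in the section, the assembler always has a non-temporary symbol to use as the relocation base.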

Additional changes:
- Gate ncgComputeUnwinding to DWARF-capable targets (ELF + MachO)
  to avoid wasting work on platforms that cannot emit debug info
- Document info table alignment decision in AArch64/Ppr.hs

Add signpostsAddCapabilities() to resize per-capability signpost ID
arrays when setNumCapabilities grows the number of capabilities at
runtime. Without this, GC intervals on the new capabilities would not
be tracked in Instruments: the bounds check keeps the RTS safe, but
the events are silently dropped.

Follows the pattern of tracingAddCapabilities() and
storageAddCapabilities() in Schedule.c.

Also use pprBlockEndLabel helper consistently in AArch64/Ppr.hs
instead of manually constructing the label.
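
A rough sketch of the resize step, assuming a plain `realloc` in place of the RTS allocator and hypothetical names throughout (`demo_ids`, `demoSignpostsAddCapabilities`, `demoSignpostIdFor`). The `(from, to)` parameter shape follows tracingAddCapabilities() and storageAddCapabilities(); the body of the real function is not shown in this description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint64_t *demo_ids   = NULL;  /* one signpost ID slot per capability */
static uint32_t  demo_n_ids = 0;

/* Grow the array from `from` to `to` slots, keeping existing IDs and
   zeroing the new ones. Returns 0 on success, -1 if allocation fails. */
int demoSignpostsAddCapabilities(uint32_t from, uint32_t to)
{
    if (to <= from) return 0;
    uint64_t *p = realloc(demo_ids, (size_t)to * sizeof *p);
    if (p == NULL) return -1;
    memset(p + from, 0, (size_t)(to - from) * sizeof *p);
    demo_ids   = p;
    demo_n_ids = to;
    return 0;
}

/* Bounds-checked lookup: capabilities beyond the array yield 0. */
uint64_t demoSignpostIdFor(uint32_t cap)
{
    return cap < demo_n_ids ? demo_ids[cap] : 0;
}
```

The bounds-checked lookup is what makes a missed resize degrade to lost events instead of an out-of-bounds access.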