Add ElfSymbolModule for Parsing ELF Symbol Tables #2384

Merged
brianrob merged 3 commits into microsoft:main from brianrob:brianrob/elf-parser
Mar 20, 2026

Conversation

@brianrob brianrob commented Mar 19, 2026

Summary

Adds ElfSymbolModule, an ELF (Executable and Linkable Format) symbol parser that resolves RVAs to symbol names. This enables PerfView and TraceEvent to resolve native symbols from Linux ELF binaries when analyzing universal traces captured on Linux.

Key capabilities

  • Parses both .symtab and .dynsym sections
  • Supports 32-bit and 64-bit ELF, little-endian and big-endian
  • Implements ISymbolLookup for integration with TraceEvent's symbol resolution pipeline
  • Demangles Itanium C++ (_Z) and Rust v0 (_R) mangled names using the demanglers from #2383 (Implement Symbol Demanglers for Linux Binaries).
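
To illustrate how the 32/64-bit and endianness variants can be told apart, here is a minimal Python sketch (not the C# code from this PR) that reads the ELF identification bytes. The function name `read_elf_ident` is hypothetical; the byte offsets and values (EI_CLASS at offset 4, EI_DATA at offset 5) come from the ELF specification.

```python
ELF_MAGIC = b"\x7fELF"

def read_elf_ident(data: bytes):
    """Return (bits, byte_order) from the start of an ELF image.

    byte_order is "<" (little-endian) or ">" (big-endian), matching the
    prefixes used by Python's struct module. Hypothetical helper.
    """
    if len(data) < 6 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2):      # 1 = ELFCLASS32, 2 = ELFCLASS64
        raise ValueError("bad ELF class")
    if ei_data not in (1, 2):       # 1 = ELFDATA2LSB, 2 = ELFDATA2MSB
        raise ValueError("bad ELF data encoding")
    bits = 32 if ei_class == 1 else 64
    order = "<" if ei_data == 1 else ">"
    return bits, order
```

A parser would typically branch on this pair once and then pick the matching structure layouts for section headers and symbol entries.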

Design

  • Lazy name resolution: Symbol names are decoded from the string table on first FindNameForRva hit, not during construction. Most symbols in a trace are never looked up — this avoids unnecessary work.
  • LOH-free strtab: String table data retained in a SegmentedList<byte> with 64KB segments.
  • Pre-allocated structures: Two-pass section header scan measures sizes before allocating — zero resizes.
  • BitConverter fast path: 64-bit little-endian symbol entries parsed directly from byte arrays.
  • O(log n) lookups: Sorted array with binary search.
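
The design points above can be sketched in Python as follows. This is an illustration only, not the PR's C# implementation: the class `SymbolTable` and its method `find_name` are hypothetical names, and `struct` stands in for the BitConverter fast path. It decodes 64-bit little-endian `Elf64_Sym` entries (24 bytes each, per the ELF specification), sorts them by start address, finds a symbol with binary search, and decodes a name from the string table only when a lookup first hits it.

```python
import bisect
import struct

# Elf64_Sym, little-endian: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM64_LE = struct.Struct("<IBBHQQ")

class SymbolTable:
    """Hypothetical sketch: sorted symbols, binary search, lazy names."""

    def __init__(self, symtab: bytes, strtab: bytes):
        # symtab length must be a multiple of 24 bytes for iter_unpack.
        entries = [
            (value, size, name_off)
            for (name_off, _info, _other, _shndx, value, size)
            in SYM64_LE.iter_unpack(symtab)
        ]
        entries.sort()                       # sort once by start address
        self._entries = entries
        self._starts = [e[0] for e in entries]
        self._strtab = strtab
        self._names = {}                     # name offset -> decoded name

    def find_name(self, addr: int) -> str:
        # Last symbol whose start address is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return ""
        start, size, name_off = self._entries[i]
        if addr >= start + size:
            return ""                        # addr falls in a gap
        name = self._names.get(name_off)
        if name is None:                     # decode on first hit only
            end = self._strtab.index(b"\0", name_off)
            name = self._strtab[name_off:end].decode("utf-8", "replace")
            self._names[name_off] = name
        return name
```

Deferring the string decode keeps construction proportional to the fixed-size symbol entries, and symbols that are never looked up never cost a string allocation.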

Performance

The numbers below compare the existing (PDB) and new (ELF) symbol parsing and lookup paths.

ELF vs PDB comparison (CoreCLR, 18K symbols, net8.0)

| Operation (time / memory) | ELF (ElfSymbolModule) | PDB (NativeSymbolModule / DIA) |
|---|---|---|
| Parse | 2.3 ms / 3.0 MB | 7.4 ms / 9.6 KB |
| Lookup (cold) | 29 ns / 0 B | 8,745 ns / 240 B |

ELF parses 3.2x faster than PDB (2.3 ms vs 7.4 ms), and first lookups are roughly 300x faster (29 ns vs 8,745 ns; sorted array binary search vs COM interop). PDB's lower parse memory is due to DIA's lazy/deferred loading model.

Lookup benchmarks (zero-alloc)

| Symbols | Lookup Time |
|---|---|
| 5 | 7.6 ns |
| 256 | 15.3 ns |
| 10,772 | 21.7 ns |
| 18,030 | 29.0 ns |

Testing

Unit tests (26 tests)

Synthetic ELF binaries generated by ElfBuilder (no checked-in test binaries):

  • Error handling: invalid magic, truncated, empty, bad ELF class, no section headers
  • Format coverage: 64-bit LE, 32-bit LE, 64-bit BE, 32-bit BE
  • Symbol filtering: non-function types, zero-value, zero-size, below PT_LOAD base
  • RVA adjustment with pVaddr and pOffset
  • Demangling integration: Itanium C++, Rust v0, plain passthrough
  • .dynsym section parsing alongside .symtab
  • Edge cases: empty table, RVA zero, 100-symbol binary search stress test
  • File path constructor validation
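
As a rough illustration of the symbol-filtering cases listed above, the following Python sketch keeps a symbol only if it is a function with a non-zero value, a non-zero size, and an address at or above the PT_LOAD base. It assumes every listed case is an exclusion; the `Sym` tuple and `is_candidate` are hypothetical names, not the PR's API, and RVA adjustment is not shown.

```python
from typing import NamedTuple

STT_FUNC = 2  # symbol type for functions, per the ELF specification

class Sym(NamedTuple):
    name: str
    info: int    # st_info: low 4 bits hold the symbol type
    value: int   # st_value
    size: int    # st_size

def is_candidate(sym: Sym, load_vaddr: int) -> bool:
    """Assumed filter: function symbols with real extent inside the segment."""
    return (
        (sym.info & 0xF) == STT_FUNC
        and sym.value != 0
        and sym.size != 0
        and sym.value >= load_vaddr
    )
```

For example, with `load_vaddr = 0x1000`, a symbol `Sym("f", 0x12, 0x1400, 0x20)` passes, while an `STT_OBJECT` symbol (`info = 0x11`) or one at `0x800` does not.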

Offline validation against pyelftools (568 real ELF files, 109,293 symbols)

Validated the parser against a known-good reference:

  1. A Python script using pyelftools to extract the ground truth from 568 real .debug files: the executable PT_LOAD segment parameters and all STT_FUNC symbols (st_value, st_size, raw mangled name).
  2. The C# test constructs an ElfSymbolModule with demangling disabled for each file, then verifies that every reference symbol is found at the correct RVA with an exact match on start address and raw name.
  3. Multiple symbols sharing the same start RVA (GCC destructor D1/D2 aliases, LTO clones, ICF-merged functions) are handled by accepting any matching name at that address.
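
The alias handling in step 3 can be sketched as below. This is a simplified Python illustration, not the actual C# test: `find_mismatches` is a hypothetical helper, `lookup` stands in for a call into the parser under test, and the start-address check from step 2 is omitted. Reference names are grouped by start address, and a lookup passes if it returns any name in that group.

```python
from collections import defaultdict

def find_mismatches(reference, lookup):
    """reference: iterable of (start_addr, raw_name) from the ground truth.
    lookup: callable mapping an address to the name the parser returns.
    Returns a list of (addr, found_name, accepted_names) for failures.
    """
    names_at = defaultdict(set)
    for addr, name in reference:
        names_at[addr].add(name)      # aliases share one address

    mismatches = []
    for addr in sorted(names_at):
        found = lookup(addr)
        if found not in names_at[addr]:
            mismatches.append((addr, found, sorted(names_at[addr])))
    return mismatches
```

An empty result corresponds to the "zero mismatches" outcome; a non-empty one lists each address together with the names that would have been accepted.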

Result: zero mismatches across all 568 files and 109,293 symbols.

Contributes to #2382.

@brianrob brianrob marked this pull request as ready for review March 19, 2026 18:37
@brianrob brianrob requested a review from a team as a code owner March 19, 2026 18:37
brianrob and others added 3 commits March 19, 2026 11:56
Add ElfSymbolModule, which reads ELF (Executable and Linkable Format)
files and resolves RVAs to symbol names. This enables PerfView and
TraceEvent to resolve native symbols from Linux ELF binaries when
analyzing universal traces captured on Linux.

Key capabilities:
- Parses both .symtab and .dynsym sections
- Supports 32-bit and 64-bit ELF, little-endian and big-endian
- Implements ISymbolLookup for integration with TraceEvent's symbol
  resolution pipeline
- Demangles Itanium C++ (_Z) and Rust v0 (_R) mangled names

Design for performance:
- Lazy name resolution: symbol names decoded on first FindNameForRva
  hit, not during construction
- LOH-free strtab via SegmentedList<byte> with 64KB segments
- Two-pass section scan pre-allocates all structures with zero resizes
- BitConverter fast path for 64-bit LE symbol entries
- O(log n) zero-allocation lookups via sorted array binary search

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add comprehensive test suite with 26 tests covering all ElfSymbolModule
code paths:
- Error handling: invalid magic, truncated, empty, bad class, no sections
- Format coverage: 64-bit LE, 32-bit LE, 64-bit BE, 32-bit BE
- Symbol filtering: non-function types, zero-value, zero-size, below PT_LOAD
- RVA adjustment with pVaddr and pOffset
- Demangling integration: Itanium C++, Rust v0, plain passthrough
- .dynsym section parsing alongside .symtab
- Edge cases: empty table, RVA zero, 100-symbol binary search stress
- File path constructor validation

ElfBuilder is a test helper that constructs synthetic minimal ELF
binaries in memory, avoiding the need for checked-in test binaries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add BenchmarkDotNet benchmarks for ELF symbol parsing and lookup using
ElfBuilder-generated synthetic ELF binaries (5, 256, and 10000 symbols).
No external file dependencies.

Parse benchmarks measure construction time and memory allocation.
Lookup benchmarks measure FindNameForRva performance (zero-alloc).

Link ElfBuilder.cs from the test project to share the ELF binary
builder without duplicating code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brianrob brianrob force-pushed the brianrob/elf-parser branch from 3328404 to e5cf6d2 Compare March 19, 2026 18:56
@brianrob brianrob merged commit bcc0670 into microsoft:main Mar 20, 2026
5 checks passed
@brianrob brianrob deleted the brianrob/elf-parser branch March 20, 2026 20:04