[release/10.0] Port dump collection perf improvements #128023
Merged
Replaces the hand-rolled hash table implementation in `DacInstanceManager` with an `SHash`-based implementation. SHash is more efficient for the high volume of find operations the DAC issues when verifying cached cross-process reads. During minidump collection, `DacInstanceManager` is the central cache for all memory read from the target process. The hand-rolled hash table used a fixed bucket array, so Find and insertion operations degraded quickly under high load. Minidump collection measured against a repro app with 2.5k-frame-deep stacks over 50 threads was roughly 9.5x faster.
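The degradation described above can be illustrated with a minimal sketch. It is not the DAC code and does not use the real `SHash` API: `Instance`, `FixedBucketCache`, `GrowableCache`, and `WorstCaseProbes` are hypothetical names, and `std::unordered_map` stands in for any table that grows as it fills.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one cached cross-process read.
struct Instance {
    std::uint64_t addr;  // address in the target process
    std::uint32_t size;  // size of the cached copy
};

// Fixed-bucket chained table: the bucket count never changes, so each
// chain grows linearly with the number of cached instances.
class FixedBucketCache {
public:
    explicit FixedBucketCache(std::size_t buckets) : buckets_(buckets) {
        assert(buckets > 0);
    }

    void Add(const Instance& inst) {
        buckets_[inst.addr % buckets_.size()].push_front(inst);
    }

    // Returns the instance (or nullptr) and reports how many chain nodes
    // were visited to get there.
    const Instance* Find(std::uint64_t addr, std::size_t* probes) const {
        *probes = 0;
        for (const Instance& inst : buckets_[addr % buckets_.size()]) {
            ++*probes;
            if (inst.addr == addr) return &inst;
        }
        return nullptr;
    }

private:
    std::vector<std::forward_list<Instance>> buckets_;
};

// Growable table: std::unordered_map rehashes as it fills, which keeps the
// average bucket occupancy at or below max_load_factor().
class GrowableCache {
public:
    void Add(const Instance& inst) { map_[inst.addr] = inst; }

    const Instance* Find(std::uint64_t addr) const {
        auto it = map_.find(addr);
        return it == map_.end() ? nullptr : &it->second;
    }

    float LoadFactor() const { return map_.load_factor(); }

private:
    std::unordered_map<std::uint64_t, Instance> map_;
};

// Inserts `count` sequential addresses and returns how many chain nodes a
// lookup of the oldest entry has to visit in the fixed-bucket table.
std::size_t WorstCaseProbes(std::size_t buckets, std::uint64_t count) {
    FixedBucketCache cache(buckets);
    for (std::uint64_t addr = 0; addr < count; ++addr)
        cache.Add({addr, 8});
    std::size_t probes = 0;
    cache.Find(0, &probes);  // the first insert sits at the tail of its chain
    return probes;
}
```

In this sketch, `WorstCaseProbes(4, 4000)` visits 1000 chain nodes for a single lookup, while the growable table keeps its load factor bounded regardless of how many entries are added.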
…anning (dotnet#125459)
Second partial fix for dotnet#122459. Caches the list of debugger breakpoint patches in the DAC so that x64 stack unwinding doesn't re-scan the patch hash table on every frame. During minidump collection, each stack frame triggers `DacReplacePatchesInHostMemory` to restore original opcodes before reading memory, even though there are typically zero active patches during a dump. The patch hash table has 1,000 fixed buckets, so each call walked all of them regardless. The cache is populated once on first access and invalidated only on `Flush()`. Measured minidump collection against the same repro app with 10,000 iterations across 10 threads: the baseline was 55s, and this change alone brings it to ~7s.
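The populate-once, invalidate-on-flush behavior described above might look roughly like the following sketch. All types here (`Patch`, `PatchTable`, `PatchCache`) and the helper `ScansForFrames` are hypothetical illustrations, not the actual `DacPatchCache` implementation; only the 1,000-bucket figure comes from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical patch record: an address whose opcode the debugger replaced.
struct Patch {
    std::uint64_t address;
    std::uint8_t originalOpcode;
};

// Hypothetical stand-in for the fixed-bucket patch table in the target.
struct PatchTable {
    std::vector<std::vector<Patch>> buckets;
    mutable std::size_t scans = 0;  // counts full-table walks

    // Walks every bucket, even when all of them are empty.
    std::vector<Patch> CollectAll() const {
        ++scans;
        std::vector<Patch> out;
        for (const auto& bucket : buckets)
            out.insert(out.end(), bucket.begin(), bucket.end());
        return out;
    }
};

class PatchCache {
public:
    explicit PatchCache(const PatchTable& table) : table_(table) {}

    // Walks the table only on the first access after construction or Flush().
    const std::vector<Patch>& Get() {
        if (!populated_) {
            patches_ = table_.CollectAll();
            populated_ = true;
        }
        return patches_;
    }

    // Drops the cached list so the next Get() re-reads the table.
    void Flush() {
        patches_.clear();
        populated_ = false;
    }

private:
    const PatchTable& table_;
    std::vector<Patch> patches_;
    bool populated_ = false;
};

// Restores original opcodes inside a host-side copy of target memory that
// starts at target address `base`.
void ReplacePatchesInHostMemory(PatchCache& cache, std::uint64_t base,
                                std::vector<std::uint8_t>& buffer) {
    for (const Patch& p : cache.Get()) {
        if (p.address >= base && p.address - base < buffer.size())
            buffer[p.address - base] = p.originalOpcode;
    }
}

// Simulates unwinding `frames` stack frames and returns how many times the
// patch table was walked in full.
std::size_t ScansForFrames(std::size_t frames) {
    PatchTable table;
    table.buckets.resize(1000);  // bucket count taken from the description
    PatchCache cache(table);
    std::vector<std::uint8_t> buffer(16, 0xCC);
    for (std::size_t i = 0; i < frames; ++i)
        ReplacePatchesInHostMemory(cache, 0x1000, buffer);
    return table.scans;
}
```

With the cache in place, the number of full-table walks in this sketch no longer depends on the number of frames; it only increases again after a `Flush()`.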
Contributor
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Contributor
Pull request overview
Backport of CoreCLR DAC minidump-collection performance improvements to release/10.0, focused on reducing overhead during heap dump enumeration and x64 stack unwinding, plus an opt-in switch to use the HEAP2 enumeration path for faster heap dumps.
Changes:
- Replace the DAC instance cache's prior map implementation with an `SHash`-based table to improve lookup/insert scalability during dump generation.
- Add a DAC-side patch cache so x64 unwinding doesn't repeatedly rescan the debugger patch table on every frame.
- Introduce `DOTNET_EnableFastHeapDumps` (via `EXTERNAL_EnableFastHeapDumps`) to let the target process opt into promoting HEAP dumps to HEAP2 inside the DAC.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/vm/vars.hpp | Declares a new global (g_EnableFastHeapDumps) for communicating the opt-in to the DAC. |
| src/coreclr/vm/vars.cpp | Defines/initializes g_EnableFastHeapDumps. |
| src/coreclr/vm/ceemain.cpp | Reads EXTERNAL_EnableFastHeapDumps on startup and stores it in g_EnableFastHeapDumps. |
| src/coreclr/inc/dacvars.h | Exposes g_EnableFastHeapDumps to the DAC via DEFINE_DACVAR. |
| src/coreclr/inc/clrconfigvalues.h | Adds the EXTERNAL_EnableFastHeapDumps config value (env var surface). |
| src/coreclr/debug/daccess/enummem.cpp | Promotes HEAP → HEAP2 when g_EnableFastHeapDumps != 0 during heap dump enumeration. |
| src/coreclr/debug/daccess/dacimpl.h | Switches the instance cache to SHash traits and introduces the DacPatchCache type/member. |
| src/coreclr/debug/daccess/dacfn.cpp | Uses the patch cache in DacReplacePatchesInHostMemory and implements cache population. |
| src/coreclr/debug/daccess/daccess.cpp | Updates instance cache operations to SHash APIs and flushes the new patch cache on DAC Flush(). |
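The opt-in flow in the table above (a startup switch stored in a global, later consulted by the DAC to promote HEAP to HEAP2) can be sketched as follows. This is a simplified illustration under stated assumptions: `EnumMemMode`, `RuntimeGlobals`, `ParseSwitch`, and `EffectiveMode` are hypothetical names, and the parsing shown is not necessarily how the runtime's config system interprets the value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical mirror of the enumeration modes named in the PR; the real
// flags are CLRDATA_ENUM_MEM_HEAP and CLRDATA_ENUM_MEM_HEAP2.
enum class EnumMemMode { Mini, Heap, Triage, Heap2 };

// Hypothetical stand-in for the g_EnableFastHeapDumps global that the
// runtime sets at startup and the DAC reads from the target process.
struct RuntimeGlobals {
    std::uint32_t enableFastHeapDumps = 0;
};

// Simplified parsing (an assumption): unset or zero means off, any other
// numeric value means on.
std::uint32_t ParseSwitch(const char* value) {
    if (value == nullptr) return 0;
    return std::strtoul(value, nullptr, 0) != 0 ? 1u : 0u;
}

// Startup side: read the environment variable once and store the result.
void InitGlobals(RuntimeGlobals& g) {
    g.enableFastHeapDumps = ParseSwitch(std::getenv("DOTNET_EnableFastHeapDumps"));
}

// DAC side: promote only HEAP requests, and only when the target opted in.
EnumMemMode EffectiveMode(EnumMemMode requested, const RuntimeGlobals& g) {
    if (requested == EnumMemMode::Heap && g.enableFastHeapDumps != 0)
        return EnumMemMode::Heap2;
    return requested;
}
```

Because the promotion is keyed on a value the target process sets itself, a process that never sets the variable sees the same enumeration mode as before.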
noahfalk
approved these changes
May 13, 2026
Contributor
Guess it should be assigned the upcoming milestone 10.0.9?
Fixes #122459
main PRs:
Description
Backports three DAC performance improvements for minidump collection:
- Use SHash as DAC instance hash (#125631): Replaces the hand-rolled hash table in `DacInstanceManager` with an `SHash`-based implementation. The previous fixed-bucket hash degraded quickly for Find and insertion operations under high load. Measured ~9.5x speedup for minidump collection against a repro app with 2.5k-frame deep stacks over 50 threads.
- Cache debugger patches (#125459): Caches the list of debugger breakpoint patches so that x64 stack unwinding doesn't re-scan the 1,000-bucket patch hash table on every frame. The cache is populated once on first access and invalidated on `Flush()`. Measured reduction from 55s to ~7s for minidump collection (10,000 iterations across 10 threads).
- Enable CLRDATA_ENUM_MEM_HEAP2 via environment variable: When the target process has `DOTNET_EnableFastHeapDumps` set, the DAC promotes `CLRDATA_ENUM_MEM_HEAP` to `CLRDATA_ENUM_MEM_HEAP2`, which dumps loader heap pages in bulk instead of walking individual runtime structures.
Customer Impact
Customers collecting minidumps of large .NET applications (many threads, deep stacks) experience extremely slow dump collection times - on the order of minutes for what should take seconds. This directly impacts incident response time in production environments. Without these fixes, dump collection through Watson/dotnet-dump/createdump remains unacceptably slow for large workloads.
Regression
Yes, with respect to .NET Framework. Customers migrating from .NET Framework have noticed the slowdown; .NET Framework used non-portable variants of the MSVC library.
Testing
`DOTNET_EnableFastHeapDumps`), so no change in default behavior. This is the riskier change since it makes heap dump match our expectations but might yield `unknown!unknown` if the modules aren't indexed properly.
Risk
Low.
daccess.cpp, dacfn.cpp, dacimpl.h) that execute only during diagnostic operations (dump collection, debugging). They do not affect runtime execution.
The `DOTNET_EnableFastHeapDumps` env var is opt-in and does not change default behavior.