Skip decommit for large pages and add fake large pages test mode by cshung · Pull Request #127290 · dotnet/runtime

cshung · 2026-04-22T16:52:42Z

With large pages, VirtualDecommit is a no-op since large pages cannot be partially decommitted. PR #126929 fixed the resulting stale data corruption by adding memclr in virtual_decommit, but this approach has downsides: the memory is never returned to the OS, yet we pay for the clearing and produce misleading committed/used bookkeeping.

Instead, skip the decommit entirely for large pages:

- distribute_free_regions: skip the aggressive tail-region decommit (the committed-but-unallocated tail of in-use regions). This was the path that caused the heap corruption in #126903 (GC heap corruption with GCLargePages).
- decommit_heap_segment: skip the whole-segment decommit used for segment hoarding and BGC segment deletion. Same class of issue: committed/used are lowered but physical memory retains stale data.
- decommit_region: bypass virtual_decommit and call reduce_committed_bytes directly, since decommit_region already handles large pages correctly by clearing memory itself.
- virtual_decommit: add an assert that it is never called for heap memory when large pages are on. This catches any future caller that forgets to handle the large pages case. The end_of_data parameter and no-op ternary added by #126929 (fix for largepages with agressive decommit logic) are removed.
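
The guard pattern in the list above can be sketched as a small model. This is only an illustration under stated assumptions, not the runtime's code: `HeapModel`, `model_virtual_decommit`, and `model_decommit_tail` are hypothetical names, and the real functions in gc.cpp take different parameters and do considerably more work.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model; none of these names are the runtime's declarations.
struct HeapModel {
    bool use_large_pages = false;  // stands in for use_large_pages_p
    std::size_t committed = 0;     // bookkeeping: bytes the GC believes are committed
    int os_decommit_calls = 0;     // counts calls into the simulated OS decommit
};

// Stand-in for virtual_decommit: with large pages on, reaching this is a bug,
// so the model asserts instead of silently doing nothing.
inline void model_virtual_decommit(HeapModel& heap, std::size_t bytes) {
    assert(!heap.use_large_pages && "decommit reached with large pages enabled");
    assert(bytes <= heap.committed);
    heap.os_decommit_calls++;
    heap.committed -= bytes;
}

// Stand-in for a caller such as the tail-region decommit: with large pages the
// decommit is skipped entirely, so the bookkeeping is left untouched.
// Returns the number of bytes actually decommitted.
inline std::size_t model_decommit_tail(HeapModel& heap, std::size_t bytes) {
    if (heap.use_large_pages) {
        return 0;
    }
    model_virtual_decommit(heap, bytes);
    return bytes;
}
```

In this model a caller that forgets the large pages check trips the assert in `model_virtual_decommit`, which is the role the PR description gives to the new assert in virtual_decommit.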

Add GCLargePages=2 mode that simulates large pages using small pages: sets use_large_pages_p=true but reserves with normal pages and commits everything upfront. This exercises all large page GC code paths without requiring OS large page setup or privileges, enabling CI testing.
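
As a rough sketch of how the two non-zero GCLargePages values differ, the mapping below summarizes the paragraph above. `LargePagesSettings`, `settings_for_gc_large_pages`, and `needs_os_large_page_support` are hypothetical names made up for illustration, and treating every value other than 1 and 2 as "off" is an assumption of this sketch, not something the PR states.

```cpp
#include <cstdint>

// Hypothetical summary of what a GCLargePages value implies; not a runtime type.
struct LargePagesSettings {
    bool use_large_pages;  // GC follows the large page code paths (use_large_pages_p)
    bool emulated;         // reserve with normal pages and commit everything upfront
};

inline LargePagesSettings settings_for_gc_large_pages(std::uint64_t value) {
    switch (value) {
        case 1:  return {true, false};   // real OS large pages
        case 2:  return {true, true};    // emulation mode added by this PR
        default: return {false, false};  // assumption: anything else means off
    }
}

// True when OS large page setup or privileges would be required.
inline bool needs_os_large_page_support(const LargePagesSettings& settings) {
    return settings.use_large_pages && !settings.emulated;
}
```

With value 2 the GC still sees `use_large_pages` as true, so the large page code paths run, while `needs_os_large_page_support` is false, which is what makes the mode usable in CI.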

Fix #126903

dotnet-policy-service · 2026-04-22T16:55:09Z

Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

mangod9 · 2026-04-22T18:23:45Z

@janvorli. Thanks @cshung for making the change, I like that we can now force largePages codepath within CI.

VSadov · 2026-04-22T21:37:16Z

The test fails on x86. Perhaps just make the test incompatible with 32bit?

janvorli

LGTM, thank you!

Fixes #126903

## Customer Impact

- [x] Customer reported
- [ ] Found internally

GC heap corruption when `DOTNET_GCLargePages=1` is enabled on Linux (#126903). Reproducible by calling `GC.Collect(2, GCCollectionMode.Aggressive, true, true)` with large pages enabled, but also occurs in normal production workloads without aggressive GC.

## Regression

- [ ] Yes
- [x] No

This is a pre-existing bug in the GC's large-page decommit logic. When `GCLargePages` is enabled, the GC skips OS-level decommits but still updates bookkeeping as if the decommit succeeded. This causes regions to be reused without being zeroed, leading to heap corruption. The bug has existed since Regions was enabled.

## Testing

The fix was validated by the customer against their production workload.

## Risk

Low. The fix clears decommitted memory in the large-pages scenario to ensure regions are properly zeroed before reuse. This is a targeted change to the GC's decommit path that only affects `GCLargePages=1` configurations. The larger fix #127290 is made in .NET 11.
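
The failure mode described above (bookkeeping lowered while the physical memory keeps its contents) and the servicing fix (clear on decommit) can be modeled in a few lines. This is a toy sketch under assumptions: `RegionModel`, `model_large_page_decommit`, and `model_recommit_is_zeroed` are hypothetical names, and a `std::vector` stands in for physically backed large page memory.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of one region backed by large pages.
struct RegionModel {
    std::vector<unsigned char> memory;  // contents survive a "decommit"
    std::size_t committed;              // bookkeeping only
    explicit RegionModel(std::size_t size) : memory(size, 0), committed(size) {}
};

// With large pages the OS decommit is a no-op, so only bookkeeping changes.
// clear_memory == true models the servicing fix that zeroes the range instead.
inline void model_large_page_decommit(RegionModel& region, std::size_t new_committed,
                                      bool clear_memory) {
    if (new_committed >= region.committed) {
        return;  // nothing to decommit
    }
    if (clear_memory) {
        std::memset(region.memory.data() + new_committed, 0,
                    region.committed - new_committed);
    }
    region.committed = new_committed;
}

// Recommit up to new_committed and report whether the newly committed range is
// all zero, which is what the GC expects of freshly committed memory.
inline bool model_recommit_is_zeroed(RegionModel& region, std::size_t new_committed) {
    if (new_committed > region.memory.size()) {
        new_committed = region.memory.size();
    }
    bool zeroed = true;
    for (std::size_t i = region.committed; i < new_committed; ++i) {
        if (region.memory[i] != 0) {
            zeroed = false;
        }
    }
    if (new_committed > region.committed) {
        region.committed = new_committed;
    }
    return zeroed;
}
```

Without clearing, a byte written before the decommit is still visible after the recommit, which corresponds to the stale data the description refers to.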

NativeAOT's `RhConfig::Environment::TryGetIntegerValue` had a hand-rolled hex parser that rejected the `0x`/`0X` prefix, returning a parse error when it encountered `x`. This meant env vars like `DOTNET_GCHeapHardLimit=0xC0000000` silently failed to parse, leaving the hard limit unset. With `GCLargePages=2` also set, the GC would then return `CLR_E_GC_LARGE_PAGE_MISSING_HARD_LIMIT` and fail initialization. CoreCLR's equivalent uses `strtoul(..., 16)` which handles the prefix natively.

## Description

- **`src/coreclr/nativeaot/Runtime/RhConfig.cpp`**: In `TryGetIntegerValue`, skip a leading `0x`/`0X` prefix when parsing in hex mode, before entering the digit loop. Additionally, return `false` (parse error) when the value is exactly `"0x"` or `"0X"` with no hex digits following the prefix, matching CoreCLR's `strtoul` behavior:

```cpp
uint32_t startIndex = 0;
if (!decimal && cchResult >= 2 && buffer[0] == '0' && (buffer[1] == 'x' || buffer[1] == 'X'))
{
    startIndex = 2;
    if (startIndex == cchResult)
        return false; // parse error - hex prefix without any digits
}

for (uint32_t i = startIndex; i < cchResult; i++)
```

This aligns NativeAOT's config parsing with CoreCLR's `strtoul`-based behavior and fixes the `Collect_Aggressive_LargePages` test failure under NativeAOT.

<details>
<summary>Original prompt</summary>

## Problem

NativeAOT's `RhConfig::Environment::TryGetIntegerValue` in `src/coreclr/nativeaot/Runtime/RhConfig.cpp` uses a hand-rolled hex parser that does not handle the `0x` or `0X` prefix. This causes config values like `DOTNET_GCHeapHardLimit=0xC0000000` to fail to parse, because when the parser encounters the `x` character it returns `false` (parse error). CoreCLR's equivalent code (`CLRConfigNoCache::TryAsInteger` in `src/coreclr/inc/clrconfignocache.h`) uses `strtoul(_value, &endPtr, radix)` which natively handles the `0x` prefix when radix is 16.

This causes the test `Collect_Aggressive_LargePages` added in PR #127290 to fail under NativeAOT: the `GCHeapHardLimit` fails to parse, so no hard limit is set, but `GCLargePages=2` succeeds → the GC returns `CLR_E_GC_LARGE_PAGE_MISSING_HARD_LIMIT` and the process exits with -1.

## Fix

In `src/coreclr/nativeaot/Runtime/RhConfig.cpp`, in the `TryGetIntegerValue` method, when parsing in hex mode (i.e., `decimal` is false), skip a leading `0x` or `0X` prefix before entering the digit-parsing loop. This matches the behavior of `strtoul` with radix 16 that CoreCLR uses.

Specifically, after reading the environment variable into `buffer` and before the parsing loop, add:

```cpp
uint32_t startIndex = 0;
if (!decimal && cchResult >= 2 && buffer[0] == '0' && (buffer[1] == 'x' || buffer[1] == 'X'))
{
    startIndex = 2;
}
```

Then change the loop from `for (uint32_t i = 0; ...)` to `for (uint32_t i = startIndex; ...)`.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User:

```
16:22:36.657 Running test: GC\API\GC\Collect_Aggressive_LargePages\Collect_Aggressive_LargePages.cmd
Return code: 1
Raw output file: C:\h\w\B1EC0A05\w\B5C209C7\uploads\API\GC\Collect_Aggressive_LargePages\output.txt
Raw output:
BEGIN EXECUTION
call C:\h\w\B1EC0A05\p\nativeaottest.cmd C:\h\w\B1EC0A05\w\B5C209C7\e\GC\API\GC\Collect_Aggressive_LargePages\ Collect_Aggressive_LargePages.dll
Expected: 100
Actual: -1
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
Set up CORE_ROOT and run.
> C:\h\w\B1EC0A05\w\B5C209C7\e\GC\GC\native\..\..\API\GC\Collect_Aggressive_LargePages\Collect_Aggressive_LargePages.cmd
Xunit.Sdk.TrueException:
Return code: 1
Raw output file: C:\h\w\B1EC0A05\w\B5C209C7\uploads\API\GC\Collect_Aggressive_LargePages\output.txt
Raw output:
BEGIN EXECUTION
call C:\h\w\B1EC0A05\p\nativeaottest.cmd C:\h\w\B1EC0A05\w\B5C209C7\e\GC\API\GC\Collect_Aggressive_LargePages\ Collect_Aggressive_LargePages.dll
Expected: 100
Actual: -1
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:
Set up CORE_ROOT and run.
> C:\h\w\B1EC0A05\w\B5C209C7\e\GC\GC\native\..\..\API\GC\Collect_Aggressive_LargePages\Collect_Aggressive_LargePages.cmd
   at Xunit.Assert.True(Nullable`1, String)
   at TestLibrary.OutOfProcessTest.RunOutOfProcessTest(String, String)
   at Program.<<Main>$>g__TestExecutor26|0_27(StreamWriter, StreamWriter, Program.<>c__DisplayClass0_0&)
16:22:36.755 Failed test: GC\API\GC\Collect_Aggressive_LargePages\Collect_Aggressive_LargePages.cmd
```

the test added here is failing with native AOT. why?

A: The test fails on NativeAOT because **NativeAOT's config reader cannot parse the `0x` prefix** in the `DOTNET_GCHeapHardLimit=0xC0000000` environment variable.

Looking at NativeAOT's `RhConfig::Environment::TryGetIntegerValue` in:

```c++ name=src/coreclr/nativeaot/Runtime/RhConfig.cpp url=https://github.com/dotnet/runtime/blob/bfaabae213cf02d343846d253b7149a03819680d/src/coreclr/nativeaot/Runtime/RhConfig.cpp#L58-L84
uint64_t uiResult = 0;
for (uint32_t i = 0; i < cchResult; i++)
{
    TCHAR ch = buffer[i];
    // ...
    else
    {
        uiResult *= 16;
        if ((ch >= '0') && (ch <= '9'))
            uiResult += ch - '0';
        else if ((ch >= 'a') && (ch <= 'f'))
            uiResult += (ch - 'a') + 10;
        else if ((ch >= 'A') && (ch <= 'F'))
            uiResult += (ch - 'A') + 10;
        else
            return false; // parse error  <-- 'x' hits this
    }
}
```

When the value is `0xC0000000`, the parser reads hex digits character-by-character. It handles `0`, but then encounters `x`, which isn't a valid hex digit, so it **returns `false`** (parse error). This means `GCHeapHardLimit` is never set.

**The cascade:**

1. `DOTNET_GCHeapHardLimit=0xC0000000` fails to parse → no hard limit is set
2. `DOTNET_GCLargePages=2` is read successfully (it's a simple `2`, valid hex) → `use_large_pages_p = true`
3. The GC hits this check in [`interface.cpp`](https://github.com/dotnet/runtime/blob/bfaabae213cf02d343846d253b7149a03819680d/src/coreclr/gc/interface.cpp#L276-L278): `if ((!gc_heap::heap_hard_limit) && gc_heap::use_large_pages_p)` → returns `CLR_E_GC_LARGE_PAGE_MISSING_HARD_LIMIT`
4. GC initialization fails →...

</details>

*This pull request was created from Copilot chat.*

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: MichalStrehovsky <13110571+MichalStrehovsky@users.noreply.github.com>
Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
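
For reference, the prefix handling described in the NativeAOT fix above can be exercised in isolation with a standalone helper. This is a minimal sketch, not the runtime's function: `try_parse_config_integer` is a hypothetical name, it takes the string directly instead of reading the environment, and both rejecting an empty string and ignoring overflow are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical standalone helper modeled on the digit loop quoted above.
// Parses `length` characters of `buffer` as decimal or hex; in hex mode an
// optional 0x/0X prefix is skipped. Overflow is not detected.
inline bool try_parse_config_integer(const char* buffer, std::size_t length,
                                     bool decimal, std::uint64_t* result) {
    if (length == 0) {
        return false;  // assumption: an empty value is a parse error
    }

    std::size_t start = 0;
    if (!decimal && length >= 2 && buffer[0] == '0' &&
        (buffer[1] == 'x' || buffer[1] == 'X')) {
        start = 2;
        if (start == length) {
            return false;  // hex prefix without any digits
        }
    }

    std::uint64_t value = 0;
    for (std::size_t i = start; i < length; i++) {
        const char ch = buffer[i];
        if (decimal) {
            if (ch < '0' || ch > '9') {
                return false;  // not a decimal digit
            }
            value = value * 10 + static_cast<std::uint64_t>(ch - '0');
        } else {
            value *= 16;
            if (ch >= '0' && ch <= '9') {
                value += static_cast<std::uint64_t>(ch - '0');
            } else if (ch >= 'a' && ch <= 'f') {
                value += static_cast<std::uint64_t>(ch - 'a') + 10;
            } else if (ch >= 'A' && ch <= 'F') {
                value += static_cast<std::uint64_t>(ch - 'A') + 10;
            } else {
                return false;  // not a hex digit
            }
        }
    }

    *result = value;
    return true;
}
```

With this helper, `0xC0000000` and `C0000000` both parse to the same value in hex mode, while a bare `0x` is rejected.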

github-actions Bot added the area-GC-coreclr label Apr 22, 2026

dotnet-policy-service Bot added the community-contribution label (indicates that the PR has been added by a community member) Apr 22, 2026

mangod9 reviewed Apr 22, 2026

Comment thread src/coreclr/gc/gc.cpp Outdated

mangod9 reviewed Apr 22, 2026

Comment thread src/coreclr/gc/memory.cpp

VSadov reviewed Apr 22, 2026

Comment thread src/coreclr/gc/memory.cpp

cshung force-pushed the fix/gc-largepages-skip-tail-decommit branch from 6b4e8d4 to 16345c8 Compare April 23, 2026 04:46

Copilot AI mentioned this pull request Apr 23, 2026

Fix race condition: set _canceled before SignalCore in ProcessWaitState #127312

Merged

build-analysis Bot mentioned this pull request Apr 23, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

3 tasks

cshung added 3 commits April 24, 2026 10:35

Rename large_pages_fake_mode_p to large_pages_emulation_mode_p

bc4d569

Address review feedback from mangod9 and janvorli.

Address review feedback

80b8740

Rename large_pages_fake_mode_p to large_pages_emulation_mode_p and update comments to use emulation terminology throughout. Disable test on 32-bit: GCHeapHardLimit=0xC0000000 exceeds the virtual address space and GCLargePages is gated by HOST_64BIT.

cshung force-pushed the fix/gc-largepages-skip-tail-decommit branch from a13a490 to 80b8740 Compare April 24, 2026 17:35

jkotas mentioned this pull request Apr 28, 2026

[browser][coreclr] WASM-specific GC OS layer; no mmap/decommit #127328

Merged

pavelsavara reviewed Apr 28, 2026

Comment thread src/tests/GC/API/GC/Collect_Aggressive_LargePages.csproj

janvorli approved these changes Apr 28, 2026

janvorli merged commit 158bbf4 into dotnet:main Apr 28, 2026
109 of 113 checks passed

BenV mentioned this pull request Apr 28, 2026

Port 126929 to 10.0 #126977

Merged

4 tasks

dotnet-maestro Bot mentioned this pull request Apr 29, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#6338

Merged

cshung deleted the fix/gc-largepages-skip-tail-decommit branch April 29, 2026 16:01

Copilot AI mentioned this pull request May 1, 2026

Fix NativeAOT hex config parser to handle 0x/0X prefix #127644

Merged

BenV mentioned this pull request May 6, 2026

GC heap corruption with GCLargePages #126903

Closed

BenV mentioned this pull request May 6, 2026

GC heap corruption with GCLargePages (Part 2: no aggressive GC required) #127892

Closed

janvorli mentioned this pull request May 19, 2026

[release/10.0] Fix heap_segment_used watermark after compaction #128342

Merged

4 tasks
