Skip to content

[cdac] Introduce FlushScope and add TargetState scope#128846

Open
max-charlamb wants to merge 2 commits into
dotnet:mainfrom
max-charlamb:cdac-flushscope-api
Open

[cdac] Introduce FlushScope and add TargetState scope#128846
max-charlamb wants to merge 2 commits into
dotnet:mainfrom
max-charlamb:cdac-flushscope-api

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

Note

This PR description was drafted with the help of GitHub Copilot.

Summary

Replaces IContract.Flush() and Target.Flush() with a Flush(FlushScope) overload and introduces a FlushScope enum:

  • All -- clears every cache the target holds, including immutable metadata caches (type layouts, contract instances, ECMA metadata, execution-manager ranges). Matches the historical no-argument Flush behavior.
  • TargetState -- clears only target-state caches (Target.ProcessedData). Contract instances and any metadata caches they own are retained. Each contract is responsible for its own correctness under this scope: contracts that capture a target-memory snapshot at construction must re-read that snapshot in their Flush(FlushScope.TargetState) override.

Motivation

This is the foundation for a follow-up cDAC stress harness change that re-verifies live target state between snapshots (e.g. after every GC allocation under DOTNET_CdacStress=0x1). Today, every such flush re-walks System.Private.CoreLib's ECMA metadata even though that metadata is immutable for the process lifetime (CoreLib is loaded into the non-collectible default ALC). The new FlushScope.TargetState lets ManagedTypeSource_1 retain its CoreLib-only caches across these stress flushes.

Measured impact on a BasicAlloc cDAC-stress run (Release cDAC):

Scope behavior in stress Median (s)
FlushScope.TargetState keeps CoreLib caches (this PR's win) 5.81
All flushes clear everything (today's behavior) 11.83

A 2x speedup on the most allocation-heavy stress test before any further optimization.

Contract updates in this PR

Contract Change
ManagedTypeSource_1 Retains type/typehandle/fielddesc caches across FlushScope.TargetState; clears on FlushScope.All as before. Safe because the contract only resolves names in CoreLib.
ReJIT_1, PlatformMetadata_1 Lazy-snapshot pattern -- no longer captures target state in the constructor; re-snapshots on the next read after any flush.
ExecutionManager_1/_2/ExecutionManagerCore The top RangeSectionMap is resolved lazily and re-resolved after any flush so that FlushScope.TargetState correctly invalidates execution-range data that may have changed across a runtime resume.

EcmaMetadata_1, Signature_1, and RuntimeTypeSystem_1 currently treat both scopes the same and clear on every call -- they can be tuned in follow-up changes if profiling motivates it.

Compatibility

All public Flush() entry points now require a FlushScope argument. Every caller in dotnet/runtime is updated in this PR:

  • SOSDacImpl.FlushCache, ISOSDacInterface13.LockedFlush, IXCLRDataProcess.Flush -> pass FlushScope.All (no behavior change)
  • DacDbiImpl.FlushCache -> FlushScope.All
  • Test harnesses (TestPlaceholderTarget, TestTarget) updated to the new signature

The default no-op override on IContract.Flush(FlushScope) preserves source compatibility for any contract that does not maintain caches.

Validation

  • cdac.slnx builds with 0 warnings, 0 errors.
  • Microsoft.Diagnostics.DataContractReader.Tests: 2417 passed / 0 failed / 16 platform-skipped.

Out of scope (planned follow-ups)

  • The actual cDAC stress harness (cdacstress.{cpp,h}, the DACSTRESSPRIV_REQUEST_FLUSH_TARGET_STATE opcode in IXCLRDataProcess.Request, and the helix pipeline).
  • DataCache two-level dictionary refactor in ContractDescriptorTarget.
  • EcmaMetadata_1._readOnlyMetadataAddress cache.
  • LayoutPair N-source lazy resolution.

/cc @dotnet/runtime-diagnostics-cdac

Replace `IContract.Flush()` and `Target.Flush()` with a `Flush(FlushScope)`
overload, and add a `FlushScope` enum with two values:

* `All`  -- clear every cache the target holds, including immutable metadata
  caches (type layouts, contract instances, ECMA metadata, execution-manager
  ranges). Matches the historical no-argument `Flush` behavior.
* `TargetState` -- clear only target-state caches (`Target.ProcessedData`).
  Contract instances and any metadata caches they own are retained. Each
  contract is responsible for its own correctness under this scope: contracts
  that capture a target-memory snapshot at construction must re-read that
  snapshot in their `Flush(FlushScope.TargetState)` override.

This new scope is the foundation for a follow-up cDAC stress-harness change
that re-verifies live target state between snapshots without paying the cost
of re-walking System.Private.CoreLib ECMA metadata on every iteration. With
`Flush(FlushScope.TargetState)`, `ManagedTypeSource_1` retains its
CoreLib-only caches across flushes (CoreLib lives in the non-collectible
default ALC and its metadata is immutable for the process lifetime).

Contract updates in this change:

* `ManagedTypeSource_1` -- retains type/typehandle/fielddesc caches across
  `FlushScope.TargetState`; clears on `FlushScope.All` as before.
* `ReJIT_1`, `PlatformMetadata_1` -- lazy-snapshot pattern: the contract
  no longer captures target state in its constructor, and re-snapshots on
  the next read after any flush.
* `ExecutionManager_1`/`_2`/`ExecutionManagerCore` -- the top
  `RangeSectionMap` is resolved lazily and re-resolved after any flush so
  that `FlushScope.TargetState` correctly invalidates execution-range data
  that may have changed across a runtime resume.

All other contracts that hold caches (`EcmaMetadata_1`, `Signature_1`,
`RuntimeTypeSystem_1`) currently treat both scopes the same and clear on
every call -- they can be tuned in follow-up changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

The Flush() -> Flush(FlushScope) signature change missed EcmaMetadata_1.
Because IContract.Flush(FlushScope) has a default no-op body, the build
still succeeded but EcmaMetadata's caches silently stopped being flushed
via IContract.Flush(scope) -- the old parameterless Flush() became an
orphaned method, and IContract's no-op default ran in its place.

Match the signature so caches are cleared on flush. Body mirrors the
other Flush(FlushScope) impls on this branch (unconditional clear); a
follow-up audit will decide whether TargetState should preserve any
contract-owned caches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rcj1
Copy link
Copy Markdown
Contributor

rcj1 commented Jun 1, 2026

Could we skip flushing the SPC metadata in the legacy cases?

@max-charlamb
Copy link
Copy Markdown
Member Author

Could we skip flushing the SPC metadata in the legacy cases?

I think that would be okay, but I'm not confident. I'd like to not do that in this PR. It seems like a very minor perf difference. I'd rather over clear than cause a correctness issue. It was just that this particular case is a very hot path in the stress scenario.

@rcj1
Copy link
Copy Markdown
Contributor

rcj1 commented Jun 1, 2026

Could we skip flushing the SPC metadata in the legacy cases?

I think that would be okay, but I'm not confident. I'd like to not do that in this PR. It seems like a very minor perf difference. I'd rather over clear than cause a correctness issue. It was just that this particular case is a very hot path in the stress scenario.

Personally I'm pretty sure it would be fine as long as we limit ourselves to SPC. I'm fine to do it in a follow up though.

{
foreach (IContract contract in _contracts.Values)
{
contract.Flush();
Copy link
Copy Markdown
Contributor

@rcj1 rcj1 Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since no contract needs the scope except ManagedTypeSource, could we just condition the flush of ManagedTypeSource on the scope here, instead of propagating this through every contract?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want this at the abstractions level so each contract can decide how they want to 'flush' themselves. If any arbitrary contract registers itself (maybe from outside this repo), ideally it should also be able to decide how to flush.

For that reason, I don't want to special case ManagedTypeSource here.

Copy link
Copy Markdown
Contributor

@rcj1 rcj1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe could be optimized by caching SPC whenever it is not negatively cached, but I'm ok if we leave this as is for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants