JIT: Add a runtime async optimization to skip saving unmutated locals into reused continuations #125615
This reverts commit 8e54df1.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR adds a runtime async optimization to skip saving locals that haven't been mutated since the last resumption point when reusing continuation objects. It builds on PR #125556 (which added continuation reuse) by leveraging the knowledge that a reused continuation already holds correct values for unmutated locals, thus eliminating unnecessary write barriers and improving performance by ~10%.
Changes:
- Introduces `PreservedValueAnalysis`, a forward dataflow analysis that computes which tracked locals may have been mutated since the previous resumption point, enabling the optimization to skip saving unchanged locals.
- Restructures continuation layout handling: replaces the old per-call `ContinuationLayout` with a `ContinuationLayoutBuilder`/`ContinuationLayout` split where a shared layout can be computed across all suspension points, and switches flag encoding from `HAS_*` bitmasks to index-based encoding of exception/context/result offsets.
- Splits `CreateSuspension` and `CreateResumption` into block-creation and IR-population phases, with the new `CreateResumptionsAndSuspensions` method driving the two-phase approach and handling shared vs per-call layouts.
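The forward dataflow idea in the first bullet can be sketched roughly as follows. This is a simplified illustration, not the code in `async.cpp`: the `Event`, `Block`, `Transfer`, and `ComputeMaybeMutatedIn` names are hypothetical, locals are modeled as bits in a 64-bit set, block 0 is assumed to be the method entry, and the entry state conservatively treats every local as mutated. A resumption point clears the set, a store to a local adds it, and predecessor states are merged by union because this is a "may have been mutated" analysis.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model for illustration: up to 64 tracked locals as a bitset.
using LocalSet = std::uint64_t;

struct Event
{
    enum Kind { Mutate, Resume } kind;
    unsigned local; // only meaningful for Mutate
};

struct Block
{
    std::vector<Event>       events; // in execution order
    std::vector<std::size_t> preds;  // indices of predecessor blocks
};

// Transfer function: walk the block forward from its incoming state.
inline LocalSet Transfer(const Block& block, LocalSet in)
{
    LocalSet state = in;
    for (const Event& e : block.events)
    {
        if (e.kind == Event::Resume)
            state = 0;                          // just resumed: nothing mutated yet
        else
            state |= LocalSet(1) << e.local;    // local may now differ from the saved copy
    }
    return state;
}

// Iterate to a fixed point; returns the "maybe mutated" set at each block entry.
// Assumption: block 0 is the entry, where all locals count as mutated.
inline std::vector<LocalSet> ComputeMaybeMutatedIn(const std::vector<Block>& blocks,
                                                   LocalSet allLocals)
{
    std::vector<LocalSet> in(blocks.size(), 0), out(blocks.size(), 0);
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t i = 0; i < blocks.size(); i++)
        {
            LocalSet newIn = (i == 0) ? allLocals : 0;
            for (std::size_t p : blocks[i].preds)
                newIn |= out[p];                // union merge ("may" analysis)
            LocalSet newOut = Transfer(blocks[i], newIn);
            if (newIn != in[i] || newOut != out[i])
            {
                in[i]   = newIn;
                out[i]  = newOut;
                changed = true;
            }
        }
    }
    return in;
}
```

In this model, a suspension point at the end of a block would only need to store the locals in `Transfer(block, in[block])` when it reuses a continuation; everything else is already up to date in the continuation object.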
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/coreclr/inc/corinfo.h | Replaces HAS_* flag bits with index-based encoding for exception, context, and result offsets |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Updates managed ContinuationFlags enum and access methods to use new index-based encoding |
| src/coreclr/vm/object.h | Updates GetResultStorage and GetExceptionObjectStorage to use index-based decoding |
| src/coreclr/vm/interpexec.cpp | Updates interpreter suspension/resumption to use index-based flag encoding |
| src/coreclr/interpreter/compiler.cpp | Updates interpreter compiler to emit index-based flag encoding |
| src/coreclr/jit/async.h | Introduces ReturnTypeInfo, ReturnInfo, ContinuationLayoutBuilder, AsyncState, SaveSet; restructures ContinuationLayout and AsyncTransformation |
| src/coreclr/jit/async.cpp | Core implementation: PreservedValueAnalysis, CreateSharedLayout, continuation reuse logic, split save sets |
| src/coreclr/jit/jitconfigvalues.h | Adds JitAsyncReuseContinuations and JitAsyncPreservedValueAnalysisRange config knobs |
| src/coreclr/jit/jitstd/vector.h | Adds const overload of data() to support ContainsLocal const method |
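To make the difference between `HAS_*` bits and an index-based encoding concrete, here is a minimal sketch. The field width, the "all ones means absent" sentinel, and the `Slot`, `EmptyFlags`, `EncodeSlot`, `HasSlot`, and `SlotIndex` names are assumptions made for this example only; the actual bit layout lives in `corinfo.h` and is not reproduced here. The point is only that each of the exception, context, and result slots carries its own index rather than a single presence bit.

```cpp
#include <cstdint>

// Hypothetical layout for illustration only; not the encoding in corinfo.h.
constexpr unsigned      kIndexBits = 4;
constexpr std::uint32_t kIndexMask = (1u << kIndexBits) - 1;
constexpr std::uint32_t kAbsent    = kIndexMask; // assumed sentinel: slot not present

enum class Slot : unsigned { Exception = 0, Context = 1, Result = 2 };

inline unsigned ShiftFor(Slot s)
{
    return static_cast<unsigned>(s) * kIndexBits;
}

// Flags value in which none of the three slots is present.
inline std::uint32_t EmptyFlags()
{
    return kAbsent | (kAbsent << kIndexBits) | (kAbsent << (2 * kIndexBits));
}

// Record that slot `s` lives at position `index` in the continuation.
inline std::uint32_t EncodeSlot(std::uint32_t flags, Slot s, std::uint32_t index)
{
    const unsigned shift = ShiftFor(s);
    return (flags & ~(kIndexMask << shift)) | ((index & kIndexMask) << shift);
}

inline std::uint32_t SlotIndex(std::uint32_t flags, Slot s)
{
    return (flags >> ShiftFor(s)) & kIndexMask;
}

inline bool HasSlot(std::uint32_t flags, Slot s)
{
    return SlotIndex(flags, s) != kAbsent;
}
```

With a presence bit, a reader has to derive each slot's position from which other bits are set; with an index field, the position is read directly from the flags.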
/azp run runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 1 pipeline(s).
libraries-jitstress failure is #125589
/azp run Fuzzlyn
Azure Pipelines successfully started running 1 pipeline(s).
cc @dotnet/jit-contrib PTAL @AndyAyersMS I have a few follow-ups in mind, but I would like to do them separately:
AndyAyersMS left a comment
Looks good overall.
Somewhere in here you may want to assert !fgTrysNotContiguous() as paranoia in case somebody like me thinks about reordering some of the Wasm control flow processing with async.
/ba-g Timeouts and CI already succeeded on a previous run
With #125556 we learn something whenever we reuse a continuation -- specifically that the continuation was created at one of the other suspension points that can reach the current suspension point. We can use that knowledge to skip saving all locals that cannot possibly have been mutated since any previous suspension point. This saves a lot of write barriers when we reuse continuations.
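As a rough sketch of how that knowledge is used at a suspension point (hypothetical names, locals modeled as bits in a 64-bit set, not the JIT's actual representation): when a fresh continuation is allocated, every local live across the suspension must be stored, but when a continuation is reused, only the live locals that may have been mutated since a previous suspension point need a store.

```cpp
#include <cstdint>

// Hypothetical model for illustration: one bit per tracked local.
using LocalSet = std::uint64_t;

// Which locals need to be written into the continuation at this suspension point.
// `maybeMutated` is assumed to come from an analysis that over-approximates the
// locals written since any previous suspension point that can reach this one.
inline LocalSet LocalsToSave(LocalSet liveAcrossSuspension,
                             LocalSet maybeMutated,
                             bool     reusingContinuation)
{
    return reusingContinuation ? (liveAcrossSuspension & maybeMutated)
                               : liveAcrossSuspension;
}
```

Each bit dropped from the result on the reuse path corresponds to a store, and for GC references a write barrier, that no longer has to be emitted.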
Micro benchmark with warmup
(with `DOTNET_TC_OnStackReplacement=0` due to #120865) this improves performance by about 10%.
Codegen diff