JIT: generalize cloning conditions slightly by AndyAyersMS · Pull Request #128532 · dotnet/runtime

AndyAyersMS · 2026-05-24T02:11:39Z

Generalize loop cloning a bit:

allow more statements between increment and test
allow cloning conditions to establish a zero trip test if one is missing.

optExtractTestIncr previously required the IV increment to be the immediate predecessor statement of the loop test (after an optional BBINSTR profile-counter store). This rejected loops where unrelated statements happened to sit between the increment and the test, even though the increment-and-test pair was otherwise well-formed. Relax the check to scan backward through the exit block's statements for the first IV-shaped update (the shape recognized by optIsLoopIncrTree). Picking the wrong candidate is safe: the caller FlowGraphNaturalLoop::AnalyzeIteration verifies via MatchLimit that the test actually uses the picked iterVar, and via VisitDefs that there are no other defs of iterVar anywhere in the loop, so any incorrect pick is rejected and the loop simply fails IV analysis as before. This recovers some loop cloning that was previously blocked by incidental adjacency failures (e.g. an unrelated store sneaking in between the increment and the test in the exit block). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Some loops are rejected by FlowGraphNaturalLoop::CheckLoopConditionBaseCase because their loop test cannot be proved true on the first iteration, and no BBJ_COND zero-trip guard exists on the unique-pred chain in front of the preheader. Previously, unconditional loop inversion happened to supply such a guard as a side effect; now that inversion is being made more selective, fewer loops have it. Teach loop cloning to emit its own runtime zero-trip guard as one of the fast-path entry conditions. AnalyzeIteration gains an allowMissingBaseCase flag (off by default to preserve unroller/inversion semantics); when set, it accepts loops with concrete enough init/limit info and marks NaturalLoopIterInfo::NeedsZeroTripGuard. optDeriveLoopCloningConditions then pushes an additional 'init TestOper limit' condition (handling const, invariant-local, and array-length limits) onto the loop's cloning conditions, so the fast path runs only when the loop would actually iterate at least once. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When a loop has no statically-known init but has a recognizable IV with a const, invariant-local, or array-length limit, emit the runtime zero-trip guard using the IterVar local read at the preheader as the init expression. AnalyzeIteration has already verified the IV is not address-exposed and has no extraneous defs inside the loop, so the preheader read yields the value the IV will have on first iteration. This recovers most of the residual clone loss from the inversion-skip: on osx-arm64 vs upstream main, LoopsCloned now improves by +164 (up from +59 before this commit). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-05-24T02:12:48Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR broadens the JIT’s loop-iteration analysis and loop-cloning eligibility by (1) relaxing how the IV increment is located in an exiting block and (2) allowing AnalyzeIteration to succeed even when it can’t prove the loop executes at least once, by deferring that requirement to loop cloning via an explicit runtime “zero-trip” guard condition.

Changes:

Update optExtractTestIncr to search backward for an IV-shaped increment instead of requiring it to be immediately adjacent to the loop test.
Extend FlowGraphNaturalLoop::AnalyzeIteration with an allowMissingBaseCase mode that sets NeedsZeroTripGuard when the base-case can’t be proven.
In loop cloning, when NeedsZeroTripGuard is set, emit an extra cloning condition guarding the fast path against zero-trip execution.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/coreclr/jit/optimizer.cpp`	Changes IV increment extraction to scan backward from the loop test.
`src/coreclr/jit/loopcloning.cpp`	Adds emission of a zero-trip guard cloning condition and enables missing-base-case iteration analysis for cloning.
`src/coreclr/jit/flowgraph.cpp`	Adds `allowMissingBaseCase` parameter and sets `NeedsZeroTripGuard` when base-case can’t be proven but a symbolic guard is expressible.
`src/coreclr/jit/compiler.h`	Extends `NaturalLoopIterInfo` and updates `AnalyzeIteration` signature/defaults.

- optExtractTestIncr: after relaxing the adjacency requirement, validate that no statement strictly between the chosen IV increment and the loop test references the iterator variable. This restores the AnalyzeIteration invariant that no loop-body IR observes the post-increment value of the iterator (except the test itself), which loop cloning and other consumers rely on. - AnalyzeIteration: update the Remarks block to document the relaxed adjacency rule, the new allowMissingBaseCase parameter, and the caller's obligation to emit a runtime zero-trip guard when NeedsZeroTripGuard is set. SPMI (osx-arm64) shows the use-check costs only ~9 of ~95 recovered clones across the major collections; net cloning gain is preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Statement::LocalsTreeList asserts NodeThreading::AllLocals, which is not the threading mode in effect when loop cloning runs (asserts fired in checked builds, e.g. during 'Clone loops' on System.Decimal:FromOACurrency). Replace the iteration over LocalsTreeList with an explicit fgWalkTreePre walk (lclVarsOnly), which works in any threading mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- optExtractTestIncr: replace the custom fgWalkTreePre + lambda with gtTreeHasLocalRead / gtTreeHasLocalStore. Avoids duplicating walker logic and the per-iteration lambda allocation. - optDeriveLoopCloningConditions: when both NeedsZeroTripGuard and HasArrayLengthLimit are set, hoist the ArrIndex allocation and the array deref entry out so the zero-trip guard and the regular limit conditions share a single ArrIndex/deref rather than allocating and pushing them twice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AnalyzeIteration's VisitDefs already rejects any def of iterVar in the loop other than the picked increment, so a statement between incr and test cannot store to iterVar. Only a read check is needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

gtTreeHasLocalRead only looks at direct local references, so it cannot see intervening accesses to an address-exposed iterator through indirection. AnalyzeIteration ultimately rejects address-exposed IVs anyway, so reject here and let the caller try the next candidate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AndyAyersMS · 2026-05-25T00:30:00Z

Diffs

More cloning (unsurprisingly). Should mitigate most of the cloning losses seen in #128459

@jakobbotsch PTAL
fyi @dotnet/jit-contrib

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

In response to PR feedback: the backward scan in optExtractTestIncr picked the first IV-shaped statement and bailed entirely if it failed later validation (address-exposed iterVar, test does not read iterVar, or intervening read of iterVar). Restructure the scan so that each candidate is fully validated inline and rejection simply continues the scan, only failing when no candidate in the block qualifies. Also reset NeedsZeroTripGuard at AnalyzeIteration entry so a stale value cannot leak if the iter-info struct is reused across loops. SPMI on osx-arm64: +127 LoopsCloned vs upstream main (aspnet2 +7, libraries.pmi +42, benchmarks.run +28, libraries.crossgen2 +36, realworld.run +14). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AndyAyersMS · 2026-05-26T14:40:15Z

Latest diffs

tannergooding · 2026-05-26T18:15:55Z

We have a few diffs like this:

-G_M56737_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
-       movzx    r8, word  ptr [rbx+2*rcx]
+G_M56737_IG06:        ; bbWeight=3.96, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
+       mov      r8d, ecx
+       movzx    r8, word  ptr [rbx+2*r8]

I don't think its blocking or anything, but it might be interesting to look at specifically since we can presumably still consume rcx directly here (nothing else mutates it), so possibly just a minor missing cast or containment opportunity

Seems to be a few instances and coule reduce the regression seen by the diffs.

jakobbotsch

LGTM, couple of paranoia questions

In response to PR feedback: * Add a small budget (100) decremented across the outer scan and any inner intervening-read scan, to prevent pathological O(N^2) behavior in test blocks with many IV-shaped statements. * If the test block is in a try region, treat any intervening statement that may throw (GTF_EXCEPT) as if it were an intervening read of the iterator. An EH handler could otherwise observe the post-increment value, violating AnalyzeIteration's invariant. SPMI on osx-arm64 unchanged: still +127 LoopsCloned vs upstream main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

In response to PR feedback. The condition that triggers NeedsZeroTripGuard is that the loop condition [IterVar TestOper Limit] cannot be proven to hold on entry, not that the loop body cannot be proven to execute at least once. A do/while-style loop is guaranteed to execute at least once but may still need a guard if its condition is not provably true on entry. Updated wording in compiler.h, flowgraph.cpp, and loopcloning.cpp to reflect the actual contract. No code changes; comments only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AndyAyersMS · 2026-05-26T19:59:20Z

We have a few diffs like this:
-G_M56737_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
-       movzx    r8, word  ptr [rbx+2*rcx]
+G_M56737_IG06:        ; bbWeight=3.96, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
+       mov      r8d, ecx
+       movzx    r8, word  ptr [rbx+2*r8]
I don't think its blocking or anything, but it might be interesting to look at specifically since we can presumably still consume rcx directly here (nothing else mutates it), so possibly just a minor missing cast or containment opportunity

Seems to be a few instances and coule reduce the regression seen by the diffs.

We are apparently running into cases where IV opts, cloning, and LSRA are not playing together nicely:

runtime/src/coreclr/jit/inductionvariableopts.cpp

Lines 531 to 538 in 8e9aa2a

    
           if (stmt->GetRootNode()->AsLclVarCommon()->GetLclNum() == lclNum) 
        
           { 
        
               JITDUMP("  V%02u has a phi [%06u] in " FMT_LP "'s header " FMT_BB "\n", lclNum, 
        
                       dspTreeID(stmt->GetRootNode()), otherLoop->GetIndex(), otherLoop->GetHeader()->bbNum); 
        
               // TODO-CQ: We can legally widen these cases, but LSRA is 
        
               // unhappy about some of the lifetimes we create when we do 
        
               // this. This particularly affects cloned loops. 
        
               return false;

This came from the initial SCEV work (#97865) and I don't think @jakobbotsch ever opened a follow-up issue.

AndyAyersMS and others added 3 commits May 23, 2026 18:03

Copilot AI review requested due to automatic review settings May 24, 2026 02:11

Copilot started reviewing on behalf of AndyAyersMS May 24, 2026 02:11 View session

github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2026

dotnet-policy-service Bot assigned AndyAyersMS May 24, 2026

Copilot AI reviewed May 24, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp Outdated

Comment thread src/coreclr/jit/flowgraph.cpp

Copilot AI review requested due to automatic review settings May 24, 2026 16:06

Copilot started reviewing on behalf of AndyAyersMS May 24, 2026 16:06 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp Outdated

Comment thread src/coreclr/jit/loopcloning.cpp

AndyAyersMS force-pushed the CloneZeroTripGuard branch from 652c84f to b4fd5f7 Compare May 24, 2026 16:16

Copilot AI review requested due to automatic review settings May 24, 2026 16:26

Copilot started reviewing on behalf of AndyAyersMS May 24, 2026 16:26 View session

Copilot AI reviewed May 24, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp Outdated

build-analysis Bot mentioned this pull request May 24, 2026

Unable to pull image from mcr.microsoft.com #117164

Open

AndyAyersMS marked this pull request as ready for review May 25, 2026 00:29

Copilot AI review requested due to automatic review settings May 25, 2026 00:29

Copilot started reviewing on behalf of AndyAyersMS May 25, 2026 00:30 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp

Comment thread src/coreclr/jit/flowgraph.cpp

build-analysis Bot mentioned this pull request May 25, 2026

XHarness package install failure on iOS due to devicectl NSPOSIXErrorDomain error 49 #123796

Open

AndyAyersMS requested a review from jakobbotsch May 26, 2026 18:00

jakobbotsch reviewed May 26, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp

jakobbotsch reviewed May 26, 2026

View reviewed changes

Comment thread src/coreclr/jit/optimizer.cpp Outdated

jakobbotsch reviewed May 26, 2026

View reviewed changes

jakobbotsch approved these changes May 26, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 26, 2026 18:35

Copilot started reviewing on behalf of AndyAyersMS May 26, 2026 18:36 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

Comment thread src/coreclr/jit/compiler.h Outdated

Comment thread src/coreclr/jit/flowgraph.cpp

Comment thread src/coreclr/jit/loopcloning.cpp Outdated

Comment thread src/coreclr/jit/loopcloning.cpp

AndyAyersMS merged commit 5341a84 into dotnet:main May 26, 2026
135 of 139 checks passed

AndyAyersMS mentioned this pull request May 26, 2026

JIT: skip loop inversion for loops that are already bottom-tested #128459

Open

dotnet-maestro Bot mentioned this pull request May 27, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#6855

Open

Conversation

AndyAyersMS commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dotnet-policy-service Bot commented May 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

AndyAyersMS commented May 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

AndyAyersMS commented May 26, 2026

Uh oh!

Uh oh!

Uh oh!

tannergooding commented May 26, 2026

Uh oh!

jakobbotsch left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AndyAyersMS commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AndyAyersMS commented May 24, 2026 •

edited

Loading

AndyAyersMS commented May 26, 2026 •

edited

Loading