JIT: generalize cloning conditions slightly#128532
Conversation
optExtractTestIncr previously required the IV increment to be the immediate predecessor statement of the loop test (after an optional BBINSTR profile-counter store). This rejected loops where unrelated statements happened to sit between the increment and the test, even though the increment-and-test pair was otherwise well-formed. Relax the check to scan backward through the exit block's statements for the first IV-shaped update (the shape recognized by optIsLoopIncrTree). Picking the wrong candidate is safe: the caller FlowGraphNaturalLoop::AnalyzeIteration verifies via MatchLimit that the test actually uses the picked iterVar, and via VisitDefs that there are no other defs of iterVar anywhere in the loop, so any incorrect pick is rejected and the loop simply fails IV analysis as before. This recovers some loop cloning that was previously blocked by incidental adjacency failures (e.g. an unrelated store sneaking in between the increment and the test in the exit block). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Some loops are rejected by FlowGraphNaturalLoop::CheckLoopConditionBaseCase because their loop test cannot be proved true on the first iteration, and no BBJ_COND zero-trip guard exists on the unique-pred chain in front of the preheader. Previously, unconditional loop inversion happened to supply such a guard as a side effect; now that inversion is being made more selective, fewer loops have it. Teach loop cloning to emit its own runtime zero-trip guard as one of the fast-path entry conditions. AnalyzeIteration gains an allowMissingBaseCase flag (off by default to preserve unroller/inversion semantics); when set, it accepts loops with concrete enough init/limit info and marks NaturalLoopIterInfo::NeedsZeroTripGuard. optDeriveLoopCloningConditions then pushes an additional 'init TestOper limit' condition (handling const, invariant-local, and array-length limits) onto the loop's cloning conditions, so the fast path runs only when the loop would actually iterate at least once. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a loop has no statically-known init but has a recognizable IV with a const, invariant-local, or array-length limit, emit the runtime zero-trip guard using the IterVar local read at the preheader as the init expression. AnalyzeIteration has already verified the IV is not address-exposed and has no extraneous defs inside the loop, so the preheader read yields the value the IV will have on first iteration. This recovers most of the residual clone loss from the inversion-skip: on osx-arm64 vs upstream main, LoopsCloned now improves by +164 (up from +59 before this commit). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR broadens the JIT’s loop-iteration analysis and loop-cloning eligibility by (1) relaxing how the IV increment is located in an exiting block and (2) allowing AnalyzeIteration to succeed even when it can’t prove the loop executes at least once, by deferring that requirement to loop cloning via an explicit runtime “zero-trip” guard condition.
Changes:
- Update
optExtractTestIncrto search backward for an IV-shaped increment instead of requiring it to be immediately adjacent to the loop test. - Extend
FlowGraphNaturalLoop::AnalyzeIterationwith anallowMissingBaseCasemode that setsNeedsZeroTripGuardwhen the base-case can’t be proven. - In loop cloning, when
NeedsZeroTripGuardis set, emit an extra cloning condition guarding the fast path against zero-trip execution.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/coreclr/jit/optimizer.cpp |
Changes IV increment extraction to scan backward from the loop test. |
src/coreclr/jit/loopcloning.cpp |
Adds emission of a zero-trip guard cloning condition and enables missing-base-case iteration analysis for cloning. |
src/coreclr/jit/flowgraph.cpp |
Adds allowMissingBaseCase parameter and sets NeedsZeroTripGuard when base-case can’t be proven but a symbolic guard is expressible. |
src/coreclr/jit/compiler.h |
Extends NaturalLoopIterInfo and updates AnalyzeIteration signature/defaults. |
- optExtractTestIncr: after relaxing the adjacency requirement, validate that no statement strictly between the chosen IV increment and the loop test references the iterator variable. This restores the AnalyzeIteration invariant that no loop-body IR observes the post-increment value of the iterator (except the test itself), which loop cloning and other consumers rely on. - AnalyzeIteration: update the Remarks block to document the relaxed adjacency rule, the new allowMissingBaseCase parameter, and the caller's obligation to emit a runtime zero-trip guard when NeedsZeroTripGuard is set. SPMI (osx-arm64) shows the use-check costs only ~9 of ~95 recovered clones across the major collections; net cloning gain is preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Statement::LocalsTreeList asserts NodeThreading::AllLocals, which is not the threading mode in effect when loop cloning runs (asserts fired in checked builds, e.g. during 'Clone loops' on System.Decimal:FromOACurrency). Replace the iteration over LocalsTreeList with an explicit fgWalkTreePre walk (lclVarsOnly), which works in any threading mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
652c84f to
b4fd5f7
Compare
- optExtractTestIncr: replace the custom fgWalkTreePre + lambda with gtTreeHasLocalRead / gtTreeHasLocalStore. Avoids duplicating walker logic and the per-iteration lambda allocation. - optDeriveLoopCloningConditions: when both NeedsZeroTripGuard and HasArrayLengthLimit are set, hoist the ArrIndex allocation and the array deref entry out so the zero-trip guard and the regular limit conditions share a single ArrIndex/deref rather than allocating and pushing them twice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AnalyzeIteration's VisitDefs already rejects any def of iterVar in the loop other than the picked increment, so a statement between incr and test cannot store to iterVar. Only a read check is needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gtTreeHasLocalRead only looks at direct local references, so it cannot see intervening accesses to an address-exposed iterator through indirection. AnalyzeIteration ultimately rejects address-exposed IVs anyway, so reject here and let the caller try the next candidate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
More cloning (unsurprisingly). Should mitigate most of the cloning losses seen in #128459 @jakobbotsch PTAL |
In response to PR feedback: the backward scan in optExtractTestIncr picked the first IV-shaped statement and bailed entirely if it failed later validation (address-exposed iterVar, test does not read iterVar, or intervening read of iterVar). Restructure the scan so that each candidate is fully validated inline and rejection simply continues the scan, only failing when no candidate in the block qualifies. Also reset NeedsZeroTripGuard at AnalyzeIteration entry so a stale value cannot leak if the iter-info struct is reused across loops. SPMI on osx-arm64: +127 LoopsCloned vs upstream main (aspnet2 +7, libraries.pmi +42, benchmarks.run +28, libraries.crossgen2 +36, realworld.run +14). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Latest diffs |
|
We have a few diffs like this: -G_M56737_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
- movzx r8, word ptr [rbx+2*rcx]
+G_M56737_IG06: ; bbWeight=3.96, gcrefRegs=0000 {}, byrefRegs=000C {rdx rbx}, byref, isz
+ mov r8d, ecx
+ movzx r8, word ptr [rbx+2*r8]I don't think its blocking or anything, but it might be interesting to look at specifically since we can presumably still consume Seems to be a few instances and coule reduce the regression seen by the diffs. |
jakobbotsch
left a comment
There was a problem hiding this comment.
LGTM, couple of paranoia questions
In response to PR feedback: * Add a small budget (100) decremented across the outer scan and any inner intervening-read scan, to prevent pathological O(N^2) behavior in test blocks with many IV-shaped statements. * If the test block is in a try region, treat any intervening statement that may throw (GTF_EXCEPT) as if it were an intervening read of the iterator. An EH handler could otherwise observe the post-increment value, violating AnalyzeIteration's invariant. SPMI on osx-arm64 unchanged: still +127 LoopsCloned vs upstream main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In response to PR feedback. The condition that triggers NeedsZeroTripGuard is that the loop condition [IterVar TestOper Limit] cannot be proven to hold on entry, not that the loop body cannot be proven to execute at least once. A do/while-style loop is guaranteed to execute at least once but may still need a guard if its condition is not provably true on entry. Updated wording in compiler.h, flowgraph.cpp, and loopcloning.cpp to reflect the actual contract. No code changes; comments only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
We are apparently running into cases where IV opts, cloning, and LSRA are not playing together nicely: runtime/src/coreclr/jit/inductionvariableopts.cpp Lines 531 to 538 in 8e9aa2a This came from the initial SCEV work (#97865) and I don't think @jakobbotsch ever opened a follow-up issue. |
Generalize loop cloning a bit: