JIT: Use faster mod for uint16 values by MihaZupan · Pull Request #128509 · dotnet/runtime

MihaZupan · 2026-05-22T23:41:50Z

Another attempt at #111535
Closes #111492

Change the transformation for

int Mod(char c) => c % 42;

-int Mod(char c) => (int)(c - (uint)(((ulong)((uint)c >> 1) * 818089009u) >> 34) * 42);
+int Mod(char c) => (int)(((ulong)(102261127u * c) * 42) >> 32);

    movzx    rax, di
-   mov      ecx, eax
-   shr      ecx, 1
-   imul     rcx, rcx, 0x30C30C31
-   shr      rcx, 34
-   imul     ecx, ecx, 42
-   sub      eax, ecx
+   imul     eax, eax, 0x6186187
+   imul     rax, rax, 42
+   shr      rax, 32
    ret
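
For readers who want to check the arithmetic, the two sequences above can be modelled in portable C++ and compared against `%` for every 16-bit input. This is only a sketch, not the JIT's implementation: the helper names (`fast_mod_multiplier`, `fast_mod_u16`, `old_mod_42`, `matches_builtin_mod`, `demo`) are made up for illustration. The multiplier formula `UINT32_MAX / d + 1` is assumed to be the usual direct-remainder ("FastMod") constant, and for 42 it equals the 0x6186187 in the new codegen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 32-bit multiplier for a non-zero divisor d.
// Unsigned arithmetic wraps, so d == 1 gives 0, which still produces remainder 0.
constexpr uint32_t fast_mod_multiplier(uint16_t d)
{
    return UINT32_MAX / d + 1;
}

// The multiplier for 42 matches the constant in the new assembly.
static_assert(fast_mod_multiplier(42) == 0x6186187u, "unexpected multiplier for 42");

// New sequence: 32-bit wrapping multiply, 64-bit multiply by d, shift right by 32.
inline uint32_t fast_mod_u16(uint16_t value, uint16_t d, uint32_t multiplier)
{
    uint32_t low_bits = multiplier * static_cast<uint32_t>(value); // wraps mod 2^32
    return static_cast<uint32_t>((static_cast<uint64_t>(low_bits) * d) >> 32);
}

// Old sequence for d == 42: quotient via magic-number multiply, then value - quotient * 42.
inline uint32_t old_mod_42(uint16_t value)
{
    uint32_t v = value;
    uint32_t quotient =
        static_cast<uint32_t>((static_cast<uint64_t>(v >> 1) * 818089009u) >> 34);
    return v - quotient * 42u;
}

// Compares the new sequence against the built-in % for all 65536 inputs.
inline bool matches_builtin_mod(uint16_t d)
{
    const uint32_t multiplier = fast_mod_multiplier(d);
    for (uint32_t v = 0; v <= UINT16_MAX; ++v)
    {
        if (fast_mod_u16(static_cast<uint16_t>(v), d, multiplier) != v % d)
        {
            return false;
        }
    }
    return true;
}

// Small usage example.
inline void demo()
{
    assert(fast_mod_u16(100, 42, fast_mod_multiplier(42)) == 100u % 42u);
    assert(old_mod_42(100) == 100u % 42u);
    assert(matches_builtin_mod(42));
}
```

The sketch relies on both the value and the divisor fitting in 16 bits, which is the precondition the PR has the JIT prove before using this form.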

Let's see what CI thinks.

Diffs: https://gist.github.com/MihuBot/dc97c694a112d2010a2e00fb2501f33b

dotnet-policy-service · 2026-05-22T23:42:59Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

MihaZupan · 2026-05-22T23:43:22Z

@MihuBot

Copilot

Pull request overview

This PR updates CoreCLR JIT morphing/assertion propagation/lowering to enable a cheaper remainder sequence when both operands are proven to fit in uint16 and the divisor is a non-zero constant, and adds a JIT regression test that exercises relevant % const patterns (including char-based modulo).

Changes:

Teach morphing to avoid rewriting % const into a - (a / b) * b when lowering can apply a cheaper uint16 FastMod-style sequence (and convert MOD→UMOD when safe).
Add a new GTF_UMOD_UINT16_OPERANDS hint set by assertion propagation and consumed by lowering to trigger the specialized expansion.
Add a new JIT test under src/tests/JIT/opt/Divide/Regressions/ to cover representative modulo patterns.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/coreclr/jit/morph.cpp	Skips early MOD-to-SUB/MUL/DIV morphing (and may flip MOD→UMOD) so lowering can apply the cheaper uint16 modulo path.
src/coreclr/jit/lower.cpp	Implements the uint16-specialized FastMod lowering for `GT_UMOD` by constant divisors.
src/coreclr/jit/assertionprop.cpp	Improves `IntegralRange` reasoning and sets `GTF_UMOD_UINT16_OPERANDS` when uint16-range operands are proven.
src/coreclr/jit/gentree.h	Introduces the new `GTF_UMOD_UINT16_OPERANDS` flag.
src/coreclr/jit/gentree.cpp	Ensures tree comparison accounts for the new mod/div-related flag.
src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.csproj	Adds the new regression test project.
src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.cs	Adds the new regression test cases exercising modulo scenarios.

MihaZupan · 2026-05-23T17:11:25Z

/azp run Fuzzlyn

azure-pipelines · 2026-05-23T17:11:38Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

The VN-based small-type refinement in IntegralRange::ForNode for GT_LCL_VAR claimed a tight range whenever the local's conservative VN was a CAST to a small type. That is unsound for normalize-on-load locals: their storage holds the small-type value only in its low bits (the upper bits are stale), but the tight range let fgOptimizeCast drop a required sign- or zero-extending load downstream, so the stale bits were read.

Restrict the refinement to locals whose storage is fully normalized: either non-small locals, or small locals with lvNormalizeOnStore set.

Reproduces with Fuzzlyn seed 12902382323863156506 (and others), where '(short)(arg0 ^ arg2)' with sbyte arg0 began returning the unsigned byte interpretation in release.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
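
To make the failure mode concrete, here is a rough model of what happens when the extending load is dropped. It is a simplified sketch under stated assumptions, not JIT code: a normalize-on-load sbyte local is modelled as a 32-bit slot whose upper 24 bits are not guaranteed, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 32-bit stack slot holding an sbyte local.
// Only the low 8 bits are meaningful; the upper 24 bits may be stale.

// What a normalize-on-load read has to do: sign-extend the low byte.
inline int32_t read_with_sign_extension(uint32_t slot)
{
    int32_t low = static_cast<int32_t>(slot & 0xFFu);
    return (low >= 128) ? (low - 256) : low;
}

// What the read turns into if the extension is dropped:
// the raw slot contents, stale bits included.
inline uint32_t read_without_extension(uint32_t slot)
{
    return slot;
}

// Small usage example.
inline void demo_stale_bits()
{
    // Low byte 0xF0 is the sbyte value -16.
    assert(read_with_sign_extension(0x000000F0u) == -16);
    // Without the extension, zeroed upper bits give the unsigned byte value.
    assert(read_without_extension(0x000000F0u) == 240u);
}
```

With zeroed upper bits the unextended read yields 240 instead of -16, which is consistent with the "unsigned byte interpretation" symptom in the repro.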

MihaZupan · 2026-05-24T00:41:00Z

/azp run Fuzzlyn

azure-pipelines · 2026-05-24T00:41:14Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2026-05-24T10:31:50Z

+            // extending load and read those stale bits.
+            if ((compiler->vnStore != nullptr) && (!varTypeIsSmall(varDsc->TypeGet()) || varDsc->lvNormalizeOnStore()))
+            {
+                ValueNum  vn = compiler->vnStore->VNConservativeNormalValue(node->gtVNPair);


I do not think IntegralRange::ForNode should depend on VNs. VNs are a flow-sensitive concept that is only valid during specific phases. I think this is too general a utility to start using VNs.

I would be ok with introducing specific helpers that can refine integral ranges based on VNs though, and then using it explicitly from phases where VNs are known to be valid. But in the end you would probably want to use range check instead.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

+            if (type == TYP_LONG)
+            {
+                // The shift result is already TYP_LONG; turn divMod itself into the shift.
+                divMod->ChangeOper(GT_RSZ);
+                divMod->gtOp1 = mul2;
+                divMod->gtOp2 = shiftAmount;
+            }
+            else
+            {
+                assert(type == TYP_INT);
+                GenTree* shift = m_compiler->gtNewOperNode(GT_RSZ, TYP_LONG, mul2, shiftAmount);
+                BlockRange().InsertBefore(divMod, shift);
+
+                divMod->ChangeOper(GT_CAST);
+                divMod->AsCast()->gtCastType = TYP_INT;
+                divMod->gtOp1                = shift;
+                divMod->gtOp2                = nullptr;
+                divMod->SetUnsigned();
+            }


Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

tannergooding · 2026-05-26T14:07:31Z

 #if defined(TARGET_ARM64)
-    assert(!divMod->OperIs(GT_UMOD));
+    // ARM64 has no remainder instruction. Morph rewrites every non-pow2 MOD/UMOD
+    // into a SUB-MUL-DIV form except when the FastMod path here can apply, which


I've been thinking about this particular problem a bit recently.

That is, the general problem where we have some operation X which has to be morphed into a different pattern for VN/CSE or other purposes, particularly when part of the expression is complex and/or expensive.

I wonder if it would be beneficial in such cases to either have a flag on the "root" of the expression - i.e. a flag on the SUB in SUB(x, MUL(DIV(x, y), y)) indicating say GTF_MOD_ROOT or even a custom node that exists only in HIR (you could identify it as a MOD with only 1 operand, for example).

This would allow us to still do the morph but also allow any other code to trivially see "hey, this is actually a MOD expression and so you can extract the x/y and do what you need optimization wise with it"

The same would also apply to some other cases where we have data we want to CSE, like say -float where it has to emit as XOR(x, -0.0) on xarch or Abs(float) where it is ANDN(x, -0.0)
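
As a quick illustration of the expanded forms mentioned above, the following sketch spells out the three rewrites in plain C++. It assumes `float` is IEEE-754 binary32, so `-0.0f` has the bit pattern 0x80000000. The function names are invented for the example and do not correspond to JIT helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// x % y written as SUB(x, MUL(DIV(x, y), y)); y must be non-zero.
constexpr uint32_t mod_via_sub_mul_div(uint32_t x, uint32_t y)
{
    return x - (x / y) * y;
}

// Bit pattern of -0.0f under the IEEE-754 binary32 assumption.
constexpr uint32_t kSignBit = 0x80000000u;

inline uint32_t float_to_bits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline float bits_to_float(uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// -x as XOR(x, -0.0): flip the sign bit.
inline float negate_via_xor(float x)
{
    return bits_to_float(float_to_bits(x) ^ kSignBit);
}

// Abs(x) as x AND NOT(-0.0): clear the sign bit.
inline float abs_via_andnot(float x)
{
    return bits_to_float(float_to_bits(x) & ~kSignBit);
}

// Small usage example.
inline void demo_rewrites()
{
    assert(mod_via_sub_mul_div(100u, 42u) == 100u % 42u);
    assert(negate_via_xor(1.5f) == -1.5f);
    assert(abs_via_andnot(-2.25f) == 2.25f);
}
```

In each case the expanded form no longer names the original operation, which is the information a root flag or a dedicated HIR node would preserve.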

MihaZupan · 2026-05-26T20:53:39Z

/azp run Fuzzlyn

azure-pipelines · 2026-05-26T20:53:51Z

Azure Pipelines successfully started running 1 pipeline(s).

JIT: Use faster mod for uint16 values

cea3809

MihaZupan added this to the 11.0.0 milestone May 22, 2026

MihaZupan self-assigned this May 22, 2026

Copilot AI review requested due to automatic review settings May 22, 2026 23:41

MihaZupan added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 22, 2026

Copilot started reviewing on behalf of MihaZupan May 22, 2026 23:42

MihuBot mentioned this pull request May 22, 2026

[JitDiff X64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1924

Open

Copilot AI reviewed May 22, 2026

Comment thread src/coreclr/jit/lower.cpp Outdated

Comment thread src/coreclr/jit/morph.cpp Outdated

MihuBot mentioned this pull request May 23, 2026

[JitDiff ARM64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1925

Open

build-analysis Bot mentioned this pull request May 23, 2026

Multiple Helix work items fail on maccatalyst/tvos CoreCLR Release #126460

Open

Add a MinOpts check

56f1398

Copilot AI review requested due to automatic review settings May 23, 2026 22:04

Copilot started reviewing on behalf of MihaZupan May 23, 2026 22:05

Copilot AI reviewed May 23, 2026

Comment thread src/coreclr/jit/lower.cpp Outdated

MihaZupan force-pushed the uint16-mod branch from 65babc5 to 897296a Compare May 23, 2026 22:25

MihuBot mentioned this pull request May 23, 2026

[JitDiff X64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1933

Open

jakobbotsch reviewed May 24, 2026

Move detection to range checks

a222b5c

Copilot AI review requested due to automatic review settings May 25, 2026 14:02

Copilot started reviewing on behalf of MihaZupan May 25, 2026 14:02

Copilot AI reviewed May 25, 2026

MihuBot mentioned this pull request May 25, 2026

[JitDiff X64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1936

Open

Reduce the number of cases that are deferred to range check

ee631cf

MihuBot mentioned this pull request May 26, 2026

[JitDiff X64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1940

Open

Add back the type check

e57ffd1

Copilot AI review requested due to automatic review settings May 26, 2026 13:50

Copilot started reviewing on behalf of MihaZupan May 26, 2026 13:51

Copilot AI reviewed May 26, 2026

tannergooding reviewed May 26, 2026

This was referenced May 26, 2026

IndexOutOfRangeException in ILCompiler.LazyGenericsSupport.GraphBuilder.WalkMethod #122845

Open

XHarness package install failure on iOS due to devicectl NSPOSIXErrorDomain error 49 #123796

Open

Skip some of the new logic under minopts

d7d304f

MihuBot mentioned this pull request May 26, 2026

[JitDiff X64] [MihaZupan] JIT: Use faster mod for uint16 values MihuBot/runtime-utils#1942

Open

MihaZupan added the NO-REVIEW Experimental/testing PR, do NOT review it label May 26, 2026

This was referenced May 27, 2026

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open
