JIT: Use faster mod for uint16 values#128509
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR updates CoreCLR JIT morphing/assertion propagation/lowering to enable a cheaper remainder sequence when both operands are proven to fit in uint16 and the divisor is a non-zero constant, and adds a JIT regression test that exercises relevant % const patterns (including char-based modulo).
Changes:
- Teach morphing to avoid rewriting
% constintoa - (a / b) * bwhen lowering can apply a cheaperuint16FastMod-style sequence (and convertMOD→UMODwhen safe). - Add a new
GTF_UMOD_UINT16_OPERANDShint set by assertion propagation and consumed by lowering to trigger the specialized expansion. - Add a new JIT test under
src/tests/JIT/opt/Divide/Regressions/to cover representative modulo patterns.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/morph.cpp | Skips early MOD-to-SUB/MUL/DIV morphing (and may flip MOD→UMOD) so lowering can apply the cheaper uint16 modulo path. |
| src/coreclr/jit/lower.cpp | Implements the uint16-specialized FastMod lowering for GT_UMOD by constant divisors. |
| src/coreclr/jit/assertionprop.cpp | Improves IntegralRange reasoning and sets GTF_UMOD_UINT16_OPERANDS when uint16-range operands are proven. |
| src/coreclr/jit/gentree.h | Introduces the new GTF_UMOD_UINT16_OPERANDS flag. |
| src/coreclr/jit/gentree.cpp | Ensures tree comparison accounts for the new mod/div-related flag. |
| src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.csproj | Adds the new regression test project. |
| src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.cs | Adds the new regression test cases exercising modulo scenarios. |
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
The VN-based small-type refinement in IntegralRange::ForNode for GT_LCL_VAR claimed a tight range whenever the local's conservative VN was a CAST to a small type. That is unsound for normalize-on-load locals: their storage only contains the small-type bits in the low byte (upper bytes are stale), but the tight range let fgOptimizeCast drop a required sign/zero extending load downstream, causing it to read those stale bits. Restrict the refinement to locals whose storage is fully normalized: either non-small locals, or small locals with lvNormalizeOnStore set. Repros fuzzlyn seed 12902382323863156506 (and others) where '(short)(arg0 ^ arg2)' with sbyte arg0 began returning the unsigned byte interpretation in release. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| // extending load and read those stale bits. | ||
| if ((compiler->vnStore != nullptr) && (!varTypeIsSmall(varDsc->TypeGet()) || varDsc->lvNormalizeOnStore())) | ||
| { | ||
| ValueNum vn = compiler->vnStore->VNConservativeNormalValue(node->gtVNPair); |
There was a problem hiding this comment.
I do not think IntegralRange::ForNode should depend on VNs. VNs are a flow sensitive concept that is only valid during specific phases. I think this is too general a utility to start using VNs.
I would be ok with introducing specific helpers that can refine integral ranges based on VNs though, and then using it explicitly from phases where VNs are known to be valid. But in the end you would probably want to use range check instead.
| if (type == TYP_LONG) | ||
| { | ||
| // The shift result is already TYP_LONG; turn divMod itself into the shift. | ||
| divMod->ChangeOper(GT_RSZ); | ||
| divMod->gtOp1 = mul2; | ||
| divMod->gtOp2 = shiftAmount; | ||
| } | ||
| else | ||
| { | ||
| assert(type == TYP_INT); | ||
| GenTree* shift = m_compiler->gtNewOperNode(GT_RSZ, TYP_LONG, mul2, shiftAmount); | ||
| BlockRange().InsertBefore(divMod, shift); | ||
|
|
||
| divMod->ChangeOper(GT_CAST); | ||
| divMod->AsCast()->gtCastType = TYP_INT; | ||
| divMod->gtOp1 = shift; | ||
| divMod->gtOp2 = nullptr; | ||
| divMod->SetUnsigned(); | ||
| } |
| #if defined(TARGET_ARM64) | ||
| assert(!divMod->OperIs(GT_UMOD)); | ||
| // ARM64 has no remainder instruction. Morph rewrites every non-pow2 MOD/UMOD | ||
| // into a SUB-MUL-DIV form except when the FastMod path here can apply, which |
There was a problem hiding this comment.
I've been thinking about this particular problem a bit recently.
That is, the general problem where we have some operation X which has to be morphed into a different pattern for VN/CSE or other purposes, particularly when part of the expression is complex and/or expensive.
I wonder if it would be beneficial in such cases to either have a flag on the "root" of the expression - i.e. a flag on the SUB in SUB(x, MUL(DIV(x, y), y)) indicating say GTF_MOD_ROOT or even a custom node that exists only in HIR (you could identify it as a MOD with only 1 operand, for example).
This would allow us to still do the morph but also allow any other code to trivially see "hey, this is actually a MOD expression and so you can extract the x/y and do what you need optimization wise with it"
The same would also apply to some other cases where we have data we want to CSE, like say -float where it has to emit as XOR(x, -0.0) on xarch or Abs(float) where it is ANDN(x, -0.0)
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Another attempt at #111535
Closes #111492
Change the transformation for
Let's see what CI thinks.
Diffs: https://gist.github.com/MihuBot/dc97c694a112d2010a2e00fb2501f33b