Use mul+add+permute sequence for DotProduct when AVX is available #125666
Open
alexcovington wants to merge 10 commits into dotnet:main
Contributor
Tagging subscribers to this area: @dotnet/area-system-memory
tannergooding
approved these changes
Mar 23, 2026
Contributor
Pull request overview
This PR updates x86/x64 JIT lowering for SIMD DotProduct to prefer a MUL + permute + add reduction sequence when AVX is available, avoiding vdpps/vdppd in those cases to improve performance for common Vector* dot-product patterns.
Changes:
- Replace AVX float/double dot-product lowering with explicit multiply + permute + add reduction sequences in LowerHWIntrinsicDot.
- Add AVX-gated alternative lowering paths for some Vector128/Vector128 dot-product cases.
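For readers less familiar with the reduction pattern described above, here is a minimal sketch of a 4-lane float dot product written as multiply, then two permute + add steps. It is an illustration only, not the JIT's actual output: the helper name `dot4` and the macro `DOT4_HAVE_SSE` are hypothetical, and it uses plain SSE intrinsics (which compilers typically emit in VEX-encoded form when building for AVX) with a scalar fallback for non-x86 targets.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define DOT4_HAVE_SSE 1 // hypothetical guard macro for this sketch
#endif

// Hypothetical helper: dot product of two 4-float vectors using
// one multiply followed by two permute + add reduction steps.
float dot4(const float* a, const float* b)
{
#ifdef DOT4_HAVE_SSE
    // Element-wise products: [p0, p1, p2, p3]
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    // Permute: swap adjacent lanes -> [p1, p0, p3, p2]
    __m128 swapPairs = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    // Add: [p0+p1, p0+p1, p2+p3, p2+p3]
    __m128 pairSums = _mm_add_ps(prod, swapPairs);

    // Permute: swap the 64-bit halves -> [p2+p3, p2+p3, p0+p1, p0+p1]
    __m128 swapHalves = _mm_shuffle_ps(pairSums, pairSums, _MM_SHUFFLE(1, 0, 3, 2));
    // Add: every lane now holds p0+p1+p2+p3
    __m128 total = _mm_add_ps(pairSums, swapHalves);

    return _mm_cvtss_f32(total); // read lane 0
#else
    // Scalar fallback with the same pairing order as the SIMD path.
    float p0 = a[0] * b[0], p1 = a[1] * b[1];
    float p2 = a[2] * b[2], p3 = a[3] * b[3];
    return (p0 + p1) + (p2 + p3);
#endif
}
```

After the last add, every lane holds the full sum, so a caller that wants a broadcast result can use the whole vector instead of extracting lane 0.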
Member
It looks like Copilot left some useful feedback to address.
added 3 commits
March 24, 2026 09:23
…olidate code duplication into a single path for Vector128/256
tannergooding
approved these changes
Mar 25, 2026
EgorBo
reviewed
Mar 25, 2026
EgorBo
reviewed
Mar 25, 2026
EgorBo
reviewed
Mar 25, 2026
EgorBo
reviewed
Mar 25, 2026
EgorBo
reviewed
Mar 25, 2026
Don't remove node if we can't find user
Co-authored-by: Egor Bogatov <egorbo@gmail.com>
On x86, when AVX is available, it is generally faster to compute dot products with a multiply + permute + add sequence than with vdpps/vdppd. This PR changes lowering to emit the multiply + permute + add sequence whenever AVX is available.
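The double-precision case that vdppd would otherwise handle needs only a single permute + add step, because there are just two lanes. The following is a rough sketch under the same caveats as any hand-written illustration: `dot2` and `DOT2_HAVE_SSE2` are hypothetical names, the code uses SSE2 intrinsics rather than whatever nodes the JIT actually creates, and a scalar fallback is included for non-x86 builds.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define DOT2_HAVE_SSE2 1 // hypothetical guard macro for this sketch
#endif

// Hypothetical helper: dot product of two 2-double vectors using
// multiply, one permute, and one add.
double dot2(const double* a, const double* b)
{
#ifdef DOT2_HAVE_SSE2
    // Element-wise products: [p0, p1]
    __m128d prod = _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));

    // Permute: swap the two lanes -> [p1, p0]
    __m128d swapped = _mm_shuffle_pd(prod, prod, 1);

    // Add: both lanes now hold p0 + p1
    __m128d total = _mm_add_pd(prod, swapped);

    return _mm_cvtsd_f64(total); // read lane 0
#else
    // Scalar fallback for targets without SSE2.
    return a[0] * b[0] + a[1] * b[1];
#endif
}
```

Whether a sequence like this actually beats the dedicated dot-product instruction depends on the microarchitecture, which is why benchmark disassembly for a change like this is worth comparing.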
Disasm
- System.Runtime.Intrinsics.Tests.Perf_Vector128Of: Base, Diff
- System.Runtime.Intrinsics.Tests.Perf_Vector128Of: Base, Diff
- System.Runtime.Intrinsics.Tests.Perf_VectorOf: Base, Diff
- System.Runtime.Intrinsics.Tests.Perf_VectorOf: Base, Diff