Refactor avx512f #1597
82fb013 to 51da0ac
@sayantn Do you mind if I open this? There is some overlap between this PR and yours, but some things in this PR are not present in your work.
So the only overlap is the FMA intrinsics and masked loads? I have no problem with you implementing the FMA ones (honestly I didn't know about the
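For context on the FMA overlap: the AVX-512 writemask FMA forms (for example _mm512_mask_fmadd_ps in the Intel Intrinsics Guide) compute a*b + c with a single rounding per lane, and keep the lane from a wherever the mask bit is clear. Below is a hypothetical scalar model of that lane-wise behaviour in plain Rust, assuming 16 f32 lanes and a u16 mask. It is not stdarch's implementation, which operates on __m512 vectors; the function name is made up for illustration.

```rust
/// Hypothetical scalar model of a writemask FMA over 16 f32 lanes.
/// Lane i becomes a[i] * b[i] + c[i] (rounded once) if bit i of `k` is set,
/// otherwise it keeps a[i], matching the documented writemask semantics.
pub fn mask_fmadd_ps_model(a: [f32; 16], k: u16, b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    let mut dst = a; // inactive lanes keep the value from `a`
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            // f32::mul_add is a fused multiply-add: one rounding at the end
            dst[i] = a[i].mul_add(b[i], c[i]);
        }
    }
    dst
}
```

With x = 1 + 2^-12, an unfused x*x - (1 + 2^-11) rounds to 0 in f32, while the fused form returns the exact 2^-24, which is the observable difference a correct FMA lowering has to preserve.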
1e0a0e0 to c8fc6f2
@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.
Yes, I will modify my PR in a while. I will also implement the missing reduce-max (etc.) intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
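Why the vcvt/cvtt mix-up matters: the cvtt* instructions (e.g. cvttsd2si) always truncate toward zero, while the cvt* ones (e.g. cvtsd2si) round according to the current MXCSR rounding mode, which is round-to-nearest-even by default. The sketch below is a hypothetical scalar model of both behaviours in plain Rust, assuming the default rounding mode; NaN and out-of-range inputs yield the x86 "integer indefinite" value 0x8000_0000 (i32::MIN). The helper names are illustrative only, and f64::round_ties_even needs Rust 1.77 or newer.

```rust
/// Shared range check: x86 returns the "integer indefinite" value
/// (0x8000_0000, i.e. i32::MIN) for NaN or results outside the i32 range.
fn to_i32_or_indefinite(r: f64) -> i32 {
    if r.is_nan() || r < i32::MIN as f64 || r > i32::MAX as f64 {
        i32::MIN
    } else {
        r as i32 // exact: `r` is already an integral value within range
    }
}

/// Model of a cvtt-style conversion: truncate toward zero.
pub fn cvtt_model(x: f64) -> i32 {
    to_i32_or_indefinite(x.trunc())
}

/// Model of a cvt-style conversion under the default MXCSR mode:
/// round to nearest, ties to even.
pub fn cvt_model(x: f64) -> i32 {
    to_i32_or_indefinite(x.round_ties_even())
}
```

For 2.7 the truncating form gives 2 and the rounding form gives 3, so a test comparing results against truncation catches an intrinsic that was lowered to the wrong instruction.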
Can you also please do the floating-point abs using
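The request is cut off in this log, so the primitive it names is not recoverable. As background only: a standard way to compute a floating-point absolute value is to clear the IEEE 754 sign bit with a bitwise AND, and that extends lane-wise to packed f32. The sketch below models this on 16 f32 lanes in plain Rust; the function name is hypothetical, and this is not necessarily the approach the comment asked for.

```rust
/// Hypothetical lane-wise abs over 16 f32 lanes by clearing the sign bit.
/// Like f32::abs, this maps -0.0 to +0.0 and clears the sign of NaN inputs.
pub fn abs_ps_model(a: [f32; 16]) -> [f32; 16] {
    const SIGN_CLEAR: u32 = 0x7FFF_FFFF; // every bit except bit 31 (the sign)
    a.map(|x| f32::from_bits(x.to_bits() & SIGN_CLEAR))
}
```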
@sayantn Done.
Thanks. |
dd20b4f to 9aae346
0007890 to 507cef8
I have already done the remaining gather-scatter intrinsics in avx512f. Can you complete avx512bw: the reduce intrinsics and some mask operations? Then I will start on the remaining IFMA and BF16 intrinsics, and then start implementing the new VEX variants.
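A rough picture of the avx512bw work mentioned above: with 16-bit lanes a 512-bit vector holds 32 elements, so masks are 32 bits wide (__mmask32). The sketch below is a hypothetical scalar model in plain Rust of a masked max-reduction, where inactive lanes are replaced by i16::MIN as the identity for max (the same convention Intel documents for avx512f's _mm512_mask_reduce_max_epi32), plus a model of a mask AND-NOT, since mask-register operations such as kandn act as ordinary bit operations on the mask integer. The names are illustrative, not stdarch APIs.

```rust
/// Hypothetical model of a masked max-reduction over 32 i16 lanes.
/// Lanes whose bit in `k` is clear are replaced by i16::MIN, the identity
/// for max, so a zero mask returns i16::MIN.
pub fn mask_reduce_max_i16_model(k: u32, a: [i16; 32]) -> i16 {
    a.iter()
        .enumerate()
        .map(|(i, &x)| if (k >> i) & 1 == 1 { x } else { i16::MIN })
        .fold(i16::MIN, i16::max)
}

/// Hypothetical model of a 32-bit mask AND-NOT: (NOT a) AND b.
pub fn kandn_mask32_model(a: u32, b: u32) -> u32 {
    !a & b
}
```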
_MM_FROUND_CUR_DIRECTION.