[LoongArch64] add Intrinsics' API for LoongArch64. #94400
Note: This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
Issue Details
We have finished SIMD support on runtime 6.0, and the tests pass. I will push the SIMD work for LoongArch64. This is the first PR, covering the API names. @tannergooding
@tannergooding This PR only covers the API names; please focus first on the class names and the API names. I will update the PR later to amend the details. Thanks.
For a new architecture, exposing public APIs is riskier than it is for mature architectures. I'd suggest keeping them internal and focusing on the cross-platform Vector128/256 intrinsics for now.
For API names, you can open an API proposal like #94011. API definitions without a JIT implementation are unlikely to be wanted.
If the API is OK for LoongArch64, I will push the JIT implementation.
Yes, Vector128/256 is independent of the CPU. Right now the architecture-specific API is the most important piece for LoongArch64, so I want to get it confirmed.
#pragma warning disable IDE0060 // unused parameters
using System.Runtime.CompilerServices;

namespace System.Runtime.Intrinsics.LoongArch64
I've marked this as NO-MERGE since we cannot take it until after an API review has occurred. See https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md
We need an API proposal, following the standard template, created first. We'll discuss the relevant name changes and other details there, and I can then champion it and take it to API review. Once approved, we can implement the API surface.
Until then, LoongArch would be relegated to only supporting the existing cross platform API surface. For example, Leading/TrailingZeroCount can be supported by accelerating int.Leading/TrailingZeroCount and the same methods on the other primitive types.
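The cross-platform route described above can be sketched as follows. This is a minimal illustration, assuming .NET 7 or later (for the static `int.LeadingZeroCount` and `int.TrailingZeroCount` methods). The class `CrossPlatformBitCounts` and its method names are hypothetical and not part of the PR; whether the JIT accelerates these calls on LoongArch64 depends on backend work that this thread does not show.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of the PR: callers use the existing
// cross-platform surface and leave instruction selection to the JIT.
public static class CrossPlatformBitCounts
{
    // Static method on the primitive type (.NET 7+).
    public static int LeadingZeros(int value) => int.LeadingZeroCount(value);

    // Equivalent BitOperations entry point for unsigned values.
    public static int LeadingZeros(uint value) => BitOperations.LeadingZeroCount(value);

    // Trailing zero count through the same primitive-type surface.
    public static int TrailingZeros(int value) => int.TrailingZeroCount(value);
}
```

Code written this way needs no LoongArch64-specific class, so it is unaffected by the outcome of the API review.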
Thanks!
Reviewing the LoongArch64 API on the basis of a PR may be clearer, so I pushed this PR.
I will create an API proposal for the LoongArch64 API.
/// float64x4_t xvfmin_d_f64 (float64x4_t a, float64x4_t b)
/// LASX: XVFMIN.D Xd.4D, Xj.4D, Xk.4D
/// </summary>
public static Vector256<double> Min(Vector256<double> left, Vector256<double> right) => Min(left, right);
What are the semantics around NaN and -0 handling on LoongArch?
The floating-point operation follows IEEE 754-2008; here it is minNum(x, y).
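To make the NaN question concrete, here is a small scalar sketch that contrasts a NaN-propagating minimum with a minNum-style minimum using existing .NET APIs. It assumes .NET 7 or later for `double.MinNumber`, which is documented against the IEEE 754:2019 minimumNumber function and is therefore only close in spirit to the 2008 minNum mentioned above. The class `MinSemantics` is hypothetical, and nothing here models the XVFMIN.D instruction itself.

```csharp
using System;

// Hypothetical illustration of two different NaN policies for "min".
public static class MinSemantics
{
    // Math.Min propagates NaN: if either input is NaN, the result is NaN.
    public static double MinPropagating(double x, double y) => Math.Min(x, y);

    // double.MinNumber (.NET 7+) does not propagate a single NaN:
    // when exactly one input is NaN, the numeric input is returned.
    public static double MinNumber(double x, double y) => double.MinNumber(x, y);
}
```

If the LoongArch64 `Min` follows the minNum-style policy, its results for NaN inputs would differ from `Math.Min`, which is the kind of difference the question is probing.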
/// float32x8_t xvfrecip_s_f32 (float32x8_t a)
/// LASX: XVFRECIP.S Xd.8S Xj.8S
/// </summary>
public static Vector256<float> Reciprocal(Vector256<float> value) => Reciprocal(value);
Is this exact, or is it an estimate with more than 0.5 ULP error allowed, like on several other platforms?
Reciprocal is implemented as the IEEE 754-2008 division(1.0, x).
Only FRECIPE and FRSQRTE in the LoongArchBase class are estimates; FRECIP and FRSQRT are exact.
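If the exact form really is equivalent to division(1.0, x), the same result can be expressed with the existing cross-platform vector API. The sketch below assumes .NET 7 or later for the `Vector256<T>` division operator; `ReciprocalSketch` and `ExactReciprocal` are hypothetical names, and the code does not use or model the proposed LoongArch64 intrinsic.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical cross-platform counterpart of an exact reciprocal:
// a correctly rounded per-element division of 1.0f by each lane.
public static class ReciprocalSketch
{
    public static Vector256<float> ExactReciprocal(Vector256<float> value)
        => Vector256.Create(1.0f) / value;
}
```

An estimate instruction such as FRECIPE would not be interchangeable with this, because its result may differ from the correctly rounded quotient.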
/// bool xvsetnez_v_u8 (uint8x32_t value)
/// LASX: XVSETNEZ.V cd, Xj.32B
/// </summary>
public static bool HasElementsNotZero(Vector256<byte> value) => HasElementsNotZero(value);
How does this instruction work at the hardware level?
Xj.32B is clearly the input register, but I'm not familiar with cd here. Is it a general purpose register, a flag register, something else?
I will answer these together later.
> How does this instruction work at the hardware level?
> Xj.32B is clearly the input register, but I'm not familiar with cd here. Is it a general purpose register, a flag register, something else?
cd is a floating-point condition flag register, which holds the result of a floating-point comparison.
There are 8 cd flag registers.
I did not expose cd in the API, in order to keep usage simple.
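As a rough model of the boolean result, the sketch below assumes that XVSETNEZ.V sets the flag when the vector as a whole is non-zero, which is how the proposed name HasElementsNotZero reads; the thread does not confirm this. It uses only the cross-platform `Vector256<T>` inequality operator (.NET 7+), and `FlagSketch` and `HasAnyNonZero` are hypothetical names.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical model: the flag register is hidden and only a bool is
// surfaced, mirroring the "do not expose cd" choice described above.
public static class FlagSketch
{
    // The != operator returns true when at least one element differs,
    // so comparing against Zero reports whether any byte is non-zero.
    public static bool HasAnyNonZero(Vector256<byte> value)
        => value != Vector256<byte>.Zero;
}
```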
Update the API within the LoongArchBase class.
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class LoongArchBase
- Is it OK to name this file LoongArchBase.cs, or should it just be named LABase.cs?
- Is it OK to name this class LoongArchBase?
public static int LeadingSignCount(int value) => LeadingSignCount(value);

/// <summary>
/// LA64: CLO.W rd, rj
/// </summary>
public static int LeadingSignCount(uint value) => LeadingSignCount(value);
Is it necessary to provide both an int and a uint overload of this API?
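One way to think about the two overloads: the operation depends only on the bit pattern, so a signed overload can be a thin reinterpreting wrapper over the unsigned one. The sketch below models a count of leading one bits, which is what the CLO.W mnemonic denotes, using the cross-platform `BitOperations.LeadingZeroCount`. `LeadingOnesSketch` and `LeadingOnes` are hypothetical names and say nothing about which overloads API review would accept.

```csharp
using System.Numerics;

// Hypothetical software model of "count leading ones" for 32-bit values.
public static class LeadingOnesSketch
{
    // Leading ones of value equal leading zeros of its bitwise complement.
    public static int LeadingOnes(uint value) => BitOperations.LeadingZeroCount(~value);

    // Signed overload: reinterpret the same 32 bits and forward.
    public static int LeadingOnes(int value) => LeadingOnes(unchecked((uint)value));
}
```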
public static long ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: BITREV.W rd, rj
/// </summary>
public static ulong ReverseElementBits(uint value) => ReverseElementBits(value);
Is it necessary to provide both an int and a uint overload of ReverseElementBits()?
public static int ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.2W rd, rj
/// </summary>
public static uint ReverseElementBits(uint value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static long ReverseElementBits(long value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static ulong ReverseElementBits(ulong value) => ReverseElementBits(value);
These instructions are similar to Arm64's REV, REV16, REV32, and REV64, but the ArmBase class doesn't expose those. Why?
Do they need to be added for LoongArch64?
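For comparison, whole-value byte reversal is already reachable through the cross-platform `BinaryPrimitives.ReverseEndianness` API, sketched below. Whether that is the reason ArmBase has no REV-style methods is not stated in this thread, so treat the connection as an assumption. `ByteReverseSketch` is a hypothetical name, and the per-word form (REVB.2W) is not modelled here.

```csharp
using System.Buffers.Binary;

// Hypothetical wrapper over the existing cross-platform byte-swap API.
public static class ByteReverseSketch
{
    // Reverses the four bytes of a 32-bit value.
    public static uint ReverseBytes(uint value) => BinaryPrimitives.ReverseEndianness(value);

    // Reverses the eight bytes of a 64-bit value.
    public static ulong ReverseBytes(ulong value) => BinaryPrimitives.ReverseEndianness(value);
}
```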
19f78ff to 6e3a9e7
Sign-Zero-extend and MultiplyWiden
2a527ef to 1e7203a
bitwise shift, shuffle, compare and float operations.
add LoadElementReplicateVector, Vector elements' operations and AverageRounded.
008722b to 8739f1b
Hi, @tannergooding
424a8a1 to 411b9f5
411b9f5 to 6c7b380
I can potentially give it a pass today or tomorrow, but it's still blocked until API review can happen. That probably won't happen until the new year, as API review typically doesn't happen in December when most people are on holiday/vacation.
OK, thanks. I will push other PRs that are independent of these APIs, such as the SIMD instructions within the emitter #95456
Also amend some code formatting.
b260975 to 952a76b
I'm still waiting for a response to the question asked on the API proposal:
I'm very sorry for the late response. GCC has already merged LoongArch's SIMD support, and so has LLVM. There is an unofficial intrinsics manual:
Thanks! This is still on my backlog but is lower priority than some other work due to the API review not having happened yet (and this PR being blocked until that can happen). I'll try to set some time aside in the next week or two to go through the SIMD ISA guide and compare it to the proposed API surface so that it can get marked
OK, Thanks very much.
@tannergooding what is the status of this review?
As per the above, new public API surface cannot be added without it first going through API Review. Last checked, the LoongArch specs weren't available or were still in draft form, so going to API review wasn't possible. If that's changed, then it'd be great to have links and references to the official docs so I can help drive this through to completion.
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
We have finished SIMD support on runtime 6.0, and the tests pass.
I will push the SIMD work for LoongArch64.
This is the first PR, covering the API names.
The [API Proposal]: LoongArch64: add Intrinsics' API for LoongArch64 #94445
@tannergooding
Can you give me some advice?
Thanks