intrinsic-test: sve support #2160
Conversation
c29cf85 to
6561db8
Compare
9ff4174 to
e817965
Compare
This comment was marked as resolved.
This comment was marked as resolved.
95f286a to
ed132a7
Compare
| } = **self | ||
| { | ||
| let quad = if self.num_lanes() * bl > 64 { "q" } else { "" }; | ||
| fn comparison_function(&self) -> String { |
There was a problem hiding this comment.
Is there some way to print the diff for SVE? I would imagine it is difficult due to SVE being non-const sized, but is there some way? That would massively improve debuggability. One approach I can suggest is have some small functions that convert from SVE vectors to &[T], e.g.
#[target_feature(enable = "sve")]
pub fn svfloat32_to_slice(a: &svfloat32_t) -> &[NanEqF32] {
unsafe {
core::slice::from_raw_parts(core::ptr::from_ref(a).cast(), svcntw() as usize)
}
}this might work, with significantly less complexity
There was a problem hiding this comment.
When I've needed to debug them, I've disabled the invocation of intrinsic-test in intrinsic-test.sh and added these snippets to the generated tests that I wanted to debug.
For non-bool vectors, just replace the x0_val and x1_val with the vectors that you want to inspect, e.g. __rust_return_value or __c_return_value:
{
let num_elems = svcnth() as usize;
let mut x0_buf = Vec::with_capacity(num_elems);
let mut x1_buf = Vec::with_capacity(num_elems);
svst1_u16(__pred, x0_buf.as_mut_ptr(), x0_val);
x0_buf.set_len(num_elems);
svst1_u16(__pred, x1_buf.as_mut_ptr(), x1_val);
x1_buf.set_len(num_elems);
for i in 0..num_elems {
let x0_val = x0_buf[i];
dbg!(i, x0_val);
}
for i in 0..num_elems {
let x1_val = x1_buf[i];
dbg!(i, x1_val);
}
}Similarly for bool vectors:
{
let num_elems = svcntb() as usize;
let mut _op_val = Vec::with_capacity(num_elems);
svst1_u8(op_val, _op_val.as_mut_ptr(), svdup_n_u8(1));
_op_val.set_len(num_elems);
for i in 0..num_elems {
let _op_val_el = if _op_val[i] == 1 { true } else { false };
dbg!(i, _op_val_el);
}
}I'm not sure if what you've suggested will work, happy to try. I'm more than happy to make some improvements here for debuggability of the tests when they do fail, but could that be left to a follow-up?
| " | ||
| {intrinsics} | ||
| "#, | ||
| intrinsics = intrinsics.iter().format_with("", |intrinsic, fmt| { |
There was a problem hiding this comment.
very minor nit, but you can use something like intrinsic.specializations().for_each(|imm_values| fmt(...))
|
also, it would be nice if this can be split into a few focused PRs (I can help if you want) |
|
Submitted rust-lang/rust#158088 to fix the failure in debug mode - should pass CI after that |
I'll try and split some out tomorrow |
…o-no-opt, r=lqd codegen_ssa: no dbginfo for scalable vec local w/ `-O0` LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed. cc llvm/llvm-project#204585 cc rust-lang/stdarch#2160 r? @lqd
Rollup merge of #158088 - davidtwco:scalable-vector-debuginfo-no-opt, r=lqd codegen_ssa: no dbginfo for scalable vec local w/ `-O0` LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed. cc llvm/llvm-project#204585 cc rust-lang/stdarch#2160 r? @lqd
… r=lqd codegen_ssa: no dbginfo for scalable vec local w/ `-O0` LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed. cc llvm/llvm-project#204585 cc #2160 r? @lqd
This macro isn't necessary and just makes the generated code being written harder to read compared to multi-line strings.
Replacing `iter_specializations` (which repeatedly invokes a callback) with an iterator implementation allows `Itertools::format_with` to be used more broadly, which in turn allows disparate string interpolation to be combined and hopefully provide greater context to the reader.
This isn't strictly necessary but these type names were longer than they needed to be.
Updates `get_load_function` to return `svld{n}_{ty}` when loading a
scalable vector type. Caller of `get_load_function` will still need
updated to handle passing the predicate arguments to these load
functions.
Various SVE intrinsics are not yet implemented in stdarch, but are present in the `arm_intrinsics.json` and so should be skipped.
Updates the headers used by generated C code and the target feature flags passed to the C compiler to enable SVE.
A small refactoring to make the type printing logic slightly cleaner and with greater code re-use.
Predicate arguments of type `svbool_t` do not need test value arrays to be generated as the same enable-all-lanes predicate will be passed to all invocations of the intrinsic under test. There is no `svld1` equivalent for `svbool_t` that could be used even if there were test values to use.
Introduces a per-architecture abstraction over how intrinsic results are compared, so that later commits can implement Arm-specific comparison logic for SVE.
Refactoring enabling accessing architecture-specific behaviour that isn't associated with either of the return or argument types.
Support defining a local variable containing the predicate that will be used with all subsequent scalable vector intrinsics.
Implementation of `get_comparison_function` and `get_predicate_function` for SVE which uses the relevant SVE intrinsics.
Refactoring enabling accessing architecture-specific behaviour that isn't associated with the specific argument type.
Instead of assuming that any scalable boolean argument is a predicate, handle predicates specifically and generate test values for `svbool_t` values.
All of the generated output is run through rustfmt so these aren't necessary.
Enables generation of tests for SVE intrinsics leveraging the changes from the previous commits.
There doesn't need to be so many or other modules with the values.
SVE isn't a baseline target feature for `aarch64-unknown-linux-gnu` but should be enabled when running tests.
SVE intrinsics aren't available on big endian
Like with non-SVE test generation, comparison of float results in scalable vectors need special-handling of comparisons.
The output of these cannot be compared.
`sveorv` intrinsics trigger a miscompile in LLVM where the call to the Rust intrinsic is optimised out and replaced with a zero, which is incorrect.
These tests require that we generate test arrays with values that are valid when cast to a pointer, which we don't currently support.
GCC quickly ICEs when asked to compile intrinsic-test's wrapper sources.
|
Split out the first batch of changes in #2169 |
ed132a7 to
ea73af3
Compare
|
Split out the second batch of changes in #2170 |
|
Split out the third batch of changes in #2171 |
|
Split out fourth batch of changes in #2172 |
|
Split out final batch of changes in #2173. Shouldn't be much less other than the actual SVE testing implementation here after those are merged, but there will be a few merge conflicts between them as they land. |
This needs rust-lang/rust#158088 to be in nightly for CI to pass - it is already approved and in the bors queue. Parts of this have been split out in #2169, #2170, #2171, #2172, #2172 and #2173.
This patch contains all the changes necessary for
intrinsic-testto generate tests for the SVE intrinsics:populate_randomand related code #2126, intrinsic-test: removearm::argumentmodule #2127)Some of the commits are small changes I made along the way (e.g. forwarding args fromintrinsic-test.shtocargo test)I could split these out if we wantedintrinsic-test.sh#2164Some of the commits modify the definitions of intrinsics so that they pass tests - this only happened for a handful of intrinsics and the changes required to their definitions were very minorI could split these out if we wantedsvrev,svzipandsvuzp#2163Some of the commits updatearm_intrinsics.jsonto add some additional constraints necessary to generate the right tests (we're making sure that these addl. constraints are reflected in the source of truth we use to generate that file internally)I could split these out if we wantedarm_intrinsics.jsonforsvsetandsvget#2162arm_sve.hheaders and appropriate target feature flags for C and Rustsvld1intrinsic when loading an argument of scalable vector typesvbool_targuments which do not have ansvld1intrinsic, we load asvint8_tand usesvcmpne_n_s8against zero to produce asvbool_tfrom itsvbool_tvalues (though the earlier commits use the samesvptrue_$typredicate for these arguments initially)svpattern), using simple const functions that map integers from intrinsic-test constraints to the appropriate variantsvptrue_$ty()to enable every lane)svcmpeq_$tyandsvptest_anyto compare the Rust and C results instead ofassert_eq!svcmpeq_$tyandsvptest_anyblocks are generated for each vector in the tuple, withsvgetNcalls to extract the appropriate vector from the result tuplessvbool_t, where there is no equivalentsvcmpeq_$tycall, sosveor_b_z(exclusive OR) and!svptest_anyis usedNanEqF*type cannot be used, so a(rust == c) || (isnan(rust) && isnan(c))check is performed instead, via a handful of SVE intrinsics