Fix {to,from}_array UB when repr(simd) produces padding#342
Conversation
| // it results in better codegen with optimizations disabled, but we should | ||
| // probably just use `transmute` once that works on const generic types. | ||
| // FIXME: We currently use a pointer store instead of `transmute_copy` because `repr(simd)` | ||
| // results in padding for non-power-of-2 vectors (so vectors are larger than arrays). |
There was a problem hiding this comment.
If vectors are larger than arrays and more aligned (or the same size and same align), then I think the read is always fine for this?
Because if not, we'll have to remove https://doc.rust-lang.org/std/simd/struct.Simd.html#method.as_mut_array.
(Which, comes to think of it, means that we can always implement to_array as *self.as_array(), since we currently require T: Copy.)
There was a problem hiding this comment.
Good point, I changed it to *self.as_array()
scottmcm
left a comment
There was a problem hiding this comment.
I'm not a reviewer here, but FWIW this looks good to me.
| /// | ||
| /// # Safety | ||
| /// Writing to `ptr` must be safe, as if by `<*mut [T; N]>::write_unaligned`. | ||
| const unsafe fn store(self, ptr: *mut [T; N]) { |
There was a problem hiding this comment.
I think you'll want to explicitly copy self to a temporary first, and then memcpy from the temporary, since afaict llvm optimizes that to a vector load/store due to the vector insns generated by the temporary, whereas just calling memcpy generates only a memcpy so llvm doesn't see any vector operations and generates possibly less-efficient code.
compare store vs. store2 -- store2 has the temporary.
https://godbolt.org/z/jWsc6ezP4
There was a problem hiding this comment.
Isn't it intentional, though, that it's not a vector read? Because I read the OP as saying that that would generate an over-long read.
There was a problem hiding this comment.
no, because LLVM IR defines vector load/store instructions to never read/write the padding (when align is small enough), as i explained on Zulip (see #341). read_unaligned has no such guarantee
There was a problem hiding this comment.
so we intentionally want a llvm load/store instruction with align <elem-align> not full-simd-type-align
There was a problem hiding this comment.
It's not written as a vector store, because that would be wrong (writing past the end of the array), but it should lower as one. I added the temporary with a comment as to why it's necessary.
There was a problem hiding this comment.
Oh, I see, I was mixing up the LLVM type and the Rust type.
| self.store(tmp.as_mut_ptr()); | ||
| tmp.assume_init() | ||
| } | ||
| *self.as_array() |
There was a problem hiding this comment.
this doesn't generate a vector store instruction so llvm may produce less optimal code.
cc3e99b to
c504f01
Compare
Addresses #341 (but doesn't introduce an intrinsic).