I noticed that we lose some perf in various SpanHelpers on arm64 due to missing addressing modes, which breaks pipelining. Minimal repro:
Vector128<byte> Add(ref byte b1, ref byte b2, nuint offset) =>
Vector128.LoadUnsafe(ref b1, offset) +
Vector128.LoadUnsafe(ref b2, offset);
Current codegen:
add x0, x1, x3
ld1 {v16.16b}, [x0]
add x0, x2, x3
ld1 {v17.16b}, [x0]
add v16.16b, v16.16b, v17.16b
mov v0.16b, v16.16b
Expected codegen:
ldr q16, [x1, x3]
ldr q17, [x2, x3]
add v16.16b, v16.16b, v17.16b
mov v0.16b, v16.16b
The same applies to [addr + imm], e.g. Vector128.LoadUnsafe(ref b2, 16)
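For anyone who wants to check both shapes locally, here is a small standalone harness. It is only a sketch: the class name `AddressingModeRepro`, the `AddImm` helper, and the test data are made up for illustration and are not taken from SpanHelpers. It assumes .NET 7 or later for `Vector128.LoadUnsafe` and the `+` operator on `Vector128<byte>`.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical harness; names below are illustrative only.
static class AddressingModeRepro
{
    // [addr + reg] shape from the repro above. NoInlining keeps the method
    // as a separate JIT compilation unit so its disassembly is easy to find.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector128<byte> Add(ref byte b1, ref byte b2, nuint offset) =>
        Vector128.LoadUnsafe(ref b1, offset) +
        Vector128.LoadUnsafe(ref b2, offset);

    // [addr + imm] shape: the offset is a constant instead of a register.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector128<byte> AddImm(ref byte b1, ref byte b2) =>
        Vector128.LoadUnsafe(ref b1, 16) +
        Vector128.LoadUnsafe(ref b2, 16);

    public static void Main()
    {
        byte[] a = new byte[32];
        byte[] b = new byte[32];
        for (int i = 0; i < a.Length; i++)
        {
            a[i] = (byte)i;
            b[i] = (byte)(2 * i);
        }

        ref byte ra = ref MemoryMarshal.GetArrayDataReference(a);
        ref byte rb = ref MemoryMarshal.GetArrayDataReference(b);

        // Both calls read bytes 16..31 of each array and add them lane by lane.
        Console.WriteLine(Add(ref ra, ref rb, 16));
        Console.WriteLine(AddImm(ref ra, ref rb));
    }
}
```

Running this on an arm64 machine with the `DOTNET_JitDisasm` environment variable set to `Add*` should print the generated code for both methods (assuming a runtime where that knob is available in release builds), which makes it easy to compare against the current and expected listings above.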
cc @tannergooding