This naming makes more sense, especially if we add more benchmark files. Signed-off-by: Joe Richey <joerichey@google.com>
This PR came about because I discovered the amazing […] After installing the tool, we can run […]
buffer::p384::bench_getrandom::inner:
push rbx
sub rsp, 64
; Zero the buffer
xorps xmm0, xmm0
movaps xmmword ptr [rsp + 48], xmm0
movaps xmmword ptr [rsp + 32], xmm0
movaps xmmword ptr [rsp + 16], xmm0
; Call the function
lea rbx, [rsp + 16]
mov esi, 48
mov rdi, rbx
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
; Check for error
test eax, eax
jne .LBB17_1
; test::black_box(slice);
mov qword ptr [rsp], rbx
mov qword ptr [rsp + 8], 48
mov rax, rsp
add rsp, 64
pop rbx
ret
We can see the effect of using […]
buffer::p384::bench_getrandom_uninit::inner:
push rbx
sub rsp, 64
lea rbx, [rsp + 16]
mov esi, 48
mov rdi, rbx
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
test eax, eax
jne .LBB18_1
mov qword ptr [rsp], rbx
mov qword ptr [rsp + 8], 48
mov rax, rsp
add rsp, 64
pop rbx
ret
As the benchmarks are compiled as separate crates, we can see the effect of inlining. Removing the […]
buffer::p384::bench_getrandom_uninit::inner:
push rbx
sub rsp, 80
lea rbx, [rsp + 16]
lea rsi, [rsp + 32]
mov edx, 48
mov rdi, rbx
call qword ptr [rip + getrandom::getrandom_uninit@GOTPCREL]
mov rax, qword ptr [rsp + 16]
test rax, rax
je .LBB18_1
mov rcx, qword ptr [rsp + 24]
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 24], rcx
add rsp, 80
pop rbx
ret
We can also see that passing the entire array to […]
buffer::p384::bench_getrandom_uninit::inner:
sub rsp, 104
lea rdi, [rsp + 56]
mov esi, 48
call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
test eax, eax
jne .LBB18_1
; 48 byte copy
movups xmm0, xmmword ptr [rsp + 56]
movups xmm1, xmmword ptr [rsp + 72]
movups xmm2, xmmword ptr [rsp + 88]
movaps xmmword ptr [rsp + 32], xmm2
movaps xmmword ptr [rsp + 16], xmm1
movaps xmmword ptr [rsp], xmm0
mov rax, rsp
add rsp, 104
ret
@briansmith this relates to #291 (comment) about how the type you pass to […]
This change:
- Move the benchmarks from mod.rs to buffer.rs
- Move the inner loop we benchmark into an `#[inline(never)]` function
- Includes instructions for getting the ASM for a specific benchmark

This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

Signed-off-by: Joe Richey <joerichey@google.com>
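As a rough illustration of the second bullet, the pattern of isolating the measured code in an `#[inline(never)]` function might look like the sketch below. It is not the code from this PR: `fill` is a hypothetical stand-in for `getrandom::getrandom`, and `std::hint::black_box` is used in place of the nightly-only `test::black_box` so that the example builds on stable Rust.

```rust
use std::hint::black_box;

// Hypothetical stand-in for `getrandom::getrandom`: writes a fixed pattern so
// the sketch is deterministic and needs no external crates.
fn fill(dest: &mut [u8]) -> Result<(), ()> {
    for (i, byte) in dest.iter_mut().enumerate() {
        *byte = i as u8;
    }
    Ok(())
}

// The measured code sits in its own non-inlined function, so it is emitted as
// a separate symbol that is easy to locate in the assembly output.
#[inline(never)]
fn inner() {
    let mut buf = [0u8; 48]; // zeroed up front, as in the first listing
    fill(&mut buf).unwrap();
    // Hand a slice (pointer + length) to black_box rather than the array.
    let slice: &[u8] = &buf;
    black_box(slice);
}

fn main() {
    // A real benchmark harness would time this loop; here we only run it.
    for _ in 0..1_000 {
        inner();
    }
}
```

Because `inner` is never inlined into the harness, changes to the harness itself should not alter the instructions being compared between implementations.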
No major objections from me.
I think most users don't really want a […]
* Rename benches/mod.rs to benches/buffer.rs

  This naming makes more sense, especially if we add more benchmark files.

  Signed-off-by: Joe Richey <joerichey@google.com>

* Rework benchmarks to make it easier to get assembly.

  This change:
  - Move the benchmarks from mod.rs to buffer.rs
  - Move the inner loop we benchmark into an `#[inline(never)]` function
  - Includes instructions for getting the ASM for a specific benchmark

  This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

  Signed-off-by: Joe Richey <joerichey@google.com>

Signed-off-by: Joe Richey <joerichey@google.com>
This change:
- […] `&[u8]` to `test::black_box` for both benchmarks
- […] `#[inline(never)]` function

This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

Signed-off-by: Joe Richey <joerichey@google.com>
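
The difference between the zeroed and uninitialized benchmarks, and between handing `black_box` a slice or the whole array, can be sketched as follows. This is an illustration under assumptions, not the PR's code: `fill_uninit` is a hypothetical stand-in for `getrandom::getrandom_uninit`, `inner_slice` and `inner_array` are made-up names, and `std::hint::black_box` replaces `test::black_box` so the example builds on stable Rust.

```rust
use std::convert::TryFrom;
use std::hint::black_box;
use std::mem::MaybeUninit;

const N: usize = 48;

// Hypothetical stand-in for `getrandom::getrandom_uninit`: initializes every
// byte and returns the buffer as an ordinary byte slice.
fn fill_uninit(dest: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    for (i, byte) in dest.iter_mut().enumerate() {
        byte.write(i as u8);
    }
    let (ptr, len) = (dest.as_mut_ptr().cast::<u8>(), dest.len());
    // SAFETY: the loop above initialized all `len` bytes behind `ptr`.
    unsafe { std::slice::from_raw_parts_mut(ptr, len) }
}

// No zeroing is needed before the call, and only a pointer and a length are
// passed to black_box.
#[inline(never)]
fn inner_slice() {
    let mut buf = [MaybeUninit::<u8>::uninit(); N];
    let slice: &[u8] = fill_uninit(&mut buf);
    black_box(slice);
}

// Passing the array by value may make the compiler copy all N bytes, which
// is the kind of extra work visible in the last listing.
#[inline(never)]
fn inner_array() {
    let mut buf = [MaybeUninit::<u8>::uninit(); N];
    let slice = fill_uninit(&mut buf);
    let array = <[u8; N]>::try_from(&*slice).unwrap();
    black_box(array);
}

fn main() {
    inner_slice();
    inner_array();
}
```

Whether the copy in `inner_array` actually appears depends on the compiler version and optimization level, so the generated assembly is the place to confirm it.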