Rework benchmarks to make it easier to get assembly. by josephlr · Pull Request #297 · rust-random/getrandom

josephlr · 2022-10-21T07:21:04Z

This change:

Moves the benchmarks from mod.rs to buffer.rs
Passes a &[u8] to test::black_box for both benchmarks
Move the inner loop we benchmark into an #[inline(never)] function
Includes instructions for getting the ASM for a specific benchmark

This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation.

Signed-off-by: Joe Richey joerichey@google.com

This naming makes more sense, esspecially if we add more benchmark files. Signed-off-by: Joe Richey <joerichey@google.com>

josephlr · 2022-10-21T07:43:08Z

This PR came about because I discovered the amazing cargo-show-asm tool, and I wanted to figure out a way to make it easy to use this tool with our benchmarks.

After installing the tool, we can run cargo asm --bench buffer --release buffer::p384::bench_getrandom::inner. Then after some minor cleanups, we get:

buffer::p384::bench_getrandom::inner:
    push rbx
    sub rsp, 64

    ; Zero the buffer
    xorps xmm0, xmm0
    movaps xmmword ptr [rsp + 48], xmm0
    movaps xmmword ptr [rsp + 32], xmm0
    movaps xmmword ptr [rsp + 16], xmm0
    ; Call the funtion
    lea rbx, [rsp + 16]
    mov esi, 48
    mov rdi, rbx
    call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]
    ; Check for error
    test eax, eax
    jne .LBB17_1
    ; test::black_box(slice);
    mov qword ptr [rsp], rbx
    mov qword ptr [rsp + 8], 48
    mov rax, rsp

    add rsp, 64
    pop rbx
    ret

We can see the effect of using getrandom_uninit:

buffer::p384::bench_getrandom_uninit::inner:
    push rbx
    sub rsp, 64

    lea rbx, [rsp + 16]
    mov esi, 48
    mov rdi, rbx
    call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]

    test eax, eax
    jne .LBB18_1

    mov qword ptr [rsp], rbx
    mov qword ptr [rsp + 8], 48
    mov rax, rsp

    add rsp, 64
    pop rbx
    ret

As the benchmarks are compiled as separate crates, we can see the effect of inlining. Removing the #[inline] from getrandom_uninit() gives:

buffer::p384::bench_getrandom_uninit::inner:
    push rbx
    sub rsp, 80
    
    lea rbx, [rsp + 16]
    lea rsi, [rsp + 32]
    mov edx, 48
    mov rdi, rbx
    call qword ptr [rip + getrandom::getrandom_uninit@GOTPCREL]

    mov rax, qword ptr [rsp + 16]
    test rax, rax
    je .LBB18_1

    mov rcx, qword ptr [rsp + 24]
    mov qword ptr [rsp + 16], rax
    mov qword ptr [rsp + 24], rcx

    add rsp, 80
    pop rbx
    ret

We can also see that passing the entire array to test::black_box creates an additional copy:

buffer::p384::bench_getrandom_uninit::inner:
    sub rsp, 104

    lea rdi, [rsp + 56]
    mov esi, 48
    call qword ptr [rip + getrandom::imp::getrandom_inner@GOTPCREL]

    test eax, eax
    jne .LBB18_1

    ; 48 byte copy
    movups xmm0, xmmword ptr [rsp + 56]
    movups xmm1, xmmword ptr [rsp + 72]
    movups xmm2, xmmword ptr [rsp + 88]
    movaps xmmword ptr [rsp + 32], xmm2
    movaps xmmword ptr [rsp + 16], xmm1
    movaps xmmword ptr [rsp], xmm0
    mov rax, rsp

    add rsp, 104
    ret

@briansmith this relates to #291 (comment) about how the type you pass to black_box affects the code generation, so when comparing implementations, we should be sure to use the same types here.

This change: - Move the benchmarks from mod.rs to buffer.rs - Move the inner loop we benchmark into an `#[inline(never)]` function - Includes instructions for getting the ASM for a specific benchmark This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation. Signed-off-by: Joe Richey <joerichey@google.com>

briansmith · 2022-10-21T17:02:21Z

No major objections from me.

Passes a &[u8] to test::black_box for both benchmarks

I think most users don't really want a &[u8] but rather want a [u8; N] or even a stronger type (see the getrandom_array discussion). The getrandom benchmark does produce an [u8; N] but the getrandom_uninit benchmark never does. It is good that you got rid of the extra copy caused by passing the entire array to black_box but it would be better to pass a &[u8; N] instead. In reality the assume_init() call that would be required is almost definitely going to get optimized away; however, according to bug reports from last year that wasn't always the case.

* Rename benches/mod.rs to benches/buffer.rs This naming makes more sense, especially if we add more benchmark files. Signed-off-by: Joe Richey <joerichey@google.com> * Rework benchmarks to make it easier to get assembly. This change: - Move the benchmarks from mod.rs to buffer.rs - Move the inner loop we benchmark into an `#[inline(never)]` function - Includes instructions for getting the ASM for a specific benchmark This should hopefully reduce the variance of these benchmarks and make it easier to figure out if we are emitting the assembly or IR we expect for a particular implementation. Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Joe Richey <joerichey@google.com>

josephlr requested a review from newpavlov October 21, 2022 07:21

Rename benches/mod.rs to benches/buffer.rs

359b978

This naming makes more sense, esspecially if we add more benchmark files. Signed-off-by: Joe Richey <joerichey@google.com>

josephlr force-pushed the bench branch from f1613f8 to c5c58c2 Compare October 21, 2022 07:26

josephlr force-pushed the bench branch from c5c58c2 to 9ac388e Compare October 21, 2022 07:59

newpavlov approved these changes Oct 21, 2022

View reviewed changes

newpavlov merged commit bd0654f into master Oct 21, 2022

newpavlov deleted the bench branch October 21, 2022 14:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework benchmarks to make it easier to get assembly.#297

Rework benchmarks to make it easier to get assembly.#297
newpavlov merged 2 commits intomasterfrom
bench

josephlr commented Oct 21, 2022 •

edited

Loading

Uh oh!

josephlr commented Oct 21, 2022 •

edited

Loading

Uh oh!

briansmith commented Oct 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

josephlr commented Oct 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josephlr commented Oct 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

briansmith commented Oct 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

josephlr commented Oct 21, 2022 •

edited

Loading

josephlr commented Oct 21, 2022 •

edited

Loading