
use a loop when elt_multiply has two var inputs #2159

Merged
bbbales2 merged 1 commit into develop from perf-fix/elt-multiply-loop on Oct 28, 2020

@SteveBronder
Collaborator

Summary

This uses a loop in elt_multiply when both inputs are vars, instead of two Eigen calls for the var case, and a loop for the mat case that pulls out the result's adjoint. I found this to be a bit faster than a scalar_binary_apply() version.
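To make the difference concrete, here is a minimal sketch of the reverse pass for c = a .* b, using a simplified stand-in type rather than Stan Math's own. The names VarVec, elt_multiply_forward, elt_multiply_reverse_two_pass, elt_multiply_reverse_loop and loop_matches_two_pass are hypothetical and are not Stan Math's API. The gradient rule is the same in both versions: a.adj += c.adj * b.val and b.adj += c.adj * a.val. The two-pass version sweeps the data twice, roughly like two separate vectorized expressions, while the loop version reads each result adjoint once and updates both inputs in a single sweep.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a vector of reverse-mode autodiff variables:
// parallel arrays of values and adjoints. This is NOT Stan's var or
// var_value type; it only illustrates the adjoint update.
struct VarVec {
  std::vector<double> val;
  std::vector<double> adj;
  explicit VarVec(std::vector<double> v)
      : val(std::move(v)), adj(val.size(), 0.0) {}
};

// Forward pass: c_i = a_i * b_i (a and b are assumed to have equal size).
VarVec elt_multiply_forward(const VarVec& a, const VarVec& b) {
  std::vector<double> out(a.val.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = a.val[i] * b.val[i];
  }
  return VarVec(std::move(out));
}

// Reverse pass as two separate sweeps, analogous to two vectorized
// expressions. The result's adjoint c.adj is read twice.
void elt_multiply_reverse_two_pass(VarVec& a, VarVec& b, const VarVec& c) {
  for (std::size_t i = 0; i < c.adj.size(); ++i) {
    a.adj[i] += c.adj[i] * b.val[i];  // dc_i/da_i = b_i
  }
  for (std::size_t i = 0; i < c.adj.size(); ++i) {
    b.adj[i] += c.adj[i] * a.val[i];  // dc_i/db_i = a_i
  }
}

// Reverse pass fused into one loop: the result's adjoint is pulled out
// once per element and used to update both inputs.
void elt_multiply_reverse_loop(VarVec& a, VarVec& b, const VarVec& c) {
  for (std::size_t i = 0; i < c.adj.size(); ++i) {
    const double c_adj = c.adj[i];
    a.adj[i] += c_adj * b.val[i];
    b.adj[i] += c_adj * a.val[i];
  }
}

// Runs both reverse passes on identical inputs with every result adjoint
// seeded to 1 and reports whether the input gradients agree exactly.
bool loop_matches_two_pass() {
  VarVec a1(std::vector<double>{1.0, 2.0, 3.0});
  VarVec b1(std::vector<double>{4.0, 5.0, 6.0});
  VarVec a2 = a1;
  VarVec b2 = b1;
  VarVec c1 = elt_multiply_forward(a1, b1);
  VarVec c2 = elt_multiply_forward(a2, b2);
  c1.adj.assign(c1.val.size(), 1.0);
  c2.adj.assign(c2.val.size(), 1.0);
  elt_multiply_reverse_two_pass(a1, b1, c1);
  elt_multiply_reverse_loop(a2, b2, c2);
  return a1.adj == a2.adj && b1.adj == b2.adj;
}
```

Both versions produce identical gradients; the fused loop only changes how often memory is traversed, which is one plausible reason such a rewrite can be a bit faster.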


So the apply version is still faster for small N, but compared with the top-right graph from here, the loop version performs much better.

Tests

No new tests

Side Effects

Nope

Release notes

Checklist

  • Math issue #(issue number)

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.13 | 3.14 | 1.0 | -0.49% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.31% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | 0.27% faster |
| gp_regr/gp_regr.stan | 0.18 | 0.18 | 1.01 | 1.24% faster |
| irt_2pl/irt_2pl.stan | 5.72 | 5.7 | 1.0 | 0.35% faster |
| performance.compilation | 90.93 | 88.81 | 1.02 | 2.33% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.43 | 8.5 | 0.99 | -0.83% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.68 | 29.92 | 0.99 | -0.8% slower |
| sir/sir.stan | 132.43 | 134.19 | 0.99 | -1.32% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | -0.35% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.96 | 1.0 | 0.01% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.37 | 1.01 | 1.38% faster |
| arK/arK.stan | 1.77 | 1.8 | 0.98 | -1.91% slower |
| arma/arma.stan | 0.61 | 0.62 | 0.99 | -0.84% slower |
| garch/garch.stan | 0.7 | 0.7 | 1.0 | -0.3% slower |

Mean result: 0.998418652363
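As a reading aid for the table above, this small sketch recomputes two columns. The Performance change column is defined in its header as 1 - new / old; the Ratio column appears to be old / new, which is an inference from the rows (for example 90.93 / 88.81 ≈ 1.02), not something the bot states. The function names ratio and performance_change are hypothetical. Because the displayed times are rounded to two decimals, recomputing from them reproduces some rows, such as performance.compilation, but not every row exactly.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition (inferred from the rows): Ratio = old / new,
// so values above 1 mean the new build is faster.
double ratio(double old_result, double new_result) {
  return old_result / new_result;
}

// Definition from the table header: 1 - new / old.
// Positive values are reported as "faster", negative as "slower".
double performance_change(double old_result, double new_result) {
  return 1.0 - new_result / old_result;
}
```

For the performance.compilation row, ratio(90.93, 88.81) is about 1.024 and performance_change(90.93, 88.81) is about 0.0233, matching the reported 1.02 and 2.33% faster.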

Jenkins Console Log
Blue Ocean
Commit hash: 1eb37c1


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Member

@bbbales2 left a comment


Good.

@bbbales2 bbbales2 merged commit 34d4951 into develop Oct 28, 2020
@rok-cesnovar rok-cesnovar deleted the perf-fix/elt-multiply-loop branch October 28, 2020 16:43

3 participants