use a loop when elt_multiply has two var inputs by SteveBronder · Pull Request #2159 · stan-dev/math

SteveBronder · 2020-10-23T00:09:48Z

Summary

When both inputs to elt_multiply are vars, this uses a loop instead of two Eigen calls for the var case and a loop for the mat case that pulls out the result's adjoint. I found this to be a bit faster when comparing against a scalar_binary_apply() version.

So apply is still better in the small-N case, but compared to the top-right graph from here, the loop version is much nicer.
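
As a rough illustration of the loop described in the summary, here is a minimal sketch of an elementwise-multiply reverse pass that updates both operands' adjoints in one loop. It does not use Stan Math: `ToyVarVector`, `elt_multiply_forward`, and `elt_multiply_reverse` are hypothetical names invented for this example, and the storage layout (separate value and adjoint arrays) is an assumption, not the layout used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a vector of autodiff variables:
// values and adjoints kept in two parallel arrays. Not Stan Math's var type.
struct ToyVarVector {
  std::vector<double> val;
  std::vector<double> adj;
  explicit ToyVarVector(std::vector<double> v)
      : val(std::move(v)), adj(val.size(), 0.0) {}
};

// Forward pass: res.val[i] = a.val[i] * b.val[i].
ToyVarVector elt_multiply_forward(const ToyVarVector& a,
                                  const ToyVarVector& b) {
  assert(a.val.size() == b.val.size());
  std::vector<double> out(a.val.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = a.val[i] * b.val[i];
  }
  return ToyVarVector(std::move(out));
}

// Reverse pass in one loop: read each result adjoint once and
// accumulate into both operands' adjoints in the same iteration.
void elt_multiply_reverse(const ToyVarVector& res, ToyVarVector& a,
                          ToyVarVector& b) {
  for (std::size_t i = 0; i < res.adj.size(); ++i) {
    const double r = res.adj[i];
    a.adj[i] += r * b.val[i];  // d(a*b)/da = b
    b.adj[i] += r * a.val[i];  // d(a*b)/db = a
  }
}
```

The alternative this is contrasted with would compute the two adjoint updates as two separate whole-array expressions, which walks the result adjoints twice.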

Tests

No new tests

Side Effects

Nope

Release notes

Checklist

Math issue #(issue number)
Copyright holder: Steve Bronder

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - unit tests pass (to run, use: ./runTests.py test/unit)
  - header checks pass, (make test-headers)
  - dependencies checks pass, (make test-math-dependencies)
  - docs build, (make doxygen)
  - code passes the built in C++ standards checks (make cpplint)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested

stan-buildbot · 2020-10-23T14:38:39Z

Name	Old Result	New Result	Ratio	Performance change (1 - new / old)
gp_pois_regr/gp_pois_regr.stan	3.13	3.14	1.0	-0.49% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.99	-1.31% slower
eight_schools/eight_schools.stan	0.12	0.12	1.0	0.27% faster
gp_regr/gp_regr.stan	0.18	0.18	1.01	1.24% faster
irt_2pl/irt_2pl.stan	5.72	5.7	1.0	0.35% faster
performance.compilation	90.93	88.81	1.02	2.33% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.43	8.5	0.99	-0.83% slower
pkpd/one_comp_mm_elim_abs.stan	29.68	29.92	0.99	-0.8% slower
sir/sir.stan	132.43	134.19	0.99	-1.32% slower
gp_regr/gen_gp_data.stan	0.04	0.04	1.0	-0.35% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.96	2.96	1.0	0.01% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.38	0.37	1.01	1.38% faster
arK/arK.stan	1.77	1.8	0.98	-1.91% slower
arma/arma.stan	0.61	0.62	0.99	-0.84% slower
garch/garch.stan	0.7	0.7	1.0	-0.3% slower
Mean result: 0.998418652363

Jenkins Console Log
Blue Ocean
Commit hash: 1eb37c1

Machine information

ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

bbbales2

Good.

use a loop when elt_multiply has two var inputs

1eb37c1

bbbales2 approved these changes Oct 28, 2020


bbbales2 merged commit 34d4951 into develop Oct 28, 2020

rok-cesnovar deleted the perf-fix/elt-multiply-loop branch October 28, 2020 16:43
