[PROPOSAL] Add Performance Testing + SDK Span Performance Tests#1443
Conversation
@NathanielRN came across this https://pypi.org/project/scalene/, might be helpful for CPU and memory profiling.
Performance testing! This is outstanding 😎. I have one conceptual question: what does "performance" mean here?
Is performance measured in seconds? For example, a performance test is run once and it takes 10 seconds, so its performance is 10s? As far as I understand, a regression of less than 75% is accepted. So, if a test case takes 100s to run, it is accepted if it is run again and it takes 174s? Does that mean it is acceptable for performance to continue to degrade more and more, as long as each run regresses by less than 75% relative to the previous one?
@ocelotl That's a great question!
No, it compares against a fixed baseline performance, not the previous run. In this PR I committed baseline .json result files; these become the expected performance for the package. On subsequent PRs, new results are compared against that same baseline.
According to the pytest-benchmark docs, each test is run for 1 second, as many times as that test can be run in that second, and the reported units are seconds.
So performance cannot degrade indefinitely run over run; every run is measured against the same committed baseline until someone deliberately updates it.
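For intuition, here is a minimal sketch (not pytest-benchmark's actual implementation, just a hypothetical helper) of how a `--benchmark-compare-fail=min:75%` style threshold behaves against a saved baseline:

```python
def fails_regression(baseline_min, current_min, threshold_pct=75.0):
    """Return True when the current 'min' statistic regressed by more
    than threshold_pct relative to the saved baseline (hypothetical
    helper illustrating the comparison, not pytest-benchmark code)."""
    regression_pct = (current_min - baseline_min) / baseline_min * 100.0
    return regression_pct > threshold_pct


# A 100s baseline that now takes 174s is a 74% regression: still passes.
print(fails_regression(100.0, 174.0))  # False
# At 180s the regression is 80%, which exceeds the 75% threshold.
print(fails_regression(100.0, 180.0))  # True
```

Because the baseline is a committed file rather than the previous run, the comparison point stays fixed between baseline updates.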
@lonewolf3739 Thanks for the suggestion! Will be looking into this 😄
aabmass
left a comment
I think we should only be updating these saved benchmark files from Github action CI runs somehow, or maybe storing them as artifacts. The results could vary a lot depending on the PR author's hardware. If they run on Github Actions it should be somewhat normalized and wouldn't prevent people from using Mac/Windows for development, or force them to install all 5 versions of python we support to regen the benchmarks.
That might be a bit tricky; what is the Java SIG doing for their JMH benchmarks?
Should probably add a linguist-generated entry for these to keep PR noise down?
That's really cool! I didn't know about this 😄
However, I think it's only big this time. The benefit of having it included in the diff is that when someone updates the performance later, only the numbers should change, which means we can see how the performance changed in the git diff.
Also if you add new performance tests, that will be clearly seen as only a few lines of code added to these files.
Let me know what you think?
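If we did go that route, a `.gitattributes` entry might look roughly like this (the exact path is an assumption based on where the benchmark files live in this PR):

```
opentelemetry-sdk/tests/performance/.benchmarks/** linguist-generated=true
```

GitHub would then collapse these files in PR diffs by default while still tracking them in git.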
Oh I see, it always updates the same file. I was looking at the pytest-benchmark docs here with --benchmark-autosave. Is it worth preserving all previous runs? They show some cool box and whisker plots from previously saved runs. We can always iterate 😄
It's a good idea! For now we will have a history of the mean throughput in benchmarks/data.js for all commits, but if someone wants to generate the cool whisker plots, all they would have to do is clone the repo and run tox -e test-core-sdk -- --benchmark-histogram=<path> to generate them.
But I think it would be cool to have just the latest .json file committed somewhere here on the repo later :)
tox.ini
Outdated
- test: pytest {posargs}
+ test: pytest --benchmark-save=now --benchmark-compare --benchmark-compare-fail=min:75% --benchmark-sort=mean --benchmark-columns=mean,min,max,stddev,median,ops,rounds {posargs}
+ core-sdk: {toxinidir}/tests/print_benchmarks.py {toxinidir}/opentelemetry-sdk
I'm not sure I follow what this is for? Doesn't pytest-benchmark already print the results in a nice colored format when you run pytest?
It prints the table, which is a nice visual format. However, we want to see the .json file produced, because pytest-benchmark can only run comparisons between .json files, not tables.
So the .json file is printed to the workflow logs so that the author can see the performance results of their changes according to the normalized standard of "running on GitHub Actions".
And if they improved performance, they would update the relevant 0001_old.json file for each Python version 😄 (This is the only manual step in the process 😕)
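A script that dumps the saved benchmark .json files to CI logs might look roughly like this (a hypothetical sketch of a helper in the spirit of tests/print_benchmarks.py, not necessarily the file committed in this PR):

```python
#!/usr/bin/env python3
"""Walk a package directory, find saved pytest-benchmark .json files
under any .benchmarks folder, and print their contents so they show
up in the GitHub Actions workflow logs."""
import json
import os
import sys


def print_benchmarks(package_dir):
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        # Only look inside pytest-benchmark's .benchmarks storage folders.
        if ".benchmarks" not in dirpath.split(os.sep):
            continue
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            print(f"==== {path} ====")
            with open(path) as f:
                print(json.dumps(json.load(f), indent=4))


if __name__ == "__main__":
    for package in sys.argv[1:]:
        print_benchmarks(package)
```

An author could then copy the printed JSON from the logs into the appropriate 0001_old.json baseline file when intentionally updating the expected performance.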
Close, they would copy/paste the output JSON into the opentelemetry-sdk/tests/performance/.benchmarks/Linux-CPython-3.5-64bit/0001_old.json file for example.
This is only if they want to change the expected performance standard, which is something we could automate to happen every "release" for example.
The intention is to only update from the GitHub Action CI 🙂 Short of committing the actual results into the PR's files (I don't know how to do that and I don't think we should...), the next best solution I thought of was to output the results in the GitHub Action CI logs. That's where the print_benchmarks.py step comes in.
Authors will never have to commit results from their machines; they will use the GitHub Actions machine results. We'll know the numbers came from GitHub because the machine info is recorded in the results, which will differ from a run on a local machine.
Agreed!
Yes, also agreed :) I tested locally and you can still run the tests.
The Java SIG seems to be adding the performance test results in the PR description: see PR 2186. I think I heard they also post them somewhere. When considering their solution, I thought this solution here, where we commit the results as .json files, was preferable.
opentelemetry-sdk/tests/performance/benchmarks/trace/test_benchmark_trace.py
aabmass
left a comment
@NathanielRN why aren't we seeing the run on this PR? Is it because there is no existing saved benchmarks, but it will work going forward?
> Performance progression of benchmarks for packages distributed by OpenTelemetry Python can be viewed as a [graph of throughput vs commit history](https://open-telemetry.github.io/opentelemetry-python/benchmarks/index.html). From this page, you can download a JSON file with the performance results.
>
> Running the `tox` tests also runs the performance tests if any are available. Benchmarking tests are done with `pytest-benchmark` and they output a table with results to the console.
@NathanielRN this will show the perf change on your own machine between runs, right?
Do the JSON files need to be added to .gitignore for this PR?
Yes, exactly, that is what I expect; on my test repo I did not see the comments until there was already something saved to compare against.
lzchen
left a comment
Niiiiiiiiiiiiiiiiiiiiice.
As discussed with @NathanielRN, since readthedocs builds from master, this GitHub action will also write to the benchmarks folder on master for every build.
ocelotl
left a comment
Just a question regarding specification compliance 👍
opentelemetry-sdk/tests/performance/benchmarks/trace/test_benchmark_trace.py
Co-authored-by: Aaron Abbott <aaronabbott@google.com>
Description
Part of #335, this PR aims to include the `pytest-benchmark` library to get micro-benchmarking results on some key SDK operations. Specifically, this follows the OpenTelemetry Specification's guidelines on Performance Benchmarking.
Regarding `pytest-benchmark`:

Pros:

- Integrates with `pytest`
- Results can be published as graphs on `gh-pages` (people can create these or we can create these!)
Type of change

In this PR:

- Updated `tox.ini` to print the tests using the added `tests/print_benchmarks.py` script
- Results are published to the `gh-pages` branch and at the link https://open-telemetry.github.io/opentelemetry-python/benchmarks/index.html

How Has This Been Tested?

Locally

Running `tox -e test-core-sdk` produces the following table:

Example Repo
I made an example repo here to prove the JSON merging works.

You can see the difference between PRs that have different performance benchmarks:

This combines all the tests under `OpenTelemetry Python Benchmark` as seen here. The fluctuations are on purpose because I made the test take longer.

Does This PR Require a Contrib Repo Change?
Checklist:

- [ ] Changelogs have been updated (Not changing functionality, only adding tests)
- [ ] Unit tests have been added
- [ ] Documentation has been updated