ARROW-4827: [C++] Implement benchmark comparison #4141
It'd be useful to provide some progress output as each test runs, so users know nothing is hung.
Maybe benchmarks could be run one at a time, with a message naming each one?
Feel free to commit, but it would require some more thinking:

- Rework how results are captured from Google Benchmark (right now they are parsed from stdout). We can use `--benchmark_out`; then we'll get "progress" on stdout.
- archery's stdout is then clobbered by that progress output, so we either redirect the output from the previous point to stderr, or into the logger.

I'm not very satisfied with either answer. Note that in all cases, you can get some feedback with `--debug`.
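A minimal sketch of the first point, assuming a Google Benchmark binary that accepts the standard `--benchmark_filter`, `--benchmark_out` and `--benchmark_out_format` flags. The helper names (`benchmark_argv`, `read_results`, `run_benchmark`) are hypothetical, not archery's API. The binary's console output is redirected to stderr, which is one of the two options mentioned above.

```python
import json
import logging
import os
import subprocess
import sys
import tempfile

# Progress messages go through logging; logging.basicConfig() installs a
# StreamHandler that writes to stderr by default.
logger = logging.getLogger(__name__)


def benchmark_argv(binary, name, out_path):
    """Command line for running the benchmarks matching `name` (a regex)."""
    return [
        binary,
        f"--benchmark_filter={name}",
        f"--benchmark_out={out_path}",
        "--benchmark_out_format=json",
    ]


def read_results(out_path):
    """Load the JSON document Google Benchmark wrote to `out_path`."""
    with open(out_path) as f:
        return json.load(f)


def run_benchmark(binary, name):
    """Run one benchmark; results go to a temp file, console output to stderr."""
    logger.info("running %s", name)
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "results.json")
        # The binary's console ("progress") output is sent to stderr so that
        # archery's own stdout is not clobbered.
        subprocess.run(benchmark_argv(binary, name, out_path),
                       stdout=sys.stderr, check=True)
        return read_results(out_path)
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup makes the per-benchmark "running ..." messages visible.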
One way to do it would be:
```python
import json
import os
import subprocess

# Excerpt: methods of the class wrapping a benchmark binary;
# self.bin, self.run() and self.list_benchmarks() are defined elsewhere.

@property
def suite_name(self):
    return os.path.splitext(os.path.basename(self.bin))[0]

def results(self):
    argv = ["--benchmark_format=json", "--benchmark_repetitions=20"]
    results = {"benchmarks": []}
    # Run each benchmark on its own so progress can be reported per benchmark.
    for name in self.list_benchmarks():
        print(f"running {self.suite_name}.{name}")
        result = json.loads(self.run(*argv, f"--benchmark_filter={name}",
                                     stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE).stdout)
        results["context"] = result["context"]
        results["benchmarks"] += result["benchmarks"]
    return results
```
Re stdout clobbering: the output already seems to be clobbered by messages like `ninja: no work to do.`
Maybe it would be better to provide an option to specify filenames for the comparison (and/or benchmark) JSON output, rather than relying on stdio?
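Such an option could look roughly like the following sketch. The `--output` option and the helper names are hypothetical, and `-` is used as the conventional stand-in for stdout.

```python
import argparse
import json
import sys


def build_parser():
    # Hypothetical CLI: "--output" is an illustrative option name, not archery's.
    parser = argparse.ArgumentParser(description="Compare benchmark runs (sketch)")
    parser.add_argument("--output", default="-",
                        help="file to write the JSON comparison to ('-' = stdout)")
    return parser


def write_json(data, output):
    """Write `data` as JSON to a file, or to stdout when `output` is '-'."""
    if output == "-":
        json.dump(data, sys.stdout, indent=2)
        sys.stdout.write("\n")
    else:
        with open(output, "w") as f:
            json.dump(data, f, indent=2)
```

With results written to a named file, anything else printed to stdout (build tool chatter, progress lines) no longer corrupts the JSON.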
@fsaintjacques, is it still WIP?
@kszucs, not anymore!
@fsaintjacques, please resolve the conflict.
24fc1dc to 512ae64
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #4141      +/-   ##
==========================================
+ Coverage   87.76%   89.18%   +1.42%
==========================================
  Files         758      617     -141
  Lines       92231    82202   -10029
  Branches     1251        0    -1251
==========================================
- Hits        80944    73310    -7634
+ Misses      11166     8892    -2274
+ Partials      121        0     -121
```
4b2f180 to d27160d
d27160d to c371921
I won't be too pedantic about this, because it looks good in general, but it's hard to predict the problems that will arise without actually running and using it. I'll merge after a successful trial run.
Please be pedantic; I'm not familiar with Python best practices. I just followed your style in ursabot/crossbow.
pitrou left a comment
This looks basically sound. Here are some comments, you may not necessarily want to act on all of them.
```python
return f"BenchmarkSuite[name={name}, benchmarks={benchmarks}]"
```

```python
def regress(change, threshold):
```
Instead of this, I would probably expect a `Benchmark.does_regress(baseline)` method (which could ultimately take into account the standard deviation and the `less_is_better` property). Of course, that can be refactored later.
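A sketch of what such a method might look like, assuming a hypothetical `Benchmark` record that holds repeated measurements and a `less_is_better` flag. Treating the sum of the two standard deviations as the noise band is an illustrative choice, not a rigorous statistical test.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Benchmark:
    """Hypothetical benchmark record: repeated measurements of one metric."""
    name: str
    values: list
    less_is_better: bool = True  # e.g. time per iteration; False for throughput

    @property
    def mean(self):
        return statistics.mean(self.values)

    @property
    def stddev(self):
        # Sample standard deviation; 0.0 when there is a single measurement.
        return statistics.stdev(self.values) if len(self.values) > 1 else 0.0

    def does_regress(self, baseline, threshold=0.05):
        """True if self is worse than baseline by more than `threshold`
        (relative) and the difference exceeds the combined stddevs."""
        change = (self.mean - baseline.mean) / abs(baseline.mean)
        if not self.less_is_better:
            change = -change  # for throughput, a drop is the regression
        noise = self.stddev + baseline.stddev
        return change > threshold and abs(self.mean - baseline.mean) > noise
```

Folding the direction into the object means callers never have to remember whether a positive change is good or bad for a given metric.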
```python
n = len(values)
mean = sum(values) / len(values)
sum_diff = sum([(val - mean)**2 for val in values])
stddev = (sum_diff / (n - 1))**0.5 if n > 1 else 0.0
```
btw, since you're requiring Python 3 (I saw some f-strings), you should be aware that Python now has a simple `statistics` module in its standard library.
It doesn't support arbitrary quantiles, though (there's an open issue for that: https://bugs.python.org/issue35775).
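For reference, the hand-written mean/stddev computation above maps onto the `statistics` module roughly like this (`summarize` is a hypothetical helper name). `statistics.stdev()` computes the sample standard deviation and raises `StatisticsError` for fewer than two values, hence the guard. Python 3.8 later added `statistics.quantiles()`, which cuts the data into `n` equal-probability intervals.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of a list of timings."""
    mean = statistics.mean(values)
    # stdev() divides by n - 1 and needs at least two values.
    stddev = statistics.stdev(values, xbar=mean) if len(values) > 1 else 0.0
    return mean, stddev
```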
I dropped it (locally; I'm going to update the PR) in favor of pandas. Do you think it's overkill to import it as a dependency? I think it's going to be useful sooner or later.
pandas sounds like overkill for this, since you're dealing with arrays. NumPy would be enough.
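With NumPy, the same summary plus arbitrary quantiles stays short. `describe` is a hypothetical helper, and `ddof=1` selects the sample standard deviation, matching the `n - 1` denominator in the original code.

```python
import numpy as np


def describe(values, quantiles=(0.5, 0.9)):
    """Mean, sample stddev (ddof=1) and arbitrary quantiles of repeated timings."""
    arr = np.asarray(values, dtype=float)
    stddev = float(arr.std(ddof=1)) if arr.size > 1 else 0.0
    return {
        "mean": float(arr.mean()),
        "stddev": stddev,
        # np.quantile accepts any q in [0, 1] (linear interpolation by default).
        "quantiles": {q: float(np.quantile(arr, q)) for q in quantiles},
    }
```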
```python
return float(new - old) / abs(old)
```

```python
DEFAULT_THRESHOLD = 0.05
```

As a side note, at some point you'll probably want to run
@pitrou, updated with your comments; flake8 should pass soon.

Thanks. I think the CI failure is unrelated.
pitrou left a comment
+1. I trust that you acted on previous review comments.
This script/library allows comparing revisions/builds.
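To make the comparison concrete, here is a minimal sketch that pairs benchmarks by name across a baseline run and a contender run (both in Google Benchmark's JSON output format) and applies the relative-change formula and `DEFAULT_THRESHOLD` from the review above. `compare` is a hypothetical helper, and it assumes a lower `real_time` is better.

```python
DEFAULT_THRESHOLD = 0.05  # value from the PR: a 5% relative change


def compare(baseline, contender, threshold=DEFAULT_THRESHOLD, field="real_time"):
    """Pair benchmarks by name across two Google Benchmark JSON documents
    and report the relative change of `field` (lower is better for times)."""
    old = {b["name"]: b[field] for b in baseline["benchmarks"]}
    report = []
    for bench in contender["benchmarks"]:
        name = bench["name"]
        if name not in old:
            continue  # benchmark only exists in the contender run
        change = float(bench[field] - old[name]) / abs(old[name])
        report.append({"name": name, "change": change,
                       "regression": change > threshold})
    return report
```

With `--benchmark_repetitions`, individual repetitions share a name, so this sketch keeps only the last one per name; a real tool would aggregate them first, or compare the aggregate entries (such as `<name>_mean`) instead.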