ARROW-4827: [C++] Implement benchmark comparison by fsaintjacques · Pull Request #4141 · apache/arrow

fsaintjacques · 2019-04-11T15:52:15Z

This script/library allows comparing revisions/builds.

bkietz

minor comments, looks lovely

bkietz · 2019-04-11T17:30:53Z

It'd be useful to provide some progress output as each test is run so users know nothing is hung.

Maybe benchmarks could be run one at a time with messages naming each?

Feel free to commit, but it would require some more thinking:

Rework how results are captured from Google Benchmark (right now from stdout). We can use --benchmark_out, and then we'll get "progress" in stdout.

archery's stdout is then clobbered by that progress output, so we either redirect it to stderr or send it to the logger.

I'm not very satisfied with either answer. Note that in all cases, you can get some feedback with --debug.

One way to do it would be:

Suggested change

    @property
    def suite_name(self):
        return os.path.splitext(os.path.basename(self.bin))[0]

    def results(self):
        argv = ["--benchmark_format=json", "--benchmark_repetitions=20"]
        results = {"benchmarks": []}
        for name in self.list_benchmarks():
            print(f"running {self.suite_name}.{name}")
            result = json.loads(self.run(*argv, f"--benchmark_filter={name}",
                                         stdout=subprocess.PIPE,
                                         stderr=subprocess.PIPE).stdout)
            results["context"] = result["context"]
            results["benchmarks"] += result["benchmarks"]
        return results

Re stdout clobbering: the output already seems clobbered by things like 'ninja: no work to do.'

Maybe it would be better to provide the option to specify filenames for comparison (and/or benchmark) output json, rather than rely on stdio?
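A minimal sketch of the file-based idea above, assuming the benchmark binary is a Google Benchmark executable that accepts `--benchmark_out` and `--benchmark_out_format`. The helper names `build_argv` and `run_to_file` are hypothetical and are not part of archery:

```python
import json
import os
import subprocess
import tempfile


def build_argv(binary, out_path, benchmark_filter=None):
    """Command line that makes Google Benchmark write a JSON report to a file."""
    argv = [binary,
            f"--benchmark_out={out_path}",
            "--benchmark_out_format=json"]
    if benchmark_filter is not None:
        argv.append(f"--benchmark_filter={benchmark_filter}")
    return argv


def run_to_file(binary, benchmark_filter=None):
    """Run the binary and return the parsed JSON report.

    stdout/stderr are left untouched, so the console reporter acts as
    progress output while the results go to a temporary file.
    """
    fd, out_path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        subprocess.run(build_argv(binary, out_path, benchmark_filter),
                       check=True)
        with open(out_path) as report:
            return json.load(report)
    finally:
        # The temporary report is only needed until it has been parsed.
        os.remove(out_path)
```

With this shape the results never travel through stdout, so unrelated lines such as build-tool messages cannot corrupt the JSON.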

kszucs · 2019-04-15T13:02:09Z

@fsaintjacques is it still WIP?

fsaintjacques · 2019-04-15T14:45:16Z

@kszucs not anymore!

kszucs · 2019-04-15T15:23:40Z

@fsaintjacques please resolve the conflict

codecov-io · 2019-04-16T19:39:44Z

Codecov Report

Merging #4141 into master will increase coverage by 1.42%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4141      +/-   ##
==========================================
+ Coverage   87.76%   89.18%   +1.42%     
==========================================
  Files         758      617     -141     
  Lines       92231    82202   -10029     
  Branches     1251        0    -1251     
==========================================
- Hits        80944    73310    -7634     
+ Misses      11166     8892    -2274     
+ Partials      121        0     -121

Impacted Files	Coverage Δ
cpp/src/arrow/python/common.h	`98.78% <0%> (-1.22%)`	⬇️
python/pyarrow/parquet.py	`92.34% <0%> (-1.14%)`	⬇️
cpp/src/arrow/util/thread-pool-test.cc	`97.66% <0%> (-0.94%)`	⬇️
python/pyarrow/compat.py	`90% <0%> (-0.48%)`	⬇️
python/pyarrow/tests/test_table.py	`99.61% <0%> (-0.39%)`	⬇️
python/pyarrow/_parquet.pyx	`90.5% <0%> (-0.39%)`	⬇️
cpp/src/arrow/python/io.cc	`95.38% <0%> (-0.38%)`	⬇️
cpp/src/parquet/arrow/arrow-reader-writer-test.cc	`95.31% <0%> (-0.17%)`	⬇️
cpp/src/arrow/python/flight.cc	`0.79% <0%> (-0.09%)`	⬇️
python/pyarrow/_plasma.pyx	`91.27% <0%> (-0.06%)`	⬇️
... and 164 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 70813d7...2a953f1.

kszucs · 2019-04-19T13:09:24Z

I won't be too pedantic about this because it looks good in general, but it's hard to predict the problems that may arise without actually running and using it. I'll merge after a successful attempt at trying it.
If you have any follow-up tasks, please make sure to create tickets.

fsaintjacques · 2019-04-19T13:16:14Z

Please be pedantic, I'm not familiar with Python's best practices. I just followed your style in ursabot/crossbow.

pitrou

This looks basically sound. Here are some comments, you may not necessarily want to act on all of them.

pitrou · 2019-04-24T14:07:40Z

+        return f"BenchmarkSuite[name={name}, benchmarks={benchmarks}]"
+
+
+def regress(change, threshold):


Instead of this, I would probably expect a Benchmark.does_regress(baseline) method (that could ultimately take into account the standard deviation and the less_is_better property). Of course, that can be later refactored.
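A rough sketch of what such a method could look like, under stated assumptions: the `Benchmark` class below is a hypothetical stand-in with a single `value` field, not the class from this PR, and it does not yet take the standard deviation into account.

```python
from dataclasses import dataclass

# Assumed default: relative changes of up to 5% are treated as noise.
DEFAULT_THRESHOLD = 0.05


@dataclass
class Benchmark:
    name: str
    value: float  # e.g. mean time, or throughput
    less_is_better: bool = True

    def change(self, baseline):
        """Relative change versus the baseline, e.g. 0.10 means +10%."""
        return (self.value - baseline.value) / abs(baseline.value)

    def does_regress(self, baseline, threshold=DEFAULT_THRESHOLD):
        change = self.change(baseline)
        # Flip the sign so that a positive value always means "got worse".
        worsening = change if self.less_is_better else -change
        return worsening > threshold
```

For a time-like metric, `Benchmark("a", 110.0).does_regress(Benchmark("a", 100.0))` is true because the value grew by 10%; for a throughput-like metric with `less_is_better=False`, the same growth counts as an improvement.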

pitrou · 2019-04-24T14:09:30Z

+        n = len(values)
+        mean = sum(values) / len(values)
+        sum_diff = sum([(val - mean)**2 for val in values])
+        stddev = (sum_diff / (n - 1))**0.5 if n > 1 else 0.0


btw, since you're requiring Python 3 (I saw some f-strings), you should be aware that Python now has a simple statistics module in its standard library.

Though it doesn't support arbitrary quantiles (there's an issue open for that: https://bugs.python.org/issue35775)

I dropped it (locally, going to update) in favor of using pandas. Do you think it's overkill to import it as a library? I think it's going to be useful one day or another.

Pandas sounds overkill for this, since you're dealing with arrays. Numpy would be enough.
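For reference, a small sketch of the hand-rolled mean and standard deviation rewritten with the standard library's `statistics` module. The function name `summarize` is made up for illustration:

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for repeated measurements."""
    mean = statistics.mean(values)
    # statistics.stdev needs at least two data points, so fall back to 0.0
    # for a single measurement, as the hand-written version did.
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, stddev
```

`statistics.stdev` computes the sample standard deviation (dividing by n - 1), which matches the formula in the diff. The NumPy equivalent would be `np.std(values, ddof=1)`.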

pitrou · 2019-04-24T14:11:50Z

+    return float(new - old) / abs(old)
+
+
+DEFAULT_THRESHOLD = 0.05


What's this? Add a comment?

pitrou · 2019-04-24T14:59:18Z

As a side note, at some point you'll probably want to run flake8 (see the Travis lint script) on archery.

fsaintjacques · 2019-04-25T14:08:30Z

@pitrou updated with your comments, flake8 should pass soon.

pitrou · 2019-04-25T15:48:22Z

Thanks. I think the CI failure is unrelated.

pitrou

+1. I trust that you acted on previous review comments.

bkietz approved these changes Apr 11, 2019

bkietz reviewed Apr 12, 2019

Comment thread dev/archery/archery/cli.py Outdated

bkietz reviewed Apr 12, 2019

Comment thread docs/source/developers/benchmarks.rst Outdated

bkietz reviewed Apr 12, 2019

Comment thread dev/archery/archery/benchmark/google.py Outdated

fsaintjacques mentioned this pull request Apr 15, 2019

ARROW-5071: [C++] CMake benchmark wrapper #4077

Closed

fsaintjacques marked this pull request as ready for review April 15, 2019 14:44

fsaintjacques changed the title from "[WIP] ARROW-4827: [C++] Implement benchmark comparison" to "ARROW-4827: [C++] Implement benchmark comparison" Apr 15, 2019

fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch from 24fc1dc to 512ae64 Compare April 15, 2019 17:03

fsaintjacques added 18 commits April 18, 2019 08:06

initial commit

712d2ed

Fix syntax

a5ad76d

checkpoint

a38f49c

Checkpoint

2c0d512

commit

703cf98

Add documentation

c85661c

Ooops.

2a81744

Add doc and fix bugs

21b2e14

Formatting

d6733b6

Removes copied stuff

bc111b2

Rename --cxx_flags to --cxx-flags

1b02839

Various language fixes

a281ae8

Add doc for bin attribute.

7696202

Add --cmake-extras to build command

90578af

Fix splitlines

d9692bc

Add gitignore entry

96f9997

Supports HEAD revisions

1949f74

Remove empty __init__.py

8845e3e

fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch 6 times, most recently from 4b2f180 to d27160d Compare April 18, 2019 15:26

Fix flake8 warnings

c371921

fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch from d27160d to c371921 Compare April 18, 2019 18:02

fsaintjacques added 3 commits April 18, 2019 14:11

Disable python in benchmarks

71b10e9

Add verbose_third_party

048ba0e

Typo

e676289

fsaintjacques added 2 commits April 22, 2019 08:53

Add --cmake-extras to benchmark-diff command

2825467

Support conda toolchain

dc031bd

pitrou self-requested a review April 24, 2019 12:24

pitrou reviewed Apr 24, 2019

fsaintjacques added 7 commits April 24, 2019 11:09

Update gitignore

280c93b

Introduce RegressionSetArgs

514e8e4

Review

d8e3c1c

Missing files

2a953f1

Move cpp_runner_from_rev_or_path in CppRunner

ee39a1f

Add comments and move stuff

e95baf3

Satisfy flake8

a047ae4

pitrou approved these changes Apr 25, 2019

pitrou closed this in c3511db Apr 25, 2019

asfimport mentioned this pull request Apr 25, 2019

[C++] Implement benchmark comparison between two git revisions #21343

Closed
