Skip to content

ARROW-4827: [C++] Implement benchmark comparison#4141

Closed
fsaintjacques wants to merge 31 commits into
apache:masterfrom
fsaintjacques:ARROW-4827-benchmark-comparison
Closed

ARROW-4827: [C++] Implement benchmark comparison#4141
fsaintjacques wants to merge 31 commits into
apache:masterfrom
fsaintjacques:ARROW-4827-benchmark-comparison

Conversation

@fsaintjacques

Copy link
Copy Markdown
Contributor

This script/library allows comparing revisions/builds.

@bkietz bkietz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, looks lovely

Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread dev/archery/archery/cli.py Outdated
Comment thread dev/archery/archery/cli.py Outdated
Comment thread dev/archery/archery/cli.py Outdated
Comment thread dev/archery/archery/utils/command.py Outdated
Comment thread dev/archery/archery/benchmark/google.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be useful to provide some progress output as each test is run so users know nothing is hung.

Maybe benchmarks could be run one at a time with messages naming each?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to commit, but it would requires some more thinking:

  1. Rework how to capture results from google benchmark (right now from stdout). We can use --benchmark_output, then we'll get "progress" in stdout.

  2. archery stdout is now clobbered with this result, so either we redirect the previous point output into stderr, or into the logger.

I'm not very satisfied with either answer. Note that in all cases, you can get some feedback with --debug.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to do it would be:

Suggested change
@property
def suite_name(self):
return os.path.splitext(os.path.basename(self.bin))[0]
def results(self):
argv = ["--benchmark_format=json", "--benchmark_repetitions=20"]
results = { "benchmarks": [] }
for name in self.list_benchmarks():
print(f"running {self.suite_name}.{name}")
result = json.loads(self.run(*argv, f"--benchmark_filter={name}",
stdout=subprocess.PIPE,
stderr=subprocess.PIPE).stdout)
results["context"] = result["context"]
results["benchmarks"] += result["benchmarks"]
return results

re stdout clobbering: the output already seems clobbered by things like 'ninja: no work to do.`

Maybe it would be better to provide the option to specify filenames for comparison (and/or benchmark) output json, rather than rely on stdio?

Comment thread dev/archery/archery/cli.py Outdated
Comment thread docs/source/developers/benchmarks.rst Outdated
Comment thread dev/archery/archery/benchmark/google.py Outdated
@kszucs

kszucs commented Apr 15, 2019

Copy link
Copy Markdown
Member

@fsaintjacques is it still WIP?

@fsaintjacques fsaintjacques marked this pull request as ready for review April 15, 2019 14:44
@fsaintjacques fsaintjacques changed the title [WIP] ARROW-4827: [C++] Implement benchmark comparison ARROW-4827: [C++] Implement benchmark comparison Apr 15, 2019
@fsaintjacques

Copy link
Copy Markdown
Contributor Author

@kszucs not anymore!

@kszucs

kszucs commented Apr 15, 2019

Copy link
Copy Markdown
Member

@fsaintjacques please resolve the conflict

@fsaintjacques fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch from 24fc1dc to 512ae64 Compare April 15, 2019 17:03
@codecov-io

codecov-io commented Apr 16, 2019

Copy link
Copy Markdown

Codecov Report

Merging #4141 into master will increase coverage by 1.42%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4141      +/-   ##
==========================================
+ Coverage   87.76%   89.18%   +1.42%     
==========================================
  Files         758      617     -141     
  Lines       92231    82202   -10029     
  Branches     1251        0    -1251     
==========================================
- Hits        80944    73310    -7634     
+ Misses      11166     8892    -2274     
+ Partials      121        0     -121
Impacted Files Coverage Δ
cpp/src/arrow/python/common.h 98.78% <0%> (-1.22%) ⬇️
python/pyarrow/parquet.py 92.34% <0%> (-1.14%) ⬇️
cpp/src/arrow/util/thread-pool-test.cc 97.66% <0%> (-0.94%) ⬇️
python/pyarrow/compat.py 90% <0%> (-0.48%) ⬇️
python/pyarrow/tests/test_table.py 99.61% <0%> (-0.39%) ⬇️
python/pyarrow/_parquet.pyx 90.5% <0%> (-0.39%) ⬇️
cpp/src/arrow/python/io.cc 95.38% <0%> (-0.38%) ⬇️
cpp/src/parquet/arrow/arrow-reader-writer-test.cc 95.31% <0%> (-0.17%) ⬇️
cpp/src/arrow/python/flight.cc 0.79% <0%> (-0.09%) ⬇️
python/pyarrow/_plasma.pyx 91.27% <0%> (-0.06%) ⬇️
... and 164 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70813d7...2a953f1. Read the comment docs.

@fsaintjacques fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch 6 times, most recently from 4b2f180 to d27160d Compare April 18, 2019 15:26
@fsaintjacques fsaintjacques force-pushed the ARROW-4827-benchmark-comparison branch from d27160d to c371921 Compare April 18, 2019 18:02
@kszucs

kszucs commented Apr 19, 2019

Copy link
Copy Markdown
Member

I won't be to pedantic about this, because it looks good in general, but hard to predict the arising problems without actually running and using it. I'll merge after a positive attempt to try it.
If You have any follow-up tasks please ensure to create tickets.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

Please be pedantic, I'm not familiar with python's best practices. I just followed your style in ursabot/crossbow.

@pitrou pitrou self-requested a review April 24, 2019 12:24

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks basically sound. Here are some comments, you may not necessarily want to act on all of them.

Comment thread .gitignore Outdated
Comment thread dev/archery/archery/benchmark/core.py Outdated
Comment thread dev/archery/archery/benchmark/core.py Outdated
return f"BenchmarkSuite[name={name}, benchmarks={benchmarks}]"


def regress(change, threshold):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, I would probably expect a Benchmark.does_regress(baseline) method (that could ultimately take into account the standard deviation and the less_is_better property). Of course, that can be later refactored.

Comment thread dev/archery/archery/benchmark/core.py Outdated
n = len(values)
mean = sum(values) / len(values)
sum_diff = sum([(val - mean)**2 for val in values])
stddev = (sum_diff / (n - 1))**0.5 if n > 1 else 0.0

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, since you're requiring Python 3 (I saw some f-strings), you should be aware that Python now has a simple statistics module in its standard library.

Though it doesn't support arbitrary quantiles (there's an issue open for that : https://bugs.python.org/issue35775)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it (locally, going to update) in favor of using panda, do you think it's overkill to import it as a library? I think it's going to be useful one day or the other.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas sounds overkill for this, since you're dealing with arrays. Numpy would be enough.

Comment thread dev/archery/archery/benchmark/core.py Outdated
return float(new - old) / abs(old)


DEFAULT_THRESHOLD = 0.05

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this? Add a comment?

Comment thread dev/archery/archery/utils/git.py Outdated
Comment thread dev/archery/archery/utils/logger.py Outdated
Comment thread dev/archery/setup.py Outdated
Comment thread dev/archery/archery/cli.py Outdated
Comment thread docs/source/developers/benchmarks.rst
@pitrou

pitrou commented Apr 24, 2019

Copy link
Copy Markdown
Member

As a side note, at some point you'll probably want to run flake8 (see the Travis lint script) on archery.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

@pitrou updated with your comments, flake8 should pass soon.

@pitrou

pitrou commented Apr 25, 2019

Copy link
Copy Markdown
Member

Thanks. I think the CI failure is unrelated.

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I trust that you acted on previous review comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants