Skip to content

ARROW-5269: [C++][Archery] Mark relevant benchmarks as regression#4285

Closed
fsaintjacques wants to merge 14 commits into
apache:masterfrom
fsaintjacques:ARROW-5269-regression-candidates
Closed

ARROW-5269: [C++][Archery] Mark relevant benchmarks as regression#4285
fsaintjacques wants to merge 14 commits into
apache:masterfrom
fsaintjacques:ARROW-5269-regression-candidates

Conversation

@fsaintjacques

@fsaintjacques fsaintjacques commented May 9, 2019

Copy link
Copy Markdown
Contributor

The goal of this change is to mark benchmarks candidate for automated regression checks. Some benchmarks were refactored for various reasons:

  • Code deduplication where needed
  • Uniform input size for benchmarks in the same suite, usually to minimize the effect of cache hierarchy.
  • Favor external repetitions over manual repetitions and mintime when possible.
  • Add cmake ARROW_BUILD_BENCHMARKS_REFERENCE to toggle reference benchmarks.
  • Remove default benchmark filter of ^Regression.
  • Remove BM_ prefix from benchmark names
  • Fixes and addons to archery to support the --pdb option for debugging and add support for benchmarks reported as items_per_seconds.
  • Adds archery benchmark list sub-command to list suites and benchmarks

@wesm

wesm commented May 9, 2019

Copy link
Copy Markdown
Member

I'm not sure why the "Regression" designation is necessary. It seems like we would want to monitor the performance of all of our benchmarks for significant changes.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

Multiple reasons:

  • some benchmarks are flaky.
  • some benchmarks exists only for point of reference (anything with Naive/Dummy in the name) for other benchmarks or hardware discover-ability, e.g. the memory latency and bandwidth benchmarks.
  • some benchmarks are heavyweight

I think it's preferable to white list than include everything and possibly increase false positive regression alerts. It also lower the time to run benchmarks since archery is explicitly passing (by default) the --benchmark_filter="^Regression" options.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

By the way, I have no attachment to Regression it can be something else, but the function name (and hence the benchmark name) paired with a regex are the easiest way to label and filter.

@wesm

wesm commented May 9, 2019

Copy link
Copy Markdown
Member

If it's an issue with reporting on flaky benchmarks, we could create a "blacklist file" to instruct the reporter not to report regressions in known flakes.

@nealrichardson

Copy link
Copy Markdown
Member

Some thoughts, for what they're worth:

  • A benefit of blacklisting vs. whitelisting (whatever the mechanism for indicating it) is that the default is to include benchmarks in regression testing, and I think that's the right default--if you're adding a benchmark, most likely the purpose is to make sure performance doesn't go down in the future. The bias should be in favor of this, and make developers have to opt out by blacklisting.
  • I'm generally of the opinion that flaky tests have negative value--you don't notice them when they pass, and when they fail, you don't believe them--and should either be fixed immediately or deleted (or skipped, if you're into that). My instinct is that the same is true for flaky benchmarks.
  • Maybe heavy benchmarks belong in a different place altogether since we aren't going to run them routinely.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

Ok, I'll just remove the filtering by default. Should we strip the BM_ out of every benchmark names? I don't see the value in keeping this.

@wesm

wesm commented May 15, 2019

Copy link
Copy Markdown
Member

It's been several days and I'm still having a hard time swallowing the "Regression*" naming convention

@fsaintjacques fsaintjacques force-pushed the ARROW-5269-regression-candidates branch 2 times, most recently from fe050bb to 0592ecb Compare May 21, 2019 15:49
@fsaintjacques

Copy link
Copy Markdown
Contributor Author

@ursabot benchmark

@ursabot

ursabot commented May 21, 2019

Copy link
Copy Markdown

I've successfully started builds for this PR

@ursabot

ursabot commented May 21, 2019

Copy link
Copy Markdown

AMD64 Ubuntu 18.04 C++ Benchmark

Build failed.

@codecov-io

codecov-io commented May 21, 2019

Copy link
Copy Markdown

Codecov Report

Merging #4285 into master will increase coverage by 0.98%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4285      +/-   ##
==========================================
+ Coverage    88.3%   89.28%   +0.98%     
==========================================
  Files         780      636     -144     
  Lines       98400    87138   -11262     
  Branches     1251        0    -1251     
==========================================
- Hits        86891    77801    -9090     
+ Misses      11273     9337    -1936     
+ Partials      236        0     -236
Impacted Files Coverage Δ
cpp/src/plasma/thirdparty/ae/ae.c 70.75% <0%> (-0.95%) ⬇️
cpp/src/plasma/test/external_store_tests.cc 100% <0%> (ø) ⬆️
cpp/src/plasma/test/client_tests.cc 100% <0%> (ø) ⬆️
cpp/src/plasma/test/serialization_tests.cc 100% <0%> (ø) ⬆️
go/arrow/ipc/writer.go
go/arrow/math/uint64_amd64.go
go/arrow/memory/memory_avx2_amd64.go
go/arrow/ipc/file_reader.go
js/src/enum.ts
go/arrow/array/builder.go
... and 144 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68329e7...27633c9. Read the comment docs.

@fsaintjacques

Copy link
Copy Markdown
Contributor Author

@ursabot benchmark

@fsaintjacques

fsaintjacques commented May 22, 2019

Copy link
Copy Markdown
Contributor Author

@pitrou note that I added the cmake configuration flag ARROW_BUILD_BENCHMARKS_REFERENCE which toggles a C/C++ define ARROW_WITH_BENCHMARKS_REFERENCE. The goal is to avoid compiling (and running) benchmarks which are not used for monitoring purposes but only point of reference, e.g. memory latency/bandwidth, naive implementations of algorithms...

Thus if you add a new benchmark and you know that it's only used for a point of reference of the true benchmark, you should probably wrap it with said #ifdef.

@ursabot

ursabot commented May 22, 2019

Copy link
Copy Markdown

I've successfully started builds for this PR

@ursabot

ursabot commented May 22, 2019

Copy link
Copy Markdown

AMD64 Ubuntu 18.04 C++ Benchmark

Build failed with an exception.

Comment thread cpp/src/arrow/builder-benchmark.cc Outdated
Comment thread cpp/src/arrow/builder-benchmark.cc Outdated
Comment thread cpp/src/arrow/builder-benchmark.cc Outdated
Comment thread dev/archery/archery/benchmark/runner.py Outdated
Comment thread dev/archery/archery/cli.py Outdated
Comment thread cpp/cmake_modules/DefineOptions.cmake Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just a nit, but ARROW_WITH_REFERENCE_BENCHMARKS might be a better choice.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there's ARROW_BUILD_BENCHMARKS already. The convention seems to be ARROW_BUILD_xxx to enable the xxx target, and ARROW_WITH_yyy to enable support for the yyy external library.

@fsaintjacques fsaintjacques May 27, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting changing the s/BUILD/WITH/ or the swap of BENCHMARKS/REFERENCE?

@kszucs kszucs left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple nits

@kszucs

kszucs commented May 24, 2019

Copy link
Copy Markdown
Member

@ursabot benchmark

@ursabot

ursabot commented May 24, 2019

Copy link
Copy Markdown

I've successfully started builds for this PR

@kszucs

kszucs commented May 24, 2019

Copy link
Copy Markdown
Member

Why did We get empty output in https://ci.ursalabs.org/#builders/73/builds/20 ?

@fsaintjacques

fsaintjacques commented May 24, 2019

Copy link
Copy Markdown
Contributor Author

@kszucs because benchmarks were renamed and thus the intersection between 2 run is empty. See https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/compare.py#L115-L116

@ursabot

ursabot commented May 24, 2019

Copy link
Copy Markdown

AMD64 Ubuntu 18.04 C++ Benchmark

Build failed with an exception.

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for doing this. Here are some comments and questions.

Comment thread dev/archery/archery/cli.py Outdated
Comment thread dev/archery/archery/benchmark/google.py Outdated
Comment thread cpp/src/arrow/json/parser-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/bit-util-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/compression-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/thread-pool-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/thread-pool-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/trie-benchmark.cc Outdated
Comment thread cpp/src/arrow/util/thread-pool-benchmark.cc Outdated
- Ensure that Builder benchmarks are working on inputs of the same
  (approximately) size in bytes. This allows relative comparison between
  builders.
- Renamed benchmarks by prefixing `Regression`.
- Fixed extra string copy in BuildStringDictionary.
- Use the same buffer size in all bitmap benchmarks.
- Fix some reporting numbers
- Add '\n' do diff json output adhering to jsonlines
- Add support for items_per_second metrics
- Add `--pdb` option to drop a pdb shell on uncaught exception
- Use a single data generator
- Fix multithread input size
- Exclude multithread from regressions
- Favor external repetitions over manual repetitions and mintime when
possible.
- Add cmake ARROW_BUILD_BENCHMARKS_REFERENCE to toggle reference benchmarks.
- Remove default benchmark filter of `^Regression`.
- Remove Regression prefix from benchmark
@fsaintjacques fsaintjacques force-pushed the ARROW-5269-regression-candidates branch from b8c4c41 to a9ecffd Compare May 28, 2019 15:27
@fsaintjacques

Copy link
Copy Markdown
Contributor Author

@pitrou @kszucs I addressed the comments.

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fsaintjacques . just a couple more things and it'll be good IMO :-)

Comment thread cpp/src/arrow/csv/parser-benchmark.cc Outdated
Comment thread cpp/src/arrow/csv/parser-benchmark.cc Outdated
Comment thread cpp/src/arrow/builder-benchmark.cc
Comment thread cpp/src/arrow/csv/parser-benchmark.cc Outdated

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants