Add Benchmark::ThreadRange() version with increment instead of multiply by hydroo · Pull Request #283 · google/benchmark

hydroo · 2016-08-31T09:17:36Z

Hi,

referring to #282.

for my work I want to use a range of threads.
So far the incrementing strategy seems to be limited to just multiplying with a factor.
I.e. you can only use thread counts like 1, 2, 4, 8, 16 or 1, 3, 9, 27.

So I made this small patch.

Should I create a pull request?

If you have an idea and a patch you should always make a PR. Issues are for people without patches :-P

for my work I want to use a range of threads

You must have a lot of threads... I'm initially skeptical of the need for this feature. Typically you don't want to exceed the number of threads supported by the hardware (at least by much). Please try and add a motivating example when you open the PR.

I'm closing this issue to make way for the PR.

Yes we do have up to 24 or 48 threads per board at the moment.

I use it for benchmarking synchronization algorithms. (Not a very common thing to do,)
I will probably need a few more things in the future like thread pinning and more accurate version of PauseTiming so I can make sure they leave at an exact time.
I'm not even sure google/benchmark is the right tool for the job. But it seems nice. The output, functionality and code is pretty simple and clear.

Motivating example:

template<typename barrier>
static void BM_barrier(benchmark::State& state) {

    while (state.KeepRunning()) {

        static barrier b;
        if (state.thread_index == 0) { b.init(state.threads); }

        state.PauseTiming();

        auto start = std::chrono::high_resolution_clock::now();
        b.wait(state.thread_index);
        auto end   = std::chrono::high_resolution_clock::now();

        auto elapsed_seconds = std::chrono::duration_cast<std::chrono::duration<double>>(end - start);

        if (elapsed_seconds < std::chrono::nanoseconds(0)) {
            state.SkipWithError("elapsed seconds < 0");
            break;
        }

        state.SetIterationTime(elapsed_seconds.count());       

        if (state.thread_index == 0) { b.destroy(); }
    }                                                                                                                                                                       
}

BENCHMARK_TEMPLATE(BM_barrier, baseline_barrier)->ThreadRange2(1, 4)->UseManualTime();
BENCHMARK_TEMPLATE(BM_barrier, central_counter_barrier)->ThreadRange2(1, 4)->UseManualTime();
BENCHMARK_TEMPLATE(BM_barrier, sense_reversing_centralized_barrier)->ThreadRange2(1, 4)->UseManualTime();

BENCHMARK_MAIN();

googlebot · 2016-08-31T09:17:40Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please let us know the company's name.

coveralls · 2016-08-31T09:22:59Z

Coverage decreased (-0.8%) to 86.906% when pulling f9e71dd on hydroo:master into 61f570e on google:master.

AppVeyorBot · 2016-08-31T09:31:19Z

✅ Build benchmark 401 completed (commit b457d0c0eb by @hydroo)

hydroo · 2016-08-31T13:51:17Z

I signed it!

googlebot · 2016-08-31T13:51:22Z

CLAs look good, thanks!

EricWF · 2016-09-02T09:38:04Z

I will probably need a few more things in the future like thread pinning and more accurate version of PauseTiming so I can make sure they leave at an exact time.

FYI PR #286 makes PauseTiming() per thread instead of per-process so it doesn't perform synchronization and block the threads.

hydroo · 2016-09-02T17:06:07Z

Thanks for the info.

EricWF · 2016-09-03T06:22:45Z

  //    Foo in 16 threads
  Benchmark* ThreadRange(int min_threads, int max_threads);

+  Benchmark* ThreadRange2(int min_threads, int max_threads);


I would suggest DenseThreadRange(int min_threads, int max_threads, int stride=1) since it matches the DenseRange function we already have.

EricWF · 2016-09-03T06:36:16Z

Alright so there are a couple things left to do.

The DenseThreadRange(...) declaration in benchmark_api.h needs to be documented like the other functions around it. Take a look at DenseRange for an example.
This needs a test. Somethink like this should work:

static void BM_ThreadRanges(benchmark::State& st) {
  switch (st.range(0)) {
  case 1:
    assert(st.threads == 1 || st.threads == 2 || st.threads == 3);
    break;
  case 2:
    assert(st.threads == 1 || st.threads == 3);
    break;
  case 3:
    assert(st.threads == 5 || st.threads == 8 || st.threads == 11 || st.threads == 14);
    break;
  default:
    assert(false && "Invalid test case number");
  }
  while (st.KeepRunning()) {}
}
BENCHMARK(BM_ThreadRanges)->Arg(1)->DenseThreadRange(1, 3);
BENCHMARK(BM_ThreadRanges)->Arg(2)->DenseThreadRange(1, 3, 2);
BENCHMARK(BM_ThreadRanges)->Arg(3)->DenseThreadRange(5, 14, 3);

hydroo · 2016-09-03T09:06:35Z

Is there a clang-format file for this project? I just tried clang-format -style=Google on benchmark_api.h and it changed a lot lot. So that's probably not what you do.

EricWF · 2016-09-03T09:09:55Z

There is a config file now. It was just checked in today.
To format patches use clang-format-diff

hydroo · 2016-09-03T09:22:05Z

When I make test, benchark_test.cc is not executed. Is there a non-manual way to run it? Or how is this intended?

I add the test to the end of benchark_test.cc, ok?

hydroo · 2016-09-03T09:34:13Z

This clang format file seems to be for a clang version newer than 3.8 (Newest Ubuntu).
Error: error: unknown key 'IncludeIsMainRegex'

EricWF · 2016-09-03T09:35:47Z

Stick the test wherever makes sense and use --style=Google if the config doesn't work.

hydroo · 2016-09-03T10:00:23Z

I force pushed my master, since I had troubles with git. I used this opportunity to clean up my commits as well. It's now only two.

coveralls · 2016-09-03T10:03:12Z

Coverage increased (+0.09%) to 87.219% when pulling 86ebccf on hydroo:master into 60e88c2 on google:master.

AppVeyorBot · 2016-09-03T10:17:37Z

❌ Build benchmark 442 failed (commit b9ba8c6b88 by @hydroo)

EricWF · 2016-09-03T19:56:13Z

Looks good to me! Thanks for the patch.

EricWF · 2016-09-03T20:40:29Z

To discuss some of your other comments:

more accurate version of PauseTiming so I can make sure they leave at an exact time.

PauseTiming now act's a lot differently. In particular it's thread-local and doesn't act as a barrier so hopefully this solves your problem. Also SetIterationTime is also per-thread now. The final value is the average of each of the threads values.

I'm looking into "thread pinning" as a feature, but I'm not sure what the correct way to pin the threads is.
Could you clarify what you need here?

hydroo · 2016-09-05T07:15:24Z

Thanks!

Last Saturday I had big troubles getting my code to work in the benchmark harness, again.
I liked having the barrier there.
I am not sure what was going on.
It would help, if I could acquire shared resources before spawning any threads and pass them.
If all threads start the benchmark at the same time, I have to make sure everyone has the correct data.
Starting the benchmarks via my own main seems to not help with that, since I can only start all benchmarks at the same time, and not individually (I can only acquire all resources before running any benchmark, and release them afterwards).

Can threads overtake each other in KeepRunning() or is there a barrier between iterations?

As for SetIterationTime, I would also like to have the maximum of all threads and the time from the earliest before to the latest after (kind of time maximum time collectively spent in the code).

Since I benchmark synchronization primitives I want to make sure (1) threads are not migrated during benchmark execution, (2) threads are placed in a sensible way (not 1, 3 on 2 cpus with 2 threads each, e.g.), perhaps I want to even place them individually myself because distances between CPUs affect performance.

Last time I used pthread_getaffinity_np, pthread_setaffinity_np. Haven't looked into it yet. Maybe there is a newer, nicer way. Also sched_setaffinity exists on linux.

EricWF · 2016-09-06T20:48:39Z

Last Saturday I had big troubles getting my code to work in the benchmark harness, again.
I liked having the barrier there.

Sorry but the barrier in PauseTiming() essentially made multithreaded benchmarks synchronous , which is obviously not what we want. Removing the barrier drastically improved the timing accuracy of multithreaded benchmarks.

I plan to add barrier functionality to State so that users can block when need be, but I don't think PauseTiming() should be it.

I am not sure what was going on.
It would help, if I could acquire shared resources before spawning any threads and pass them.
If all threads start the benchmark at the same time, I have to make sure everyone has the correct data.
Starting the benchmarks via my own main seems to not help with that, since I can only start all benchmarks at the same time, and not individually (I can only acquire all resources before running any benchmark, and release them afterwards).

That shouldn't be a problem with the existing API. See the example below.

Can threads overtake each other in KeepRunning() or is there a barrier between iterations?

KeepRunning() only acts as a barrier on the first and last iteration. This ensures that the initialization code has been run before any thread enters the loop and that the teardown code only runs after all threads are done. You should be able to setup/teardown shared state between multiple threads like this:

void SetupGlobalState();
void TeardownGlobalState();
std::vector<int> global_state;
static BM_Foo(benchmark::State& st) {
  if (st.thread_index == 0) { // A single thread executes this setup.
    SetupGlobalState();
  }
  PinCurrentThread(); // All thread execute this
  while (st.KeepRunning()) { // No thread starts the loop until all threads are here.
     /* ... */
  } // No thread passes here until all threads are complete.
  if (st.thread_index == 0) {
    TeardownGlobalState();
  }
}

As for SetIterationTime, I would also like to have the maximum of all threads and the time from the earliest before to the latest after (kind of time maximum time collectively spent in the code).

Yeah I think SetIterationTime() could offer a bunch more functionality. I'm looking into better support for custom timers.

Since I benchmark synchronization primitives I want to make sure (1) threads are not migrated during benchmark execution, (2) threads are placed in a sensible way (not 1, 3 on 2 cpus with 2 threads each, e.g.), perhaps I want to even place them individually myself because distances between CPUs affect performance.

I think users will generally want to place the threads themselves. That's why IDK if this can be a library feature. Also it's super easy for users to pin threads today (see above).

Last time I used pthread_getaffinity_np, pthread_setaffinity_np. Haven't looked into it yet. Maybe there is a newer, nicer way. Also sched_setaffinity exists on linux.

No that's the way to do it. But as mentioned above it's not clear what to set the affinity to.

googlebot added the cla: no label Aug 31, 2016

googlebot added cla: yes and removed cla: no labels Aug 31, 2016

EricWF reviewed Sep 3, 2016
View reviewed changes

hydroo added 2 commits September 3, 2016 11:56

add additive thread range

67db158

add test

86ebccf

EricWF merged commit 72be952 into google:master Sep 3, 2016

Uh oh!

Conversation

hydroo commented Aug 31, 2016

Uh oh!

googlebot commented Aug 31, 2016

Uh oh!

coveralls commented Aug 31, 2016

Uh oh!

AppVeyorBot commented Aug 31, 2016

Uh oh!

hydroo commented Aug 31, 2016

Uh oh!

googlebot commented Aug 31, 2016

Uh oh!

EricWF commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hydroo commented Sep 2, 2016

Uh oh!

EricWF Sep 3, 2016

Choose a reason for hiding this comment

Uh oh!

EricWF commented Sep 3, 2016

Uh oh!

hydroo commented Sep 3, 2016

Uh oh!

EricWF commented Sep 3, 2016

Uh oh!

hydroo commented Sep 3, 2016

Uh oh!

hydroo commented Sep 3, 2016

Uh oh!

EricWF commented Sep 3, 2016

Uh oh!

hydroo commented Sep 3, 2016

Uh oh!

coveralls commented Sep 3, 2016

Uh oh!

AppVeyorBot commented Sep 3, 2016

Uh oh!

EricWF commented Sep 3, 2016

Uh oh!

EricWF commented Sep 3, 2016

Uh oh!

hydroo commented Sep 5, 2016

Uh oh!

EricWF commented Sep 6, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

EricWF commented Sep 2, 2016 •

edited

Loading