[BUG] Benchmark registration order affects reported outcomes #2235

Description

@ccomendant

Describe the bug
Hello,
I have the following benchmark:

#include <benchmark/benchmark.h>
#include <gtest/gtest.h>

/**
 * defines / pre-allocates a std::vector<std::wstring> examples (within g_dataset)
 */
#include "Helpers.h"

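#include <algorithm>    // std::transform
#include <chrono>
#include <cstdint>
#include <ranges>
#include <string>       // std::string, std::wstring
#include <type_traits>  // std::invoke_result_t
#include <vector>
#include <execution>
#include <codecvt>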

std::string DoWork(const std::wstring& input)
{
    std::string result(input.size(), '\0');
    std::transform(std::execution::par_unseq, std::begin(input), std::end(input), std::begin(result),
                   [](wchar_t inChar) { return (char)(inChar & 0x0ff); });
    return result;
}

std::wstring DoWork1(const std::string& input)
{
    std::wstring result(input.size(), L'\0');
    std::transform(std::execution::par_unseq, std::begin(input), std::end(input), std::begin(result),
                   [](wchar_t inChar) { return (wchar_t)(inChar & 0x0ff); });
    return result;
}

template <typename InStringT,
    typename Fn,
    typename OutStringT = std::invoke_result_t<Fn,InStringT>>
void RunOnExamples(const std::vector<InStringT>& examples, Fn&& fn,
                  std::vector<OutStringT>& results)
{
    for (size_t iterations = 0; iterations < examples.size(); iterations++) {
        results[iterations] = fn(examples[iterations]);
    }
}

static void BM_DoWork(benchmark::State& state)
{
    state.PauseTiming();

    auto expectedResults = std::vector<std::string>(g_dataset.examples.size());

    state.ResumeTiming();

    while (state.KeepRunningBatch(g_dataset.examples.size())) {
        RunOnExamples(g_dataset.examples, DoWork, expectedResults);
    }

    state.SetItemsProcessed(state.iterations() * g_dataset.examples.size());
}

static void BM_DoWork1(benchmark::State& state)
{
    state.PauseTiming();

    auto expectedResults = std::vector<std::wstring>(g_dataset.examples.size());

    std::vector<std::string> examples(g_dataset.examples.size());
    static std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> myconv;
    for (size_t i = 0; i < g_dataset.examples.size(); i++) {
        examples[i] = myconv.to_bytes(g_dataset.examples[i]);
    }

    state.ResumeTiming();

    while (state.KeepRunningBatch(g_dataset.examples.size())) {
        RunOnExamples(examples, DoWork1, expectedResults);
    }

    state.SetItemsProcessed(state.iterations() * g_dataset.examples.size());
}

#pragma endregion

#pragma region Run benchmarks

BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork1);
BENCHMARK(BM_DoWork1);
BENCHMARK(BM_DoWork1);

#pragma endregion

The output looks like this. Note that items_per_second drops in large steps from one run to the next, even between runs of the same benchmark:

Run on (20 X 2918 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
------------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------
BM_DoWork                   86347685 ns         6438 ns      1000000 items_per_second=155.34G/s
BM_DoWork                   86352813 ns        10688 ns      1000000 items_per_second=93.5673G/s
BM_DoWork                   86358084 ns        15141 ns      1000000 items_per_second=66.0475G/s
BM_DoWork1                  86363182 ns        19484 ns      1000000 items_per_second=51.3232G/s
BM_DoWork1                  86369707 ns        25062 ns      1000000 items_per_second=39.9002G/s
BM_DoWork1                  86378334 ns        32438 ns      1000000 items_per_second=30.8285G/s

If I reorder the benchmarks, the same pattern is observed.

A similar question was asked on Stack Overflow a couple of years ago, but it remained unanswered.

System
Which OS, compiler, and compiler version are you using:

  • OS: Microsoft Windows 11
  • Compiler and version: MSVC 17.14.15

Expected behavior
Registering benchmarks in a different order shouldn't affect the measured performance. A deviation of a few percent is expected, but not a steady decrease in large steps.
