Describe the bug
Hello,
I have the following benchmark:
#include <benchmark/benchmark.h>
#include <gtest/gtest.h>
/**
* defines / pre-allocates a std::vector<std::wstring> examples (within g_dataset)
*/
#include "Helpers.h"
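#include <algorithm>   // std::transform
#include <chrono>
#include <cstdint>
#include <ranges>
#include <string>
#include <type_traits> // std::invoke_result_t
#include <vector>
#include <execution>
#include <codecvt>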
std::string DoWork(const std::wstring& input)
{
    std::string result(input.size(), '\0');
    std::transform(std::execution::par_unseq, std::begin(input), std::end(input), std::begin(result),
                   [](wchar_t inChar) { return (char)(inChar & 0x0ff); });
    return result;
}

std::wstring DoWork1(const std::string& input)
{
    std::wstring result(input.size(), L'\0');
    std::transform(std::execution::par_unseq, std::begin(input), std::end(input), std::begin(result),
                   [](wchar_t inChar) { return (wchar_t)(inChar & 0x0ff); });
    return result;
}

template <typename InStringT,
          typename Fn,
          typename OutStringT = std::invoke_result_t<Fn,InStringT>>
void RunOnExamples(const std::vector<InStringT>& examples, Fn&& fn,
                   std::vector<OutStringT>& results)
{
    for (size_t iterations = 0; iterations < examples.size(); iterations++) {
        results[iterations] = fn(examples[iterations]);
    }
}
static void BM_DoWork(benchmark::State& state)
{
    state.PauseTiming();
    auto expectedResults = std::vector<std::string>(g_dataset.examples.size());
    state.ResumeTiming();
    while (state.KeepRunningBatch(g_dataset.examples.size())) {
        RunOnExamples(g_dataset.examples, DoWork, expectedResults);
    }
    state.SetItemsProcessed(state.iterations() * g_dataset.examples.size());
}

static void BM_DoWork1(benchmark::State& state)
{
    state.PauseTiming();
    auto expectedResults = std::vector<std::wstring>(g_dataset.examples.size());
    std::vector<std::string> examples(g_dataset.examples.size());
    static std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> myconv;
    for (size_t i = 0; i < g_dataset.examples.size(); i++) {
        examples[i] = myconv.to_bytes(g_dataset.examples[i]);
    }
    state.ResumeTiming();
    while (state.KeepRunningBatch(g_dataset.examples.size())) {
        RunOnExamples(examples, DoWork1, expectedResults);
    }
    state.SetItemsProcessed(state.iterations() * g_dataset.examples.size());
}
#pragma endregion
#pragma region Run benchmarks
BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork);
BENCHMARK(BM_DoWork1);
BENCHMARK(BM_DoWork1);
BENCHMARK(BM_DoWork1);
#pragma endregion
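The output looks like this. Note that items_per_second drops in large steps from one run to the next, even between identical registrations of the same benchmark, while the CPU column grows by a similar factor: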
Run on (20 X 2918 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 24576 KiB (x1)
------------------------------------------------------------------------------
Benchmark           Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------
BM_DoWork    86347685 ns         6438 ns      1000000 items_per_second=155.34G/s
BM_DoWork    86352813 ns        10688 ns      1000000 items_per_second=93.5673G/s
BM_DoWork    86358084 ns        15141 ns      1000000 items_per_second=66.0475G/s
BM_DoWork1   86363182 ns        19484 ns      1000000 items_per_second=51.3232G/s
BM_DoWork1   86369707 ns        25062 ns      1000000 items_per_second=39.9002G/s
BM_DoWork1   86378334 ns        32438 ns      1000000 items_per_second=30.8285G/s
If I reorder the benchmarks, the same pattern is observed.
There was a similar question on Stack Overflow a couple of years ago, but it remained unanswered.
System
Which OS, compiler, and compiler version are you using:
- OS: Microsoft Windows 11
- Compiler and version: MSVC 17.14.15
Expected behavior
Registering benchmarks in a different order shouldn't affect the measured performance. A deviation of a couple of percent is expected, but not a steady decrease in large steps from one run to the next.