This repository contains the benchmarking code for the papers "Improving Quicksort Performance by Optimizing Branch Prediction" and "Improving Mergesort Performance by Optimizing Branch Prediction".
The implementation of the algorithms "Pattern-exploiting quicksort" (PEQS) and "Pattern-exploiting mergesort" (PEMS) can be found in the files quick_peqs.h and merge_pems.h++ respectively.
This repository contains code for the papers:
- Improving Quicksort Performance by Optimizing Branch Prediction
- Improving Mergesort Performance by Optimizing Branch Prediction
By detecting arrays with a low degree of presortedness, an adaptive quicksort algorithm can switch to a branch free implementation in order to reduce the runtime by up to 2.5 times compared to traditional implementations and outperform strictly branch free algorithms on arrays with a high degree of presortedness. By using a fast method of detecting the degree of presortedness, no prior knowledge about the array composition is required for the algorithm.
@INPROCEEDINGS{10089318,
author={Peeters, Jonas and Haase, Jan},
booktitle={2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)},
title={Improving Quicksort Performance by Optimizing Branch Prediction},
year={2022},
pages={1-6},
keywords={Computer science;Runtime;Measurement uncertainty;Adaptive arrays;Switches;Prediction algorithms;Data engineering;in-place sorting;quicksort;branch prediction;lean programs},
doi={10.1109/CSDE56538.2022.10089318}}By detecting arrays with a low degree of presortedness, an adaptive mergesort algorithm can switch to a branch free implementation in order to reduce the runtime by up to 30% compared to traditional implementations and almost 50% compared to the mergesort based Timsort, while retaining the great performance of branch based variants on partially presorted arrays. By using a fast method of detecting the degree of presortedness, no prior knowledge about the array composition is required for the algorithm.
@INPROCEEDINGS{10089293,
author={Peeters, Jonas and Haase, Jan},
booktitle={2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)},
title={Improving Mergesort Performance by Optimizing Branch Prediction},
year={2022},
pages={1-6},
keywords={Computer science;Runtime;Measurement uncertainty;Adaptive arrays;Switches;Prediction algorithms;Data engineering;sorting;mergesort;branch prediction;lean programs},
doi={10.1109/CSDE56538.2022.10089293}
}- Main benchmark program is in
benchy.c++ output-clang-c++-full.csvandoutput-gcc-c++-full.csvcontain the benchmark results for all algorithms compiled using GCC and Clang
To compile benchy.c++, PAPI is required, which can be found here.
With the authors current configuration, the following commands compile the program successfully:
$ clang++-12 -O3 -mtune=native -march=native -I/usr/include/x86_64-linux-gnu/c++/10/bits -lm -L/usr/local/lib -lpapi -std=c++17 benchy.c++ -o benchy
$ g++-10 -O3 -mtune=native -march=native -I/usr/include/x86_64-linux-gnu/c++/10/bits -lm -L/usr/local/lib -lpapi -std=c++17 benchy.c++ -o benchyThe benchmark program has to be started with admin rights. These are required for access to the CPU performance counters
The program can be supplied with up to four arguments:
- Algorithm name: Is substring matched to the list in benchy.c++::122-137)
- "Before sort action": Action that will be performed to the input array before sorting.
0. 0 = NOTHING
- 1 = PRESORTED
- 2 = SORT_REVERSE
- 3 = END_RANDOM
- 4 = LIMIT_VALUES_1000
- 5 = LIMIT_VALUES_10000
- Array size to test
- Seed for rand(), otherwise 1-5 are used
If the appication stops with an error from PAPI complaining about too many different CPU counters, try removing some from the list at benchy.c++::155-159, as your CPU is capable of fewer than are in there.
The application will output the average number of CPU cycles for the biggest array tested for each algorithm-test-combo.
The full results are written to the files output-clang-c++.csv and output-gcc-c++.csv depending on the compiler
used.