Skip to content

jonas-peeters/Pattern-exploiting_Sorting_Algorithms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Improving Sorting Algorithm Performance by Optimising Branch Prediction

This repository contains the benchmarking code for the papers "Improving Quicksort Performance by Optimizing Branch Prediction" and "Improving Mergesort Performance by Optimizing Branch Prediction".

The implementation of the algorithms "Pattern-exploiting quicksort" (PEQS) and "Pattern-exploiting mergesort" (PEMS) can be found in the files quick_peqs.h and merge_pems.h++ respectively.

Publications

This repository contains code for the papers:

Improving Quicksort Performance by Optimizing Branch Prediction

Abstract

By detecting arrays with a low degree of presortedness, an adaptive quicksort algorithm can switch to a branch free implementation in order to reduce the runtime by up to 2.5 times compared to traditional implementations and outperform strictly branch free algorithms on arrays with a high degree of presortedness. By using a fast method of detecting the degree of presortedness, no prior knowledge about the array composition is required for the algorithm.

Citation

@INPROCEEDINGS{10089318,
  author={Peeters, Jonas and Haase, Jan},
  booktitle={2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)}, 
  title={Improving Quicksort Performance by Optimizing Branch Prediction}, 
  year={2022},
  pages={1-6},
  keywords={Computer science;Runtime;Measurement uncertainty;Adaptive arrays;Switches;Prediction algorithms;Data engineering;in-place sorting;quicksort;branch prediction;lean programs},
  doi={10.1109/CSDE56538.2022.10089318}}

Improving Mergesort Performance by Optimizing Branch Prediction

Abstract

By detecting arrays with a low degree of presortedness, an adaptive mergesort algorithm can switch to a branch free implementation in order to reduce the runtime by up to 30% compared to traditional implementations and almost 50% compared to the mergesort based Timsort, while retaining the great performance of branch based variants on partially presorted arrays. By using a fast method of detecting the degree of presortedness, no prior knowledge about the array composition is required for the algorithm.

Citation

@INPROCEEDINGS{10089293,
  author={Peeters, Jonas and Haase, Jan},
  booktitle={2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)}, 
  title={Improving Mergesort Performance by Optimizing Branch Prediction}, 
  year={2022},
  pages={1-6},
  keywords={Computer science;Runtime;Measurement uncertainty;Adaptive arrays;Switches;Prediction algorithms;Data engineering;sorting;mergesort;branch prediction;lean programs},
  doi={10.1109/CSDE56538.2022.10089293}
}

Benchmarking

  • Main benchmark program is in benchy.c++
  • output-clang-c++-full.csv and output-gcc-c++-full.csv contain the benchmark results for all algorithms compiled using GCC and Clang

To compile benchy.c++, PAPI is required, which can be found here.

With the authors current configuration, the following commands compile the program successfully:

$ clang++-12 -O3 -mtune=native -march=native -I/usr/include/x86_64-linux-gnu/c++/10/bits -lm -L/usr/local/lib -lpapi -std=c++17 benchy.c++ -o benchy
$ g++-10 -O3 -mtune=native -march=native -I/usr/include/x86_64-linux-gnu/c++/10/bits -lm -L/usr/local/lib -lpapi -std=c++17 benchy.c++ -o benchy

The benchmark program has to be started with admin rights. These are required for access to the CPU performance counters

The program can be supplied with up to four arguments:

  1. Algorithm name: Is substring matched to the list in benchy.c++::122-137)
  2. "Before sort action": Action that will be performed to the input array before sorting. 0. 0 = NOTHING
    1. 1 = PRESORTED
    2. 2 = SORT_REVERSE
    3. 3 = END_RANDOM
    4. 4 = LIMIT_VALUES_1000
    5. 5 = LIMIT_VALUES_10000
  3. Array size to test
  4. Seed for rand(), otherwise 1-5 are used

If the appication stops with an error from PAPI complaining about too many different CPU counters, try removing some from the list at benchy.c++::155-159, as your CPU is capable of fewer than are in there.

The application will output the average number of CPU cycles for the biggest array tested for each algorithm-test-combo. The full results are written to the files output-clang-c++.csv and output-gcc-c++.csv depending on the compiler used.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors