Vectorized and performance-portable Quicksort

Jan Wassenberg; Joachim Giesen; Mark Blacher; Peter Sanders


Software: Practice and Experience (2022) (to appear)


Abstract

Recent work has shown that implementations of Quicksort using
vector CPU instructions can outperform the non-vectorized algorithms in
widespread use. However, these implementations are typically single-threaded,
implemented for a particular instruction set, and restricted to a small set of
key types. We lift these three restrictions: our proposed vqsort
algorithm integrates into the state-of-the-art parallel sorter ips4o,
speeding it up by a factor of 1.5 to 1.8. The same implementation works on seven
instruction sets (including SVE and RISC-V V) across four platforms. It also
supports floating-point keys and 16- to 128-bit integer keys. To the best of our
knowledge, this is the fastest sort for non-tuple keys on CPUs, up to 20 times
as fast as the sorting algorithms implemented in standard libraries. This paper
focuses on the practical engineering aspects enabling the speed and portability,
which we have not yet seen demonstrated for a Quicksort implementation.
Furthermore, we introduce compact and transpose-free sorting networks for
in-register sorting of small arrays, and a vector-friendly pivot sampling
strategy that is robust against adversarial input.
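
To illustrate the idea of a sorting network for small arrays, here is a minimal scalar sketch in C++. It is not the vqsort implementation and does not reproduce the compact, transpose-free networks the abstract refers to; it only shows the general technique of sorting a fixed number of keys with a fixed sequence of compare-exchange steps. The names CompareExchange and SortNetwork4 are hypothetical, chosen for this example. Each step uses min/max instead of a branch, which is the property that lets such networks map onto vector min/max instructions.

```cpp
#include <algorithm>
#include <array>

// Compare-exchange: afterwards a <= b. Written with min/max rather than a
// branch, analogous to how vector min/max instructions act on whole registers.
// (Hypothetical helper for illustration, not part of vqsort.)
template <typename T>
void CompareExchange(T& a, T& b) {
  const T lo = std::min(a, b);
  const T hi = std::max(a, b);
  a = lo;
  b = hi;
}

// Sorts four keys in ascending order using the well-known 5-comparator
// network for four inputs. The sequence of comparisons is fixed and does
// not depend on the data.
template <typename T>
void SortNetwork4(std::array<T, 4>& v) {
  CompareExchange(v[0], v[1]);
  CompareExchange(v[2], v[3]);
  CompareExchange(v[0], v[2]);
  CompareExchange(v[1], v[3]);
  CompareExchange(v[1], v[2]);
}
```

Calling SortNetwork4 on a std::array<int, 4> holding 3, 1, 4, 2 leaves it holding 1, 2, 3, 4. Because the comparison sequence is data-independent, the same steps can in principle be applied to many small arrays at once, one per vector lane.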

Research Areas

Algorithms and theory
