Origami: A High-Performance Mergesort Framework

PROCEEDINGS OF THE VLDB ENDOWMENT(2021)

Abstract
Mergesort is a popular algorithm for sorting real-world workloads as it is immune to data skewness, suitable for parallelization using vectorized intrinsics, and relatively simple to multithread. In this paper, we introduce Origami, an in-memory mergesort framework that is optimized for scalar, as well as all current SIMD (single-instruction multiple-data) CPU architectures. For each vector-extension set (e.g., SSE, AVX2, AVX-512), we present an in-register sorter for small sequences that is up to 8× faster than prior methods and a branchless streaming merger that achieves up to a 1.5× speed-up over the naive merge. In addition, we introduce a cache-residing quad-merge tree to avoid bottlenecking on memory bandwidth and a parallel partitioning scheme to maximize thread-level concurrency. We develop an end-to-end sort with these components and produce a highly utilized mergesort pipeline by reducing the synchronization overhead between threads. Single-threaded Origami performs up to 2× faster than the closest competitor and achieves a nearly perfect speed-up in multi-core environments.

PVLDB Reference Format: Arif Arman and Dmitri Loguinov. Origami: A High-Performance Mergesort Framework. PVLDB, 15(2): 259-271, 2022. doi:10.14778/3489496.3489507

PVLDB Artifact Availability: The source code, data, and/or other artifacts have been made available at http://irl.cs.tamu.edu/projects/streams.
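To illustrate what a branchless merge means in the scalar case, the following is a minimal C++ sketch. It is not Origami's implementation, which the abstract does not detail; the function name `merge_branchless` and the choice of 32-bit unsigned keys are assumptions made for illustration. The idea shown is that the comparison result is used as an integer to advance the input cursors, so the main loop has no data-dependent if/else. Whether the compiler actually emits a conditional move for the select is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper): merges two sorted inputs.
// The main loop picks the next element with arithmetic on the comparison
// result instead of branching on which input holds the smaller key.
std::vector<std::uint32_t> merge_branchless(const std::vector<std::uint32_t>& a,
                                            const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> out(a.size() + b.size());
    std::size_t i = 0, j = 0, k = 0;

    while (i < a.size() && j < b.size()) {
        const std::uint32_t x = a[i];
        const std::uint32_t y = b[j];
        const std::size_t take_a = (x <= y);  // 1 if the output comes from a, else 0
        out[k++] = take_a ? x : y;            // a select; often compiled to a conditional move
        i += take_a;                          // advance exactly one of the two cursors
        j += 1 - take_a;
    }

    // Copy whatever remains in the input that was not exhausted.
    while (i < a.size()) out[k++] = a[i++];
    while (j < b.size()) out[k++] = b[j++];

    assert(k == out.size());
    return out;
}
```

A naive merge would instead use `if (x <= y) { ... } else { ... }` in the loop body, which on unpredictable inputs can incur frequent branch mispredictions. The loop-termination checks above are still branches, but they are highly predictable.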