thSORT: an efficient parallel sorting algorithm on multi-core DSPs

CCF Transactions on High Performance Computing (2024)

Abstract
Multi-core architectures have become the dominant trend in high performance computing (HPC) because of their powerful parallel computing capability. Due to energy efficiency constraints, energy-efficient multi-core digital signal processors (DSPs) have become an alternative architecture in HPC systems. FT-M7032 is a CPU-DSP heterogeneous processor that integrates 16 CPU cores for running operating systems and four multi-core general-purpose DSP (GPDSP) clusters for providing high performance. Sorting is a fundamental operation in computer science with numerous applications and has been studied extensively, but high-performance parallel sorting algorithms are typically architecture-specific. To our knowledge, little attention has been paid to optimizing sorting on low-power multi-core DSPs. In this paper, we propose thSORT, an efficient bitonic sorting algorithm for FT-M7032. Our algorithm consists of two parts, single-core DSP sorting and multi-core DSP sorting, both designed to exploit the features of FT-M7032. We implement a vector micro-kernel for bitonic sort and propose a multi-level algorithm to merge the results of the micro-kernel. Compared to the CPU baselines, our implementation is 1.43× faster than the parallel sorting of the Boost C++ Libraries and 2.15× faster than std::sort.
Keywords
Bitonic sorting network, Multi-core DSPs, Parallel sorting
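
For readers unfamiliar with the bitonic sorting network named in the abstract, the following is a minimal scalar sketch of the textbook network in C++. It is not the thSORT vector micro-kernel or its multi-level merge, whose details the abstract does not give. The function name `bitonic_sort` is hypothetical, and the sketch assumes the input length is a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook bitonic sorting network, scalar and single-threaded.
// Assumption (not from the paper): a.size() is zero or a power of two.
// Hypothetical name; this is an illustration, not the thSORT kernel.
void bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0);  // zero or a power of two

    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // handle each pair once

                // Blocks of length k alternate between ascending and
                // descending order; the final stage (k == n) is ascending.
                const bool ascending = (i & k) == 0;
                const bool out_of_order =
                    ascending ? a[i] > a[partner] : a[i] < a[partner];
                if (out_of_order) std::swap(a[i], a[partner]);
            }
        }
    }
}
```

The sequence of compare-exchange pairs depends only on the input length, never on the data, and the pairs within one (k, j) step touch disjoint elements. That property is what generally makes bitonic networks a natural fit for vector units and parallel cores.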