A Near-Memory Radix Sort Accelerator with Parallel 1-bit Sorter

2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (2022)

Abstract
Sorting is a fundamental operation in many applications. It can be made efficient by exploiting data locality: the data is subdivided and the parts are processed in parallel. This work presents a high-performance, area-efficient near-memory radix sort accelerator that performs end-to-end sorting locally. Its parallel 1-bit radix sorter processes multiple keys per cycle, which yields high throughput. Evaluated on a Xilinx Zynq UltraScale+ ZCU104 FPGA, the accelerator achieves up to a 10x speedup over a CPU. Its low area cost allows it to be integrated into each processing node of a distributed computing system.
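The abstract does not describe the sorter's internals, so the following is only a minimal software sketch of the general idea behind a 1-bit (binary) radix sort, not the authors' hardware design. It assumes non-negative integer keys narrower than `key_bits` bits. The names `one_bit_pass`, `radix_sort`, and the `lanes` parameter are hypothetical; `lanes` merely models a group of keys that a parallel sorter might examine in one cycle.

```python
def one_bit_pass(keys, bit, lanes=4):
    """Stably partition keys by a single bit: 0-keys first, then 1-keys.

    `lanes` (an assumption) models how many keys are examined together;
    in software the groups are still walked sequentially.
    """
    zeros, ones = [], []
    for start in range(0, len(keys), lanes):
        group = keys[start:start + lanes]  # keys handled in one modeled "cycle"
        for key in group:
            if (key >> bit) & 1:
                ones.append(key)
            else:
                zeros.append(key)
    # Relative order inside each bucket is preserved, so the pass is stable.
    return zeros + ones


def radix_sort(keys, key_bits=32, lanes=4):
    """LSD radix sort using one stable 1-bit pass per key bit."""
    out = list(keys)
    for bit in range(key_bits):  # least significant bit first
        out = one_bit_pass(out, bit, lanes)
    return out
```

Because each pass is stable, running the passes from the least significant bit upward leaves the keys fully ordered after `key_bits` passes. For example, `radix_sort([5, 3, 7, 0, 2, 3], key_bits=3)` orders the six 3-bit keys in three passes.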
Keywords
parallel 1-bit sorter, fundamental operations, efficient sorting, data locality, subdivided data, area-efficient near-memory radix sort accelerator, end-to-end sorting, parallel 1-bit radix sorter, Xilinx Zynq UltraScale+ ZCU104 FPGA, 10x performance speedup, highly area-efficient, CPU, distributed computing system