High-Performance Parallel Radix Sort on FPGA

2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020

Abstract
Sorting is a key part of database operators such as duplicate elimination, sort-merge joins, and group-by aggregations. Sorting billions of records in a fast and energy-efficient manner has become a key research challenge. In this work, we explore in-memory sorting using a parallel version of Radix Sort to build a high-performance hardware accelerator called HARS (Hardware Accelerated Radix Sort). Our design divides the unsorted dataset among parallel engines without requiring a merge step. HARS is implemented on Micron's SB-852 FPGA board. The proposed accelerator provides high-throughput in-memory sorting at a rate of 44 million 128-bit records per second. HARS is 1.4x faster than a CPU and 1.36x faster than a GPU when GPU bandwidth is normalized. The projected performance of a proposed board with a more capable FPGA chip is 1.25x higher throughput.
Keywords
high-performance hardware accelerator, HARS, parallel engines, high throughput in-memory sorting, FPGA chip, database operators, duplicate elimination, high-performance parallel radix sort, Micron SB-852 FPGA board, hardware accelerated radix sort
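
The abstract states that the dataset is divided among parallel engines without a merge step, but it does not say how the division is done. One common way to get that property is to partition records by their most significant key bits, so that every key in partition i is smaller than every key in partition i+1, and then sort each partition independently. The Python sketch below illustrates that idea only. It is not confirmed to be the HARS design, and the names and parameters (`NUM_ENGINES`, `DIGIT_BITS`, `lsd_radix_sort`, `parallel_radix_sort`) are assumptions made for illustration. Only the 128-bit key width comes from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor

KEY_BITS = 128    # record width, taken from the abstract
DIGIT_BITS = 8    # assumed radix: 256 buckets per pass
NUM_ENGINES = 4   # assumed engine count; must be a power of two


def lsd_radix_sort(records, key_bits):
    """Stable least-significant-digit radix sort covering at least the low key_bits bits."""
    mask = (1 << DIGIT_BITS) - 1
    for shift in range(0, key_bits, DIGIT_BITS):
        buckets = [[] for _ in range(1 << DIGIT_BITS)]
        for r in records:
            buckets[(r >> shift) & mask].append(r)
        # Concatenating buckets in order keeps each pass stable.
        records = [r for bucket in buckets for r in bucket]
    return records


def parallel_radix_sort(records, num_engines=NUM_ENGINES):
    """Sort non-negative integers below 2**KEY_BITS without a merge step."""
    prefix_bits = num_engines.bit_length() - 1  # log2(num_engines)
    shift = KEY_BITS - prefix_bits
    # Partition on the top prefix_bits bits, so partitions are ordered
    # relative to each other before any sorting happens.
    partitions = [[] for _ in range(num_engines)]
    for r in records:
        partitions[r >> shift].append(r)
    # Each "engine" sorts its own partition on the remaining low bits.
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        sorted_parts = pool.map(lambda p: lsd_radix_sort(p, shift), partitions)
        # Plain concatenation replaces the merge step.
        return [r for part in sorted_parts for r in part]
```

The thread pool here only models the structure of independent engines. Python threads will not deliver a real speedup for this workload, and skewed key distributions would leave the partitions unbalanced, which a hardware design would have to address separately.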