Sorting Large Data Sets with FPGA-Accelerated Samplesort

Han Chen,Sergey Madaminov,Michael Ferdman,Peter Milder

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)（2019）

引用 10|浏览20

暂无评分

摘要

Sorting is a fundamental operation in many applications such as databases, search, and social networks. Although FPGAs have been shown effective at sorting data sizes that fit on chip, systems that sort larger data sets by shuffling data on and off chip are typically bottlenecked by costly merge operations or data transfer time. We propose a new approach to sorting large data sets by accelerating the samplesort algorithm using a server with a PCIe-connected FPGA. Samplesort works by randomly sampling to determine how to partition data into approximately equal-sized non-overlapping "buckets," sorting each bucket, and concatenating the results. Although samplesort can partition a large problem into smaller ones that fit in the FPGA's on-chip memory, partitioning in software is slow. Our system uses a novel parallel hardware partitioner that is only limited in data set size by available FPGA hardware resources. After partitioning, each bucket is sorted using parallel sorting hardware. The CPU is responsible for sampling data, cleaning up any potential problems caused by variation in bucket size, and providing scalability by performing an initial coarse-grained partitioning when the input set is larger than the FPGA can sort. We prototype our design using Amazon Web Services FPGA instances, which pair a Xilinx Virtex UltraScale+ FPGA with a high-performance server. Our experiments demonstrate a 17.1x speedup over GNU parallel sort when sorting 2^23 key-value records and a speedup of 4.2x when sorting 2^30 records.

查看译文

关键词

Field programmable gate arrays,Sorting,Hardware,Merging,Software,Prototypes,Acceleration

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要