Fast Arbitrary Precision Floating Point on FPGA

2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM 2022)

Abstract
Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the superlinear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U250 accelerator to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8x speedup at 4.8 GOp/s for 512-bit multiplication, and a 5.3x speedup at 1.2 GOp/s for 1024-bit multiplication, corresponding to the throughput of more than 351 and 191 CPU cores, respectively. We apply this architecture to general matrix-matrix multiplication, yielding a 10x speedup at 2.0 GOp/s over the Xeon node, equivalent to more than 375 CPU cores, effectively allowing a single FPGA to replace a small CPU cluster. Due to the significant dependence of some numerical codes on APFP, such as semidefinite program solvers, we expect these gains to translate into real-world speedups. Our configurable and flexible HLS-based code provides a high-level software interface for plug-and-play acceleration, published as an open source project.
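As a rough illustration of the core idea (this is a host-side C++ sketch, not the authors' HLS code), a single level of the Karatsuba decomposition replaces a full-width product by three half-width products plus cheap additions; the paper's design applies this recursion to multi-limb mantissas until the factors reach DSP-native widths. The function name karatsuba64 and the 64-bit operand width are illustrative assumptions, and unsigned __int128 is a GCC/Clang extension used only to check the result.

// One Karatsuba step on 64-bit words, split into 32-bit halves:
//   a = a_hi * 2^32 + a_lo,  b = b_hi * 2^32 + b_lo
//   a*b = z2 * 2^64 + z1 * 2^32 + z0, where
//   z2 = a_hi*b_hi, z0 = a_lo*b_lo,
//   z1 = (a_hi + a_lo)*(b_hi + b_lo) - z2 - z0
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension, used for verification

u128 karatsuba64(u64 a, u64 b) {
    u64 a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    u64 b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    u64 z0 = a_lo * b_lo;  // low half-product (fits in 64 bits)
    u64 z2 = a_hi * b_hi;  // high half-product (fits in 64 bits)
    // (a_hi + a_lo) and (b_hi + b_lo) are at most 33 bits wide, so their
    // product needs up to 66 bits; compute the middle term in 128 bits.
    u128 z1 = (u128)(a_hi + a_lo) * (b_hi + b_lo) - z0 - z2;

    // Recombine the three partial products into the full 128-bit result.
    return ((u128)z2 << 64) + (z1 << 32) + z0;
}

int main() {
    u64 a = 0xDEADBEEFCAFEBABEull, b = 0x0123456789ABCDEFull;
    assert(karatsuba64(a, b) == (u128)a * b);  // matches the full product
    return 0;
}

In the FPGA setting described by the abstract, the same identity is applied recursively to compile-time fixed-precision mantissas, with the base case mapped onto native DSP multipliers and the whole recursion unrolled into a deep pipeline.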
Keywords
fast arbitrary precision floating point,numerical codes,arbitrary precision floating point numbers,core computation,elementary arithmetic operations,mantissa bits,APFP computations,conventional software-based architectures,native hardware support,elementary operations,machine-word-sized blocks,APFP multiplication,compile-time fixed-precision operands,deep FPGA pipelines,recursively defined Karatsuba decomposition,native DSP multiplication,Alveo U250 accelerator,dual-socket 36-core Xeon node,GNU Multiple Precision Floating-Point Reliable,4.8 GOp,512-bit multiplication,1024-bit multiplication,CPU cores,general matrix-matrix multiplication,2.0 GOp,single FPGA,configurable HLS-based code,flexible HLS-based code