GRAPE-MPs: Implementation of an SIMD for Quadruple/Hexuple/Octuple-Precision Arithmetic Operation on a Structured ASIC and an FPGA

Naohito Nakasato, Hiroshi Daisaka, Toshiyuki Fukushige, Atsushi Kawai, Jun'ichiro Makino, Tadashi Ishikawa, Fukuko Yuasa

Embedded Multicore SoCs (2012)


Abstract

We describe the design and performance of GRAPE-MPs, a series of SIMD accelerator boards for quadruple-, hexuple-, and octuple-precision arithmetic. Each GRAPE-MPs processor consists of a number of processing elements (PEs) and memory components that handle quadruple-, hexuple-, or octuple-precision data, and is implemented on either a structured ASIC chip or an FPGA chip. GRAPE-MP (quadruple precision) uses a structured ASIC chip from eASIC Corp. that has 6 PEs and operates at a 100 MHz clock. The theoretical peak quadruple-precision performance of a single board is 1.2 Gflops, and the achieved performance for Feynman loop integrals is about 0.5 Gflops. GRAPE-MP4/6/8 (quadruple/hexuple/octuple precision) use an FPGA chip from Altera Corporation. For example, in the current implementation, MP8 has 10 PEs operating at a 70 MHz clock. We also present performance results for multiple GRAPE-MPs boards. The achieved performance of four MP8 boards is about 1.6 Gflops, roughly 90 times faster than a single CPU core at comparable precision. We show that our hardware-based approach to evaluating Feynman loop integrals in high-precision arithmetic is highly effective.
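The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes each PE completes two floating-point operations (one multiply and one add) per clock cycle, which the abstract does not state but which reproduces the quoted 1.2 Gflops peak for 6 PEs at 100 MHz. The function name `peak_gflops` and all variable names are hypothetical.

```python
def peak_gflops(num_pe, clock_mhz, flops_per_pe_per_cycle=2):
    """Theoretical peak in Gflops for a SIMD board.

    flops_per_pe_per_cycle=2 is an ASSUMPTION (one multiply plus one add
    per PE per cycle); it is not stated in the abstract.
    """
    return num_pe * clock_mhz * 1e6 * flops_per_pe_per_cycle / 1e9


# GRAPE-MP: 6 PEs at 100 MHz (figures from the abstract).
mp_peak = peak_gflops(num_pe=6, clock_mhz=100)

# Fraction of peak reached on the Feynman loop integrals
# (0.5 Gflops achieved, from the abstract).
mp_efficiency = 0.5 / mp_peak

# Implied single-core CPU rate at comparable precision:
# four MP8 boards reach about 1.6 Gflops, roughly 90x one core.
cpu_core_mflops = 1.6 / 90 * 1000

print(f"GRAPE-MP peak:        {mp_peak:.1f} Gflops")
print(f"GRAPE-MP efficiency:  {mp_efficiency:.0%}")
print(f"Implied CPU core:     {cpu_core_mflops:.1f} Mflops")
```

Under that assumption, the single GRAPE-MP board runs at a bit over 40% of its theoretical peak, and the comparison CPU core would be sustaining on the order of 18 Mflops in software-emulated high precision.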


Keywords

parallel processing, computer speed 0.5 gflops, grape-mp processor, mp8 board, easic corp, hardware based approach, fpga chip, theoretical peak quadruple-precision performance, processing elements, structured asic, quadruple-hexuple-octuple-precision arithmetic operation, digital arithmetic, computer speed 1.6 gflops, application specific integrated circuits, grape-mps processor, feynman loop integral, octuple-precision arithmetic operation, memory components, fpga, performance result, multiple grape-mps board, feynman loop integrals, structured asic chip, field programmable gate arrays, simd accelerator boards, basic design, frequency 100 mhz, pe, computer speed 1.2 gflops, pipelines, process control, computer architecture, registers
