Design Space Exploration of Concurrency Mapping to FPGAs with OpenCL: A Case Study with Shallow Water Model Kernel

IWOCL '20: International Workshop on OpenCL, Munich, Germany, April 2020

Abstract

High-performance computing systems consist of multiple multicore nodes and accelerators, and FPGAs are emerging as a viable accelerator for high-performance computing (HPC) scientific applications. This emergence is due to the availability of high-level synthesis (HLS) tools and programming languages, such as OpenCL, which can be used to program an HPC system node. HPC application kernels contain several levels of concurrency, and mapping that available concurrency efficiently to HPC systems that include FPGAs is a challenge. The challenge arises because OpenCL and HLS tools provide different mechanisms for exploiting concurrency within a node, which leads to a concurrency mapping design problem and raises questions about the programmability (development effort) and achieved performance of the different mapping mechanisms. This paper examines the concurrency levels available in a case-study kernel from the shallow water model and explores the options and trade-offs (performance, resource usage) of using the OpenCL and SDSoC HLS mechanisms to map the kernel's available concurrency levels to a single-node FPGA. The paper concludes that the SDSoC dataflow attribute delivered the best mapping design, with a 118.06x performance improvement while using FF 11%, LUTs 24%, DSPs 7%, BRAMs 4% of the FPGA resources. It is followed by the pipeline mechanism, which delivered a 112.44x performance improvement with resource usage of FF 13%, LUTs 25%, DSPs 7%, BRAMs 6%. In contrast, the OpenCL NDRange mechanism requires a high programmability effort but achieves performance (97.96x) competitive with the dataflow mechanism at lower resource usage: FF 2%, LUTs 4%, DSPs 2%, BRAMs 4%.
