Exploring OpenCL Memory Throughput on the Zynq

Semantic Scholar (2016)

Abstract
The Zynq platform combines a general-purpose processor and a Field Programmable Gate Array (FPGA) on a single chip. Zynq-like systems are likely to become commonplace in future computing systems, from HPC compute nodes to embedded systems. In an HPC setting, the reconfigurable FPGA will be used to implement application-specific accelerator units. Much like a GPU, the FPGA in the Zynq can be programmed using OpenCL, but unlike a GPU, the Zynq offers great flexibility. One example of this flexibility is the interfacing of hardware accelerators with the processing system and memory interface. The programmer is free to choose among different interfaces and combinations of interfaces, and to explore configurations thereof. This flexibility, however, makes even a simple task such as measuring the memory throughput of the accelerator much more involved and interesting. On a GPU, where the interfacing is fixed, one would design a kernel that does nothing but move data around and then time its execution. On the Zynq, we need to explore many different interface configurations and obtain a different throughput result for each setting. This document explores a subset of the possible interface configurations in the hope of gaining insight into how to organize accelerators and interfaces for best performance.