Work-in-progress: a power-efficient and high performance FPGA accelerator for convolutional neural networks

Lei Gong,Chao Wang,Xi Li,Huaping Chen,Xuehai Zhou

International Conference on Hardware/Software Codesign and System Synthesis（2017）

引用 0|浏览0

暂无评分

摘要

Recently, FPGAs have been widely used in the implementation of hardware accelerators for Convolutional Neural Networks (CNN), especially on mobile and embedded devices. However, most of these existing accelerators are designed with the same concept as their ASIC counterparts, that is all operations from different CNN layers are mapped to the same hardware units and work in a multiplexed way. Although this approach improves the generality of these accelerators, it does not take full advantage of reconfigurability and customizability of FPGAs, resulting in a certain degree of computational efficiency degradation, which is even worse on the embedded platforms. In this paper, we propose an FPGA-based CNN accelerator with all the layers mapped to their own on-chip units, and working concurrently as a pipeline. A strategy which can find the optimized paralleling scheme for each layer is proposed to eliminate the pipeline stall and achieve high resource utilization. In addition, a balanced pruning-based method is applied on fully connected (FC) layers to reduce the computational redundancy. As a case study, we implement a widely used CNNs model, LeNet-5, on an embedded FPGA device, Xilinx Zedboard. It can achieve a peak performance of 39.78 GOP/s and the power efficiency with a value 19.6 GOP/s/W which outperforms previous approaches.

查看译文

关键词

high performance fpga accelerator,high performance,work-in-progress,power-efficient

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要