WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration

Siwei Xiang, Xianxian Lv, Yishuo Meng, Jianfei Wang, Cimang Lu, Chen Yang

Electronics (2023)

Abstract

FPGA-based convolutional neural network (CNN) accelerators have been extensively studied recently. To exploit the parallelism of multiplier-accumulator computation in convolution, most FPGA-based CNN accelerators depend heavily on the number of on-chip DSP blocks in the FPGA. Consequently, accelerator performance is bounded by the number of available DSPs, which leaves the other FPGA resources underused and the overall utilization imbalanced. This work proposes a multiplication-free convolutional acceleration scheme, named WRA-MF, to relieve the pressure on DSP resources. First, WRA-MF employs the Winograd algorithm to reduce the computational density; it then performs bit-level convolutional weight decomposition to eliminate the multiplication operations. Furthermore, extracting common factors reduces the complexity of the remaining addition operations. Experimental results on the Xilinx XCVU9P platform show that WRA-MF achieves a throughput of 7559 GOP/s at a 509 MHz clock frequency for VGG16. Compared with state-of-the-art works, WRA-MF achieves a 3.47x to 27.55x improvement in area efficiency. The results indicate that the proposed architecture achieves high area efficiency while reducing the imbalance in resource utilization.
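The two ideas named in the abstract can be illustrated with a small software sketch. It is not the WRA-MF architecture, and it omits the common-factor extraction step entirely. It assumes integer weights and the standard 1D Winograd F(2,3) formulation, and the function names `shift_add_multiply` and `winograd_f23` are hypothetical. Each of the four Winograd products is computed by decomposing the weight into its set bits, so that only shifts and additions are used.

```python
def shift_add_multiply(x: int, w: int) -> int:
    """Return x * w using only shifts, additions, and a sign flip.

    The weight magnitude is decomposed bit by bit: every set bit k
    contributes (x << k) to the accumulator.
    """
    magnitude = abs(w)
    acc = 0
    bit = 0
    while magnitude:
        if magnitude & 1:
            acc += x << bit
        magnitude >>= 1
        bit += 1
    return -acc if w < 0 else acc


def winograd_f23(d, g):
    """1D Winograd F(2,3): two outputs of a 3-tap filter over 4 inputs.

    Uses 4 products instead of the 6 needed by direct convolution.
    The transformed weights u1 and u2 are kept doubled (no division by 2)
    so that everything stays in integers; the halving is applied to the
    sums afterwards, which are always even.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform (doubled for the two middle terms).
    u1 = g0 + g1 + g2
    u2 = g0 - g1 + g2

    # Four multiplication-free products.
    m1 = shift_add_multiply(d0 - d2, g0)
    m2 = shift_add_multiply(d1 + d2, u1)  # doubled
    m3 = shift_add_multiply(d2 - d1, u2)  # doubled
    m4 = shift_add_multiply(d1 - d3, g2)

    # Output transform; the right shift undoes the doubling exactly.
    y0 = m1 + ((m2 + m3) >> 1)
    y1 = ((m2 - m3) >> 1) - m4
    return [y0, y1]


def direct_conv(d, g):
    """Reference: direct 3-tap sliding-window convolution (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For example, `winograd_f23([1, 2, 3, 4], [1, -2, 3])` and `direct_conv([1, 2, 3, 4], [1, -2, 3])` produce the same two outputs. In hardware, each `x << bit` term corresponds to wiring rather than logic, which is why such a decomposition can trade DSP multipliers for adders.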

Keywords

convolutional neural networks, acceleration algorithm, convolution weight decomposition, multiplication reduction, hardware efficiency
