An FPGA based Tiled Systolic Array Generator to Accelerate CNNs

2022 25th Euromicro Conference on Digital System Design (DSD), 2022

Abstract
The main computation in any CNN is the convolution operation, which offers significant potential for massively parallel implementation on an FPGA. Systolic arrays, with their intrinsic pipelining, have been explored for CNN inference. In this paper, we present a systolic array architecture designed for a novel convolution method. We implement image-kernel convolution and test it with representative image inputs for several models, including LeNet-5, AlexNet, VGG-16, and ResNet-34. We compare the proposed design with conventional convolution and HLS-based designs. We restrict our implementation to a resource-constrained FPGA, the AMD-Xilinx Zynq 7020 platform. On average, the proposed architecture outperforms the direct convolution method and HLS pipelined designs by 2× and 2.1×, respectively. Because DSP blocks are scarce, we constrain our implementation to avoid them and use LUTs instead. As a result, our implementation uses nearly 9× more LUTs than the baseline convolution but 8× fewer LUTs than the HLS pipelined implementation. We further increase convolution throughput by 11× by implementing a tiled systolic architecture that fully utilises the parallel computing resources of the FPGA.
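The abstract does not describe the authors' novel convolution method, so the following is not their design. It is a generic software sketch that illustrates three ideas the abstract contrasts: direct convolution as the baseline, a systolic array whose processing elements (PEs) pass operands to their neighbours once per cycle, and tiling, which runs a large workload on a fixed-size PE array. It assumes two swapped-in textbook techniques, an output-stationary dataflow and im2col lowering of convolution to matrix multiplication. All function names and the 4 × 4 default array size are hypothetical.

```python
# Hypothetical sketch: generic output-stationary systolic array + im2col lowering,
# NOT the paper's convolution method (which the abstract does not describe).

def direct_conv2d(image, kernel):
    """Baseline: valid (no padding), stride-1 2D cross-correlation, as in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def im2col(image, kh, kw):
    """Lower convolution to a matrix product: one row per output pixel, one column per kernel tap."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[image[i + u][j + v] for u in range(kh) for v in range(kw)]
            for i in range(oh) for j in range(ow)]


def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary array with one PE per output element.

    Row r of A enters PE(r, 0) delayed by r cycles and moves right one PE per cycle;
    column c of B enters PE(0, c) delayed by c cycles and moves down one PE per cycle.
    PE(r, c) therefore sees A[r][k] and B[k][c] together in cycle t = k + r + c.
    """
    R, K, C = len(A), len(B), len(B[0])
    acc = [[0] * C for _ in range(R)]    # partial sums stay inside each PE
    a_reg = [[0] * C for _ in range(R)]  # operand each PE forwards to its right neighbour
    b_reg = [[0] * C for _ in range(R)]  # operand each PE forwards to the PE below
    for t in range(K + R + C - 2):       # last useful product: t = (K-1) + (R-1) + (C-1)
        # Shift operands one PE right / down, far end first so nothing is overwritten.
        for r in range(R):
            for c in range(C - 1, 0, -1):
                a_reg[r][c] = a_reg[r][c - 1]
        for r in range(R - 1, 0, -1):
            for c in range(C):
                b_reg[r][c] = b_reg[r - 1][c]
        # Inject skewed inputs at the array edges; zeros outside the valid k window.
        for r in range(R):
            k = t - r
            a_reg[r][0] = A[r][k] if 0 <= k < K else 0
        for c in range(C):
            k = t - c
            b_reg[0][c] = B[k][c] if 0 <= k < K else 0
        # Every PE multiplies its two current operands and accumulates locally.
        for r in range(R):
            for c in range(C):
                acc[r][c] += a_reg[r][c] * b_reg[r][c]
    return acc


def tiled_systolic_matmul(A, B, pe_rows=4, pe_cols=4):
    """Run a large product on a fixed pe_rows x pe_cols array, one output tile at a time.

    Edge tiles are simply smaller here; in hardware the unused PEs would idle.
    """
    M, N = len(A), len(B[0])
    out = [[0] * N for _ in range(M)]
    for i0 in range(0, M, pe_rows):
        for j0 in range(0, N, pe_cols):
            a_tile = A[i0:i0 + pe_rows]
            b_tile = [row[j0:j0 + pe_cols] for row in B]
            for di, row in enumerate(systolic_matmul(a_tile, b_tile)):
                out[i0 + di][j0:j0 + len(row)] = row
    return out


def systolic_conv2d(image, kernels, pe_rows=4, pe_cols=4):
    """Convolve one image with several kernels via im2col + tiled systolic matmul."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)                        # (oh*ow) x (kh*kw)
    weights = [[k[u][v] for k in kernels]                   # (kh*kw) x F, same tap order
               for u in range(kh) for v in range(kw)]
    flat = tiled_systolic_matmul(patches, weights, pe_rows, pe_cols)  # (oh*ow) x F
    return [[[flat[i * ow + j][f] for j in range(ow)] for i in range(oh)]
            for f in range(len(kernels))]


if __name__ == "__main__":
    image = [[(3 * i + j) % 7 - 3 for j in range(8)] for i in range(8)]
    kernels = [
        [[1, 0, -1], [1, 0, -1], [1, 0, -1]],
        [[1, 2, 1], [0, 0, 0], [-1, -2, -1]],
    ]
    # The tiled systolic model should agree exactly with the direct baseline.
    assert systolic_conv2d(image, kernels) == [direct_conv2d(image, k) for k in kernels]
    print("systolic and direct convolution agree")
```

The output-stationary choice keeps each partial sum inside one PE, so only operands travel between neighbours. This software model counts cycles and data movement only; it does not capture the LUT-versus-DSP trade-off the abstract describes for the Zynq 7020.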
Keywords
Convolutional neural networks (CNNs), FPGA, systolic array, hardware accelerators