Flexible and Efficient Convolutional Acceleration on Unified Hardware Using the Two-Stage Splitting Method and Layer-Adaptive Allocation of 1-D/2-D Winograd Units

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2024)

Abstract
General convolution acceleration, such as Winograd and FFT, is a promising direction for addressing the computational complexity of current convolutional neural networks (CNNs). However, the flexibility of these CNNs means such schemes often introduce massive redundant computations, damaging the acceleration effect. In this article, a two-stage splitting method for arbitrarily sized tensors and filters and a unified hardware architecture using layer-adaptive allocated Winograd units are proposed, achieving effective redundancy elimination within a unified architecture. First, a tensor adaptive presplitting method is proposed to divide the original tensors to match the tiling rule of Winograd. Furthermore, a Winograd-based extended splitting scheme is designed to reduce the redundant calculations, cutting multiplication operations in convolutional layers by 30.6%-75%. Finally, a unified hardware architecture with a layer-adaptive allocation method is proposed to evaluate and select the optimal Winograd F(m, r) units and input/output parallelisms. The architecture is evaluated on the Xilinx XCVU9P platform and achieves 1.97/1.23/1.60/1.25 GOPS/DSP for AlexNet, VGG16, modified VGG16, and ResNet18, respectively, up to a 5.81x improvement in DSP efficiency over previous FPGA-based designs.
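To make the multiplication-saving claim concrete, the sketch below shows the textbook 1-D Winograd F(2, 3) transform, which produces two convolution outputs with 4 multiplications instead of the naive 6, plus a toy tiling loop that zero-pads an arbitrary-length input so every tile is full. This is a minimal illustration of the general idea only; the function names and the padding scheme are assumptions for this sketch, not the paper's actual presplitting method or hardware mapping.

import numpy as np

def winograd_f23(d, g):
    # Winograd F(2, 3): two outputs of a 3-tap correlation using
    # 4 multiplications instead of the naive 2 * 3 = 6.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

def conv1d_winograd(x, g, m=2, r=3):
    # Split an arbitrary-length input into overlapping tiles of
    # m + r - 1 samples with stride m, zero-padding the tail so the
    # last tile is full -- a toy analogue of presplitting a tensor
    # to match the Winograd tiling rule.
    n_out = len(x) - r + 1                 # valid-correlation length
    n_tiles = -(-n_out // m)               # ceil(n_out / m)
    x = np.pad(x, (0, n_tiles * m + r - 1 - len(x)))
    y = np.concatenate([winograd_f23(x[i * m : i * m + m + r - 1], g)
                        for i in range(n_tiles)])
    return y[:n_out]                       # drop outputs from padding

x = np.arange(10.0)
g = np.array([1.0, 2.0, 3.0])
assert np.allclose(conv1d_winograd(x, g),
                   np.correlate(x, g, mode='valid'))

Even this smallest tile, F(2, 3), saves 1 - 4/6 ≈ 33% of multiplications; larger F(m, r) tiles save more per output at the cost of more transform logic, which is why the proposed architecture evaluates and selects F(m, r) units per layer.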
Keywords
Filtering algorithms, Convolution, Tensors, Hardware, Computer architecture, Parallel processing, Optimization, Convolutional neural network (CNN) acceleration, two-stage splitting, unified hardware generation, Winograd algorithm