High Pe Utilization Cnn Accelerator With Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Jingyu Wang,Songming Yu,Jinshan Yue,Zhe Yuan,Zhuqing Yuan,Huazhong Yang,Xueqing Li,Yongpan Liu

PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC)（2020）

引用 15|浏览66

暂无评分

摘要

Recently CNN-based methods have made remarkable progress in broad fields. Both network pruning algorithms and hardware accelerators have been introduced to accelerate CNN. However, existing pruning algorithms have not fully studied the pattern pruning method, and current index storage scheme of sparse CNN is not efficient. Furthermore, the performance of existing accelerators suffers from no-load PEs on sparse networks. This work proposes a software-hardware co-design to address these problems. The software includes an ADMM-based method which compresses the patterns of convolution kernels with acceptable accuracy loss, and a Huffman encoding method which reduces index storage overhead. The hardware is a fusion-enabled systolic architecture, which can reduce PEs' no-load rate and improve performance by supporting the channel fusion. On CIFAR-10, this work achieves 5.63x index storage reduction with 2-7 patterns among different layers with 0.87% top-1 accuracy loss. Compared with the state-of-art accelerator, this work achieves 1.54x-1.79x performance and 25%-34% reduction of no-load rate with reasonable area and power overheads.

查看译文

关键词

sparse CNN accelerators, pruning algorithms, pattern compression, channel fusion, no-load rate reduction

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要