High PE Utilization CNN Accelerator with Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC) (2020)

Abstract
Recently, CNN-based methods have made remarkable progress in a broad range of fields. Both network pruning algorithms and hardware accelerators have been introduced to accelerate CNNs. However, existing pruning algorithms have not fully studied the pattern pruning method, and current index storage schemes for sparse CNNs are not efficient. Furthermore, the performance of existing accelerators suffers from no-load PEs on sparse networks. This work proposes a software-hardware co-design to address these problems. The software includes an ADMM-based method, which compresses the patterns of convolution kernels with acceptable accuracy loss, and a Huffman encoding method, which reduces index storage overhead. The hardware is a fusion-enabled systolic architecture, which reduces the PEs' no-load rate and improves performance by supporting channel fusion. On CIFAR-10, this work achieves a 5.63x index storage reduction, using 2-7 patterns among different layers, with a 0.87% top-1 accuracy loss. Compared with the state-of-the-art accelerator, this work achieves 1.54x-1.79x performance and a 25%-34% reduction in no-load rate with reasonable area and power overheads.
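To illustrate why Huffman encoding can shrink index storage once kernels are restricted to a few patterns, here is a minimal sketch. It is not the authors' implementation: the abstract does not describe their encoding format, so the per-kernel pattern IDs, the function names `huffman_code_lengths` and `index_bits`, and the usage counts below are all hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def index_bits(pattern_ids):
    """Total bits needed to store `pattern_ids` with a Huffman code."""
    lengths = huffman_code_lengths(pattern_ids)
    freq = Counter(pattern_ids)
    return sum(freq[s] * lengths[s] for s in freq)


# Hypothetical layer: 16 kernels drawn from 4 patterns with skewed usage.
pattern_ids = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
huffman_bits = index_bits(pattern_ids)
fixed_bits = len(pattern_ids) * 2  # 2 bits per kernel for 4 patterns
print(huffman_bits, fixed_bits)
```

The saving comes entirely from skew: frequently used patterns get shorter codes, so a layer whose kernels favor a few patterns stores fewer index bits than a fixed-width encoding would.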
Keywords
sparse CNN accelerators, pruning algorithms, pattern compression, channel fusion, no-load rate reduction