A Kernel Unfolding Approach to Trade Data Movement with Computation Power for CNN Acceleration

Yueh-Han Wu, Tse-Yuan Wang, Yuan-Hao Chang, Tei-Wei Kuo, Hung-Sheng Chang

2020 9th Non-Volatile Memory Systems and Applications Symposium (NVMSA) (2020)

Abstract

Convolutional neural networks (CNNs) achieve human-level accuracy in image classification applications. However, their complicated structure requires a large number of multiply-accumulate (MAC) operations and results in a high data movement cost. The situation worsens as computing power and memory speed grow asymmetrically on von Neumann-based architectures. Recently, processing-in-memory (PIM) designs have been adopted to reduce the data communication cost by storing parameters in memory. However, the significant cost of feeding the input feature map remains a major challenge, especially for PIM devices with high bandwidth but long access latency. We therefore explore how to trade space in PIM to eliminate this cost. A kernel unfolding technique is proposed to eliminate duplicated feeding of the input feature map while keeping the memory cells in PIM highly utilized to achieve peak computing throughput. As a result, memory bandwidth is used efficiently and execution time is reduced significantly. The results show that the proposed design achieves up to a 16.2x improvement in cycle count compared to traditional PIM designs.
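The abstract does not give the details of the kernel unfolding technique, so the following is only a minimal sketch of the general idea, under stated assumptions: a single-channel input, stride 1, no padding, and a kernel unfolded into a Toeplitz-style weight matrix. The function names (`direct_conv2d`, `unfold_kernel`, `unfolded_conv2d`) are hypothetical and are not taken from the paper. The sketch illustrates the trade the abstract describes: the unfolded matrix occupies more memory cells, but the flattened input feature map is fed only once instead of being re-fed for every sliding window.

```python
def direct_conv2d(x, k):
    """Sliding-window convolution (cross-correlation), stride 1, no padding.

    Each input value is re-read for every window that overlaps it.
    """
    H, W = len(x), len(x[0])
    R, S = len(k), len(k[0])
    out = []
    for i in range(H - R + 1):
        row = []
        for j in range(W - S + 1):
            row.append(sum(x[i + r][j + s] * k[r][s]
                           for r in range(R) for s in range(S)))
        out.append(row)
    return out


def unfold_kernel(k, H, W):
    """Unfold kernel k into an (OH*OW) x (H*W) weight matrix.

    Row n holds the kernel weights placed at the input positions that
    output n depends on; all other entries are zero. This layout is an
    assumption for illustration, not the paper's exact mapping.
    """
    R, S = len(k), len(k[0])
    OH, OW = H - R + 1, W - S + 1
    mat = [[0] * (H * W) for _ in range(OH * OW)]
    for i in range(OH):
        for j in range(OW):
            for r in range(R):
                for s in range(S):
                    mat[i * OW + j][(i + r) * W + (j + s)] = k[r][s]
    return mat


def unfolded_conv2d(x, k):
    """Same result as direct_conv2d, but the input is flattened and fed once."""
    H, W = len(x), len(x[0])
    OW = W - len(k[0]) + 1
    flat = [v for row in x for v in row]      # single pass over the input
    mat = unfold_kernel(k, H, W)              # stored weights (more cells)
    y = [sum(w * v for w, v in zip(row, flat)) for row in mat]
    return [y[n:n + OW] for n in range(0, len(y), OW)]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
k = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]

assert direct_conv2d(x, k) == unfolded_conv2d(x, k)
```

For this 4x4 input and 3x3 kernel, the sliding-window form feeds 4 windows of 9 values each (36 input reads), while the unfolded form feeds the 16 input values once, at the cost of storing a 4 x 16 weight matrix (64 cells) instead of 9 weights.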

Keywords

kernel unfolding approach, computation power, CNN acceleration, convolutional neural networks, human-level accuracy, image classification applications, MAC operations, asymmetric growth, von Neumann-based architecture, data communication cost, memory cells, peak computing throughput, memory bandwidth, traditional PIM designs, data movement trading, processing in memory design, input feature map feeding, long access latency PIM devices
