Accelerating Convolutional Neural Network with FFT on Tiny Cores

2017 IEEE International Symposium on Circuits and Systems (ISCAS), 2017

Abstract
Fueled by the ILSVRC and COCO competitions, Convolutional Neural Networks (CNNs) have become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally and memory intensive, so energy-efficient implementation on embedded platforms is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution (CV) layers and fewer fully connected (FC) layers can achieve lower error rates, so reducing the complexity of the convolution layers is of utmost importance. To reduce computation and shared-memory usage in convolution layers, in this paper we evaluate the performance of direct convolution (Direct-Conv), Fast Fourier Transform based convolution (FFT-Conv), and overlap-and-add FFT convolution (FFT-OVA-Conv) on embedded architectures, including a low-power domain-specific many-core architecture called Power Efficient Nano Clusters (PENC) and the ARM Cortex-A53 CPU. To demonstrate the efficiency of FFT-Conv and FFT-OVA-Conv, we map ResNet-20 for the CIFAR-10 dataset onto PENC as well as onto the ARM Cortex-A53 CPU. Results for the three methods are evaluated and compared with respect to throughput per watt, energy-delay product, and execution time. Using the built-in FFT instruction in PENC, FFT-OVA-Conv runs 2.9x and 1.65x faster and achieves 6.7x and 2.3x better throughput per watt than Direct-Conv and FFT-Conv, respectively. On the ARM A53 CPU, FFT-OVA-Conv achieves 3.36x and 1.38x improvement in execution time and 2.72x and 1.32x better throughput than Direct-Conv and FFT-Conv.
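The three convolution strategies compared in the abstract can be illustrated with a minimal 1-D NumPy sketch. This is not the paper's PENC or ARM implementation; the function names, the block size, and the use of 1-D signals are illustrative assumptions. FFT-Conv zero-pads both operands to the full output length, while FFT-OVA-Conv splits the input into small blocks, convolves each block via FFT, and adds the overlapping tails back together, which keeps per-block FFT sizes (and hence scratch memory) small.

```python
import numpy as np

def direct_conv(x, h):
    """Direct (time-domain) linear convolution."""
    return np.convolve(x, h)

def fft_conv(x, h):
    """FFT-based linear convolution: pad both signals to the output length."""
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

def fft_ova_conv(x, h, block=64):
    """Overlap-and-add FFT convolution: FFT-convolve each input block with
    the filter, then accumulate the overlapping tails into the output."""
    m = len(h)
    n = block + m - 1                  # linear-conv length of one block
    H = np.fft.rfft(h, n)              # filter spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        Y = np.fft.irfft(np.fft.rfft(seg, n) * H, n)
        end = min(start + n, len(y))   # last block may be clipped
        y[start:end] += Y[:end - start]
    return y

rng = np.random.default_rng(0)
x, h = rng.standard_normal(1000), rng.standard_normal(16)
assert np.allclose(direct_conv(x, h), fft_conv(x, h))
assert np.allclose(direct_conv(x, h), fft_ova_conv(x, h))
```

All three produce the same linear convolution; they differ only in operation count and working-set size, which is what the paper's throughput-per-watt and energy-delay-product comparisons measure.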
Keywords
Convolutional Neural Network, Domain-Specific Many-Core Accelerator, FFT Overlap and Add, Energy Efficient