High Performance Kernel Architecture for Convolutional Neural Network Acceleration

Anakhi Hazarika, Soumyajit Poddar, Hafizur Rahaman

Journal of Circuits, Systems, and Computers (2021)

Abstract

Convolutional neural networks (CNNs) have emerged as a prominent choice for artificial intelligence tasks. Recent advancements in CNN designs have greatly improved the performance and energy efficiency of several computation-intensive applications. However, in real-time applications, greater CNN accuracy is attained at the expense of very high computational cost and complexity. Further, implementing real-time CNNs on embedded platforms is highly challenging due to resource and power constraints. This paper addresses this computational complexity and presents an accelerator architecture accompanied by a novel kernel design to improve overall CNN performance. The proposed kernel design introduces a computing mechanism that reduces the data movement cost, measured in computational cycle count (latency), by parallelizing the convolution processing elements. The architecture exploits the overlap between spatially adjacent data. Its performance is also analyzed for multiple hyper-parameter configurations. The proposed accelerator reduces execution time by an average of [Formula: see text] relative to the conventional computing architecture. We validate the architecture with the AlexNet and VGG-16 CNN models. The proposed accelerator architecture achieves an average throughput improvement of [Formula: see text] over state-of-the-art accelerators.
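
The overlap between spatially adjacent data mentioned in the abstract can be illustrated by counting input fetches. The sketch below is not the paper's kernel design. `window_reads` is a hypothetical helper that assumes a single input channel, no padding, and a stride no larger than the kernel size. It compares a naive scheme that refetches every window against an idealized scheme that fetches each covered input element once.

```python
def window_reads(height, width, kernel, stride=1):
    """Count input fetches for a 2D convolution over one channel.

    Hypothetical illustration only (not the paper's architecture).
    Returns (naive, with_reuse):
      naive      - every kernel x kernel window is fetched in full
      with_reuse - each input element covered by a window is fetched once
    """
    if not 1 <= stride <= kernel:
        raise ValueError("this sketch assumes 1 <= stride <= kernel")
    if kernel > height or kernel > width:
        raise ValueError("kernel must fit inside the input")

    # Output size without padding.
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1

    # Naive: each output position reads its whole window again.
    naive = out_h * out_w * kernel * kernel

    # With stride <= kernel the windows tile a contiguous region,
    # so the covered extent in each dimension is easy to compute.
    covered_h = (out_h - 1) * stride + kernel
    covered_w = (out_w - 1) * stride + kernel
    with_reuse = covered_h * covered_w

    return naive, with_reuse


naive, reused = window_reads(5, 5, kernel=3, stride=1)
print(naive, reused)  # 81 25
```

For a 5 × 5 input and a 3 × 3 kernel with stride 1, the naive scheme issues 81 fetches while full reuse needs only 25, which is the kind of redundancy an overlap-aware design can target.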

Keywords

Convolutional neural networks (CNNs), multiply-and-accumulate (MAC), hardware accelerator, CNN acceleration, field programmable gate array (FPGA)
