CNN Accelerator with Minimal On-Chip Memory Based on Hierarchical Array.

Hyun-Wook Son, YongSeok Na, TaeHyun Kim,Ali A. Al-Hamid,HyungWon Kim

ISOCC(2021)

引用 5|浏览1
暂无评分
摘要
This paper presents an architecture of CNN accelerator based on a new processing element (PE) array called a diagonal cyclic array. It can significantly reduce the burden of repeated memory accesses for feature data and weight parameters for CNN models. To evaluate the effectiveness of the proposed architecture, we implemented a CNN accelerator for YOLOv4-Tiny consisting of 9 layers. We also present how to optimize the local buffer size with little sacrifice of inference speed. We evaluated the example CNN accelerator using FPGA implementation with 24932 LUTs, 584 DSP blocks and a on-chip memory of only 58KB. It demonstrates an accuracy 58% (mAP0.5) with computation time of 240ms for each input image using a clock speed of 100MHz. This speed is expected to reach 2.4ms using a clock speed of 1GHz, if implemented in a silicon SoC using a sub-micron process.
更多
查看译文
关键词
CNN,Accelerator,SoC,Systolic Array,Memory optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要