A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Abstract
Analog resistive random access memory (RRAM) devices enable parallelized, nonvolatile, in-memory vector-matrix multiplications for neural networks, eliminating the bottlenecks posed by the von Neumann architecture. While using RRAMs improves accelerator performance and enables deployment at the edge, the long tuning time needed to update RRAM conductance states adds significant burden and latency to real-time system training. In this article, we develop an in-memory discrete Fourier transform (DFT)-based convolution methodology to reduce system latency and input regeneration. By storing the static DFT/inverse DFT (IDFT) coefficients within the analog arrays, we keep the computational operations performed by digital circuits to a minimum. By performing the convolution in reciprocal Fourier space, our approach minimizes connection weight updates, which significantly accelerates both neural network training and inference. Moreover, by minimizing the RRAM conductance update frequency, we mitigate the endurance limitations of resistive nonvolatile memories. We show that by leveraging the symmetry and linearity of the DFT/IDFT, we can reduce the power of convolution by 1.57x compared with conventional execution. The designed hardware-aware deep neural network (DNN) inference accelerator enhances peak power efficiency by 28.02x and area efficiency by 8.7x over state-of-the-art accelerators. This article paves the way for ultrafast, low-power, compact hardware accelerators.
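The core idea described in the abstract, executing convolution as two static DFT/IDFT matrix products plus an element-wise product in Fourier space, can be illustrated with a minimal NumPy sketch. This is not code from the paper: the 1-D setting, function names, and array sizes are illustrative assumptions, and in the actual accelerator the DFT/IDFT matrix products would be carried out as analog vector-matrix multiplications on RRAM crossbars whose conductances encode the static coefficients.

```python
import numpy as np

# Illustrative 1-D sketch (assumed, not the paper's implementation): convolution
# realized as two static matrix-vector products (DFT and IDFT) plus an
# element-wise product in Fourier space. On RRAM crossbars the DFT/IDFT
# matrices would be programmed once as conductances; here they are NumPy arrays.

def dft_matrix(n):
    """Dense n x n DFT matrix (the static coefficients a crossbar would store)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def conv_via_dft(x, h):
    """Circular convolution of x and h using explicit DFT/IDFT matrix products."""
    n = len(x)
    F = dft_matrix(n)
    F_inv = np.conj(F) / n           # IDFT matrix, obtained from the DFT by symmetry
    X = F @ x                        # matrix-vector product #1: transform the input
    H = F @ h                        # kernel transform (static, precomputable)
    return np.real(F_inv @ (X * H))  # element-wise product, then matrix-vector product #2

# Sanity check against a direct FFT-based circular convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
h = rng.standard_normal(8)
direct = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
print(np.allclose(conv_via_dft(x, h), direct))  # True
```

Because the DFT/IDFT matrices never change, only the input and kernel transforms vary between layers or samples, which is why this formulation avoids frequent conductance updates.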
Keywords
Compute-in-memory, fast Fourier transform convolutional neural networks (CNNs), in-memory discrete Fourier transform (DFT), memristor crossbars