Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications

IEEE Transactions on Parallel and Distributed Systems (2022)

Cited 18 | Views 39
Abstract
Processing-in-memory (PIM) architectures, with their ability to perform ultra-low-latency parallel processing, are regarded as a more suitable alternative to von Neumann computing architectures for data-intensive applications such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). In this article, we present a Look-up Table (LUT) based PIM architecture aimed at CNN/DNN acceleration that replaces logic-based processing with pre-calculated results stored inside LUTs, allowing complex computations to be performed on the DRAM memory platform. Our LUT-based DRAM-PIM architecture offers superior performance at significantly higher energy efficiency than conventional bit-wise parallel PIM architectures, while avoiding the fabrication challenges associated with in-memory implementation of logic circuits. In addition, the processing elements can be programmed and re-programmed to perform virtually any operation, including those of the Convolutional, Fully Connected, Pooling, and Activation layers of a CNN/DNN. Furthermore, the architecture can operate on several combinations of operand bit-widths, offering a wide range of flexibility across performance, precision, and efficiency. A Transmission Gate (TG) realization of the circuitry ensures a minimal footprint for the PIM architecture. Our simulations demonstrate that the proposed architecture performs AlexNet inference nearly 13× faster and 125× more energy-efficiently than a state-of-the-art GPU, and provides 1.35× higher throughput at 2.5× higher energy efficiency than another recent DRAM-implemented LUT-based PIM architecture in its baseline operation mode. Moreover, it offers a 12× higher frame rate at 9× higher energy efficiency per frame in the lowest operand-precision setting, relative to its own baseline operation mode.
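Illustration of the LUT-based computation idea: the abstract states that pre-calculated results stored in LUTs replace logic-based processing and that operand bit-widths can be scaled. The Python sketch below is a purely conceptual, software-level illustration of that idea, not the paper's DRAM circuit or microarchitecture; the names (MUL_LUT, lut_multiply), the 4-bit chunk width, and the nibble-decomposition scheme are assumptions made only for this example.

```python
# Conceptual sketch (not the paper's hardware design): a precomputed look-up
# table replaces multiplier logic, and operand precision is scaled by choosing
# how many 4-bit chunks of each operand participate in the computation.
import numpy as np

NIBBLE = 4  # assumed LUT input width per operand (illustrative choice)

# Precompute all 4-bit x 4-bit products once: "stored results" instead of logic.
MUL_LUT = np.array([[a * b for b in range(1 << NIBBLE)]
                    for a in range(1 << NIBBLE)], dtype=np.uint32)

def split_nibbles(x: int, n_nibbles: int) -> list[int]:
    """Split an unsigned integer into 4-bit chunks, least significant first."""
    return [(x >> (NIBBLE * i)) & 0xF for i in range(n_nibbles)]

def lut_multiply(a: int, b: int, n_nibbles: int = 2) -> int:
    """Multiply two unsigned operands using only LUT lookups, shifts, and adds.

    n_nibbles sets the operand precision (2 nibbles = 8-bit operands); reducing
    it drops high-order chunks, trading accuracy for fewer lookups.
    """
    result = 0
    for i, ai in enumerate(split_nibbles(a, n_nibbles)):
        for j, bj in enumerate(split_nibbles(b, n_nibbles)):
            result += int(MUL_LUT[ai, bj]) << (NIBBLE * (i + j))
    return result

# Usage: full 8-bit precision is exact; 1-nibble precision is an approximation.
assert lut_multiply(173, 91, n_nibbles=2) == 173 * 91
print(lut_multiply(173, 91, n_nibbles=1), 173 * 91)
```

In the sketch, lowering n_nibbles discards the high-order chunks of each operand, which loosely mirrors how programmable precision scaling trades result precision for higher throughput and energy efficiency.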
Keywords
Processing in memory (PIM), look-up table (LUT), deep neural networks (DNN), convolutional neural networks (CNN)