A 28nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs.

ISSCC (2023)

Abstract
SRAM-based computing-in-memory (SRAM-CIM) has been intensively studied and developed to improve the energy and area efficiency of AI devices. SRAM-CIMs have effectively implemented high-precision integer (INT) multiply-and-accumulate (MAC) operations to improve the inference accuracy of various image-classification tasks [1]–[3], [5], [6]. To realize more complex AI tasks, such as detection and segmentation, and to support on-chip training for better inference accuracy, floating-point MAC (FP-MAC) operations with high energy efficiency are required. However, most previous SRAM-CIMs, whether digital [5], [6] or analog [1]–[4], cannot effectively support FP-MACs, e.g., on the Brain Float16 (BF16) datatype. This is because supporting high floating-point input (IN), weight (W), and output (OUT) precision in an SRAM-CIM (1) causes an inconsistency between the shift-alignment of conventional digital FP-MACs and the structured mapping of most SRAM-CIMs, and (2) results in a more difficult tradeoff between throughput/memory size (T/S), energy efficiency (EF), and memory density (MD), as shown in Fig. 7.2.1.
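To make the shift-alignment issue concrete, below is a minimal Python sketch (not from the paper) of a BF16 dot product in the conventional style the abstract refers to: mantissas are multiplied and exponents added per input-weight pair, every partial product is right-shifted to the largest exponent, and the aligned mantissas are summed in one integer accumulation, the only step that maps naturally onto a structured INT-MAC array. The helper names bf16_fields and fp_mac_aligned, and the truncating right shift, are illustrative assumptions.

```python
import struct

def bf16_fields(x):
    """Split a Python float into BF16-style (sign, exponent, mantissa).
    BF16 keeps float32's 8-bit exponent and truncates the mantissa to
    7 stored bits; the hidden leading 1 is restored (zeros map to 0)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0] >> 16
    sign = (bits >> 15) & 0x1
    exp = (bits >> 7) & 0xFF
    man = ((bits & 0x7F) | 0x80) if exp else 0
    return sign, exp, man

def fp_mac_aligned(inputs, weights):
    """BF16 dot product via shift-alignment: per pair, add exponents and
    multiply mantissas; then align every partial product to the largest
    exponent and accumulate with a single integer adder tree."""
    prods = []
    for i, w in zip(inputs, weights):
        si, ei, mi = bf16_fields(i)
        sw, ew, mw = bf16_fields(w)
        sign = -1 if si ^ sw else 1
        prods.append((ei + ew, sign * mi * mw))   # exponent add, mantissa mul
    emax = max(e for e, _ in prods)
    # Data-dependent shifts; bits shifted out are truncated away.
    acc = sum(m >> (emax - e) for e, m in prods)
    # Undo the two exponent biases (127 each) and 2*7 mantissa fraction bits.
    return acc * 2.0 ** (emax - 2 * 127 - 14)

# Example: approximates the float dot product up to alignment truncation.
print(fp_mac_aligned([1.0, 0.5, -2.0], [1.0, 1.0, 3.0]))  # -> -4.5
```

Because the shift amount depends on the data-dependent exponents of each pair, this alignment step does not fit the fixed, structured bit-wise mapping that most SRAM-CIM arrays rely on, which is exactly the inconsistency the abstract identifies.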