A 28nm 16.9-300TOPS/W Computing-in-Memory Processor Supporting Floating-Point NN Inference/Training with Intensive-CIM Sparse-Digital Architecture

ISSCC 2023

Abstract
Computing-in-memory (CIM) has shown high energy efficiency for low-precision integer multiply-accumulate (MAC) operations [1–3]. However, implementing floating-point (FP) operations with CIM has not been thoroughly explored. Previous FP CIM chips [4–5] either require complex in-memory FP logic or incur lengthy alignment-cycle latencies when converting FP data with different exponents into integer data. The challenges for an energy-efficient and accurate FP CIM processor are shown in Fig. 16.3.1. First, aligning an FP vector onto a CIM module requires a long bit-serial sequence due to infrequent but long tail values, incurring many CIM cycles. In this work, we observe that most exponents of FP data cluster in a small range, which motivates dividing FP operations into a high-efficiency intensive-CIM part and a flexible sparse-digital part. Second, implementing the intensive-CIM + sparse-digital FP workflow requires a sparse digital core for flexible intensive/sparse processing. Third, FP alignment introduces additional random sparsity. Although analog CIM can exploit random sparsity with a low-resolution ADC, the corresponding sparse strategy for digital CIM has not been explored.
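The exponent-clustering observation above can be illustrated with a minimal numerical sketch. The code below is not the chip's actual datapath; it is a hypothetical software model (the window width, anchor choice, and bit widths are assumptions) showing how an FP dot product might be split into an integer-aligned "intensive" part, covering the values whose exponents fall near the dominant exponent, and an exact FP "sparse" part for the rare outliers:

```python
import numpy as np

def split_fp_mac(x, w, exp_window=4):
    """Illustrative model of an intensive-CIM + sparse-digital FP MAC split.

    Inputs whose exponents lie within a small window around a representative
    exponent are aligned to that shared exponent and accumulated as integers
    (modeling the intensive-CIM path); the remaining outliers are computed
    exactly in FP (modeling the sparse-digital path).
    """
    _, exps = np.frexp(x)                 # binary exponents of the inputs
    anchor = int(np.median(exps))         # representative exponent (assumption)
    dense = np.abs(exps - anchor) < exp_window

    # Intensive path: align mantissas to the anchor exponent, integer MAC.
    scale = 2.0 ** (8 - anchor)           # keep ~8 fractional bits (assumption)
    xi = np.round(x[dense] * scale).astype(np.int64)
    wi = np.round(w[dense] * 256).astype(np.int64)  # 8-bit-like weights
    dense_acc = (xi * wi).sum() / (scale * 256)

    # Sparse path: the few exponent outliers, handled in full-precision FP.
    sparse_acc = float(np.dot(x[~dense], w[~dense]))
    return dense_acc + sparse_acc
```

Because most exponents cluster, the integer path handles the bulk of the MACs with a short, fixed alignment, while the digital path absorbs the long-tail values that would otherwise force many bit-serial CIM cycles.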
Keywords
alignment-cycle latencies, analog CIM, computing-in-memory processor, digital CIM, energy efficiency, flexible intensive-sparse processing, floating-point NN inference-training, floating-point operations, FP alignment, FP CIM chips, FP CIM processor, FP vector, in-memory FP logic, integer data, intensive-CIM sparse-digital architecture, intensive-CIM-sparse-digital FP workflow, long bit-serial sequence, low-precision integer multiply-accumulate, low-resolution ADC, sparse digital core