24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors

International Solid-State Circuits Conference (2019)

Abstract
Embedded nonvolatile memory (NVM) and computing-in-memory (CIM) are significantly reducing the latency (t_MAC) and energy consumption (E_MAC) of multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices [1, 2]. Previous ReRAM CIM macros demonstrated MAC operations for 1b-input, ternary-weighted, 3b-output CNNs [1] or 1b-input, 8b-weighted, 1b-output fully-connected networks with limited accuracy [2]. To support higher-accuracy applications that rely heavily on convolutional neural networks (CNNs), NVM-CIM should support multibit inputs/weights and multibit outputs (MAC-OUT) for CNN operations. One way to achieve multibit weights is to use a multi-level ReRAM cell to store the weight. However, as shown in Fig. 24.1.1, multibit ReRAM CIM faces several challenges: (1) a tradeoff between area and speed for multibit input/weight/MAC-OUT MAC operations; (2) the sense amplifier's high input offset, large area, and high parasitic load on the read path due to large BL currents (I_BL) from multibit MAC; and (3) limited accuracy due to a small read/sensing margin (I_SM) across MAC-OUT or variation in cell resistance (particularly in MLC cells). To overcome these challenges, this work proposes (1) a serial-input non-weighted product (SINWP) structure to optimize the tradeoff between area, t_MAC, and E_MAC; (2) a down-scaling weighted current translator (DSWCT) and positive-negative current-subtractor (PN-ISUB) for short delay, a small offset, and a compact read-path area; and (3) a triple-margin small-offset current-mode sense amplifier (TMCSA) to tolerate a small I_SM. A fabricated 55nm 1Mb ReRAM-CIM macro is the first ReRAM CIM macro to support CNN operations using multibit input/weight MAC-OUT. This device achieves the shortest CIM-MAC-access time (t_AC) among existing ReRAM-CIMs (t_MAC = 14.6ns with 2b-input, 3b-weight, and 4b-MAC-OUT) and the best peak E_MAC of 53.17 TOPS/W (in binary mode).
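As a rough illustration of the arithmetic behind a multibit MAC with serially applied inputs, the following Python sketch computes a dot product one input bit-plane at a time and converts the reported peak efficiency into energy per operation. It is a generic bit-serial scheme written under assumptions, not the SINWP circuit described in the paper: the function names, the signed weight values, and the example vectors are all hypothetical, and no analog effects (current scaling, sensing margin, output quantization to 4b) are modeled.

```python
def bit_serial_mac(inputs, weights, input_bits=2):
    """Generic bit-serial MAC (assumed scheme, not the paper's circuit).

    Each pass handles one bit-plane of the unsigned inputs; the binary
    place value is applied afterwards by a left shift.
    """
    total = 0
    for b in range(input_bits):
        # Partial sum for bit-plane b: each weight is gated by bit b of its input.
        partial = sum(w * ((x >> b) & 1) for x, w in zip(inputs, weights))
        total += partial << b
    return total


def energy_per_op_fj(tops_per_watt):
    """Convert an efficiency in TOPS/W into femtojoules per operation."""
    return 1e15 / (tops_per_watt * 1e12)


# Hypothetical example data: 2b unsigned inputs and small signed weights.
inputs = [3, 1, 2, 0]
weights = [3, -2, 1, -3]

mac_out = bit_serial_mac(inputs, weights)
reference = sum(x * w for x, w in zip(inputs, weights))
print(mac_out, reference)

# Peak efficiency quoted in the abstract (binary mode).
print(round(energy_per_op_fj(53.17), 2), "fJ/op")
```

The bit-serial result matches the direct dot product, which is the property any serial-input scheme has to preserve. The conversion shows that 53.17 TOPS/W corresponds to roughly 18.8 fJ per operation.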
Keywords
ReRAM Computing-In-Memory Macro, CNN based AI edge processors, embedded nonvolatile memory, artificial intelligence edge devices, MAC operations, CNN operations, multibit weights, multilevel ReRAM cell, multibit ReRAM CIM, multibit MAC, serial-input nonweighted product structure, positive-negative current-subtractor, ReRAM CIM macro, multibit input/weight MAC-OUT, shortest CIM-MAC-access time, ReRAM-CIMs, 4b-MAC-OUT, multiply-and-accumulate operations, sense amplifier, ReRAM-CIM macro, SINWP, multibit input/weight/MAC-OUT MAC operations, convolution neural network heavy applications, parallel MAC computing time, time 14.6 ns, size 55.0 nm