CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024)

Abstract
The novel computing-in-memory (CIM) technology has demonstrated significant potential in enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative array-wise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization dropping strategy. The effectiveness of the proposed framework is demonstrated through experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy compared to conventional quantization methods.
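The abstract does not give the exact formulations, but two of the ideas, array-wise weight quantization granularity and ADC-limited partial-sum quantization with a clipping threshold, can be illustrated with a minimal NumPy sketch. It assumes a plain uniform quantizer, per-tile max-based scaling, and a fixed clipping parameter alpha standing in for the paper's reparametrized clipping function; the function names, tile sizes, and bit widths below are illustrative assumptions, not the authors' implementation.

import numpy as np

def quantize_uniform(x, scale, n_bits, signed=True):
    # Uniform quantize-dequantize with a given scale factor.
    if signed:
        qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** n_bits - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

def arraywise_weight_quant(weights, array_rows, array_cols, n_bits=4):
    # Hypothetical array-wise granularity: tile the (out_ch, in_features)
    # weight matrix to match the crossbar array size and give each tile
    # its own max-based scale factor.
    out_ch, in_features = weights.shape
    wq = np.empty_like(weights)
    for r in range(0, in_features, array_rows):
        for c in range(0, out_ch, array_cols):
            tile = weights[c:c + array_cols, r:r + array_rows]
            scale = np.abs(tile).max() / (2 ** (n_bits - 1) - 1) + 1e-12
            wq[c:c + array_cols, r:r + array_rows] = quantize_uniform(tile, scale, n_bits)
    return wq

def partial_sum_quant(psum, alpha, adc_bits=4):
    # Clip partial sums to [-alpha, alpha] and quantize them to the ADC
    # resolution; alpha is a fixed stand-in for the learnable clipping
    # threshold implied by the reparametrized clipping function.
    clipped = np.clip(psum, -alpha, alpha)
    scale = alpha / (2 ** (adc_bits - 1) - 1)
    return quantize_uniform(clipped, scale, adc_bits)

# Toy usage: a 256-input, 64-output layer mapped onto 128x64 crossbar tiles.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)
wq = arraywise_weight_quant(w, array_rows=128, array_cols=64, n_bits=4)
# Each 128-row slice of the dot product is one crossbar's analog partial sum.
psums = [partial_sum_quant(wq[:, r:r + 128] @ x[r:r + 128], alpha=8.0, adc_bits=4)
         for r in range(0, 256, 128)]
y = np.sum(psums, axis=0)

Tiling the weight matrix into crossbar-sized blocks mirrors how CIM hardware maps weights onto arrays, so each array can carry its own scale without per-column overhead, while the partial-sum clipping bounds the signal range the ADC must resolve.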
Keywords
Quantization (signal), Hardware, Artificial neural networks, Common Information Model (computing), Training, Memory management, Computational efficiency, Bit-level sparsity, computing-in-memory (CIM), neural network quantization, post-training quantization (PTQ), quantization granularity, reparametrization