TT-CIM: Tensor Train Decomposition for Neural Network in RRAM-Based Compute-in-Memory Systems

IEEE Transactions on Circuits and Systems I: Regular Papers (2023)

Abstract
Compute-in-Memory (CIM) implemented with Resistive-Random-Access-Memory (RRAM) crossbars is a promising approach for accelerating Convolutional Neural Network (CNN) computations. The growing number of parameters in state-of-the-art CNN models, however, creates challenges for on-chip weight storage in CIM implementations, making CNN compression a crucial topic of exploration. Tensor Train (TT) decomposition can decompose a large tensor into smaller ones with fewer parameters, at the cost of an increased number of computations. In this work we propose a technique to minimize intermediate operations across the full convolution operation and improve hardware utilization when implementing TT-CNNs in CIM systems. We first use an iterative decompose-and-fine-tune method to prepare TT-CNNs. We then propose an inter-convolutional-step reuse scheme to reduce the required operation count and the post-mapping RRAM count for TT-CNN implementation in a tiled-CIM architecture. We demonstrate that, through proper mapping, pipelining, and reuse, effective compression ratios of 12 and 20 can be achieved with 0.8% and 1.4% accuracy drops, respectively, for WRN, and effective compression ratios of 6 and 11 with 0.9% and 1.2% accuracy drops for VGG8. We also show that around 30% higher hardware utilization than the original CNN format can be achieved using the proposed TT-CIM approaches.
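For intuition on the compression that motivates TT-CNNs, the sketch below factorizes a small weight tensor into TT cores using a sequential truncated SVD (a standard TT-SVD procedure, not the paper's own code) and compares parameter counts. The tensor shape, rank cap, and function name are illustrative assumptions, not details taken from the paper.

# Illustrative sketch: TT-SVD factorization of a 4-way weight tensor.
# Shapes and the rank cap below are hypothetical examples.
import numpy as np

def tt_svd(tensor, max_rank):
    # Decompose `tensor` into a list of TT cores via sequential truncated SVDs.
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        # Core k holds the truncated left factor, shaped (r_{k-1}, n_k, r_k).
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remainder forward and fold in the next mode dimension.
        mat = (np.diag(s[:rank]) @ vt[:rank]).reshape(rank * dims[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores

# Example: a 256x256 weight matrix viewed as a 16x16x16x16 tensor (65,536 parameters).
w = np.random.randn(16, 16, 16, 16)
cores = tt_svd(w, max_rank=8)
tt_params = sum(c.size for c in cores)
print("original:", w.size, "TT:", tt_params)  # TT cores store far fewer parameters

As the abstract notes, this parameter reduction comes at the cost of extra intermediate computations when the cores are contracted at inference time, which is the overhead the proposed reuse and mapping schemes target.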
Keywords
Compute-in-memory, deep neural network, convolutional neural network, neural network compression, tensor train decomposition