Algorithm and hardware co-design co-optimization framework for LSTM accelerator using quantized fully decomposed tensor train

Internet of Things (2023)

Abstract
Researchers in industry and academia are developing Deep Neural Networks (DNNs) because of their impressive performance in applications such as image recognition, game playing, and large-scale information retrieval (e.g., web search). However, the high computational and power demands of DNN models on resource-constrained devices have drawn increasing attention. Optimizing DNN models, for example through model compression, is therefore crucial to enable wide deployment of DNNs in resource-constrained scenarios. Among many techniques, tensor train (TT) decomposition is considered particularly promising. Although our previous efforts (1) reduced the number of multiplications by eliminating redundant computations and (2) decomposed computation into multistage processing to reduce memory traffic, the potential of that work was not fully explored. In this paper, we investigate and demystify TT decomposition from a hardware-aware perspective and develop an efficient hardware optimization methodology together with a novel hardware solution. Three key contributions are made: (1) a novel approach that applies TT decomposition to the entire LSTM model; (2) a more efficient quantization method proposed at the hardware optimization level; and (3) an efficient hardware accelerator designed with the algorithm and hardware co-design method. Based on these novelties, the proposed work achieves a 1.69x power reduction and 2.28x power efficiency (GOPS/W) across different workloads. In addition, compared to the state-of-the-art C-LSTM, it achieves 2.09x higher throughput, a 3.67% accuracy increase, 2.45x power efficiency, and a 1.18x power reduction. These results show that the proposed accelerator exhibits significant advantages over state-of-the-art solutions.
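The abstract refers to applying TT decomposition to the LSTM model. As a rough, minimal sketch of the general idea, the snippet below reshapes a weight matrix into a higher-order tensor and factorizes it into TT cores via sequential truncated SVDs (standard TT-SVD). The shapes, ranks, and function names are illustrative assumptions, not the paper's specific algorithm or quantization scheme.

```python
# Minimal TT-SVD sketch (illustrative only, not the authors' implementation).
import numpy as np

def tt_svd(tensor, ranks):
    """Decompose a d-way tensor into TT cores via sequential truncated SVDs.

    ranks: list of TT ranks [r0=1, r1, ..., r_{d-1}, r_d=1] (assumed here).
    Returns a list of cores G_k with shape (r_{k-1}, n_k, r_k).
    """
    dims = tensor.shape
    d = len(dims)
    cores = []
    # Start from the mode-1 unfolding of the tensor.
    unfolding = tensor.reshape(ranks[0] * dims[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        r = ranks[k + 1]
        u, s, vt = u[:, :r], s[:r], vt[:r, :]          # truncate to rank r
        cores.append(u.reshape(ranks[k], dims[k], r))  # k-th TT core
        # Carry the remainder forward and re-unfold for the next mode.
        unfolding = (np.diag(s) @ vt).reshape(r * dims[k + 1], -1)
    cores.append(unfolding.reshape(ranks[d - 1], dims[d - 1], ranks[d]))
    return cores

# Example: a 256x1024 LSTM gate weight reshaped into a 4-way tensor (hypothetical shapes).
W = np.random.randn(256, 1024).astype(np.float32)
tensor = W.reshape(16, 16, 32, 32)
cores = tt_svd(tensor, ranks=[1, 8, 8, 8, 1])
print([c.shape for c in cores])   # [(1, 16, 8), (8, 16, 8), (8, 32, 8), (8, 32, 1)]
```

The chosen TT ranks control the compression/accuracy trade-off; in the proposed framework the resulting cores would additionally be quantized for the hardware datapath, which this sketch does not show.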
Keywords
Efficient machine learning, Big data, Low power, Edge computing