ShiftAddNet: A Hardware-Inspired Deep Network

NeurIPS 2020

Abstract

Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks (DNNs). However, intensive multiplications cause expensive resource costs that challenge DNNs' deployment on resource-constrained edge devices, driving several attempts for multiplication-less deep networks. This paper presented ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware implementation, that is, multiplication can be instead performed with additions and logical bit-shifts. […]
Introduction
  • Powerful deep neural networks (DNNs) come at the price of prohibitive resource costs during both DNN inference and training, limiting the application feasibility and scope of DNNs in resource-constrained devices for more pervasive intelligence.
  • DNNs are largely composed of multiplication operations for both forward and backward propagation, which are much more computationally costly than addition [1].
  • ShiftNet [2, 3] adopted spatial shift operations paired with pointwise convolutions to replace a large portion of convolutions.
  • AdderNet [5] was the first to demonstrate the feasibility and promise of replacing all convolutions with merely addition operations; a minimal sketch of this add-only idea follows this list.
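To make the add-only idea in [5] concrete, here is a minimal NumPy sketch (our illustration, not the authors' implementation) of a single adder-layer output: the multiply-accumulate of a convolution is replaced by the negated sum of absolute differences between an input patch and a filter. The patch and filter shapes and all function names are illustrative assumptions.

```python
import numpy as np

def adder_unit(patch: np.ndarray, filt: np.ndarray) -> float:
    """One output activation of an AdderNet-style layer [5]: instead of
    sum(patch * filt) (multiply-accumulate), use the negated L1 distance,
    which needs only subtractions and additions."""
    return float(-np.abs(patch - filt).sum())

def conv_unit(patch: np.ndarray, filt: np.ndarray) -> float:
    """The multiplication-based counterpart, for comparison."""
    return float((patch * filt).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.standard_normal((3, 3, 16))  # a 3x3 input patch with 16 channels (illustrative)
    filt = rng.standard_normal((3, 3, 16))   # one filter of the same shape
    print("adder output:", adder_unit(patch, filt))
    print("conv  output:", conv_unit(patch, filt))
```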
Highlights
  • Powerful deep neural networks (DNNs) come at the price of prohibitive resource costs during both DNN inference and training, limiting the application feasibility and scope of DNNs in resource-constrained devices for more pervasive intelligence
  • DNNs are largely composed of multiplication operations for both forward and backward propagation, which are much more computationally costly than addition [1]
  • Compared with AdderNet, ShiftAddNet achieves 34.1% ∼ 80.9% energy cost reductions while offering 1.08% ∼ 3.18% higher accuracies; and compared with DeepShift (PS), ShiftAddNet achieves 34.1% ∼ 50.9% energy savings while improving accuracies by 5.5% ∼ 6.9%
  • We propose a multiplication-free ShiftAddNet for efficient DNN training and inference, inspired by the well-known shift-and-add hardware expertise, and show that ShiftAddNet achieves improved expressiveness and parameter efficiency, solving the drawbacks of networks with merely shift or merely add operations; a sketch of this shift-then-add layer composition follows this list
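To illustrate the shift-plus-add composition named in the last highlight, the sketch below cascades a shift layer, whose weights are rounded to signed powers of two so that each multiply reduces to a sign flip plus a bit-shift, with an AdderNet-style add layer. This is a simplified, dense (non-convolutional), float-simulated sketch under our own assumptions, not the paper's implementation; all names and shapes are illustrative.

```python
import numpy as np

def quantize_to_power_of_two(w: np.ndarray) -> np.ndarray:
    """Round each weight to a signed power of two, so multiplying by it
    reduces to a sign flip plus a bit-shift in hardware (float-simulated here)."""
    sign = np.sign(w)
    sign[sign == 0] = 1.0
    exponent = np.round(np.log2(np.abs(w) + 1e-12))
    return sign * 2.0 ** exponent

def shift_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: (in_dim,), w: (out_dim, in_dim); weights are snapped to powers of two."""
    return quantize_to_power_of_two(w) @ x

def add_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """AdderNet-style layer: negated L1 distance between the input and each row of w."""
    return -np.abs(x[None, :] - w).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)
    w_shift = rng.standard_normal((8, 8))  # "shift" stage weights (illustrative sizes)
    w_add = rng.standard_normal((4, 8))    # "add" stage weights
    hidden = shift_layer(x, w_shift)       # multiplication-free shift stage
    out = add_layer(hidden, w_add)         # multiplication-free add stage
    print(out.shape)                       # (4,)
```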
Methods
  • The authors compare accuracy (%) and energy costs (MJ) of DeepShift, AdderNet, ShiftAddNet, and ShiftAddNet (fixed) under two settings, adaptation and fine-tuning, for ResNet-20 on CIFAR-10 (see Table 2); a minimal sketch of constructing the two dataset splits follows this list.
  • Adaptation: the authors split CIFAR-10 into two non-overlapping subsets, first pre-train the model on one subset, and then retrain it on the other to see how accurately and efficiently the model can adapt to the new task.
  • Fine-tuning: the authors randomly split CIFAR-10 into two non-overlapping subsets, where each subset contains all classes; after pre-training on the first subset, they fine-tune the model on the other, expecting to see continuous growth in performance.
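As referenced above, the two dataset splits can be reproduced with a little bookkeeping over CIFAR-10. The sketch below assumes torchvision's CIFAR10 loader, a class-disjoint 5/5 split for adaptation, and a random 50/50 split for fine-tuning; it is only one plausible reading of the protocol, since the authors' exact splits are not given here.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

def split_for_adaptation(dataset, first_classes=range(5)):
    """Assumed class-disjoint split: subset A holds classes 0-4, subset B holds 5-9,
    so retraining on B measures adaptation to genuinely new classes."""
    targets = np.array(dataset.targets)
    mask = np.isin(targets, list(first_classes))
    return Subset(dataset, np.where(mask)[0]), Subset(dataset, np.where(~mask)[0])

def split_for_finetuning(dataset, seed=0):
    """Random 50/50 split; with 50k samples both halves should contain all classes,
    so training on B after pre-training on A should keep improving accuracy."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(dataset))
    half = len(dataset) // 2
    return Subset(dataset, perm[:half]), Subset(dataset, perm[half:])

if __name__ == "__main__":
    train_set = CIFAR10(root="./data", train=True, download=True)
    adapt_a, adapt_b = split_for_adaptation(train_set)
    ft_a, ft_b = split_for_finetuning(train_set)
    print(len(adapt_a), len(adapt_b), len(ft_a), len(ft_b))
```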
Results
  • ShiftAddNet with fixed shift layers can achieve up to 90% and 82.8% energy savings over fully additive models [5] and shift models [4] under floating-point or fixed-point precision training, while leading to comparable or better accuracies (-3.7% ∼ +31.2% and 3.5% ∼ 23.6%), respectively.
  • As shown in Fig. 4, (1) overall, ShiftAddNet with fixed shift layers can achieve up to 90.0% and 82.8% energy savings over AdderNet and DeepShift, while leading to comparable or better accuracies (-3.74% ∼ +31.2% and 3.5% ∼ 23.6%), respectively; and (2) interestingly, ShiftAddNet with fixed shift layers surpasses the generic ShiftAddNet in two aspects, the first being that it always demands less energy; a sketch of freezing the shift layers follows this list.
  • [Figure: training curves over epochs 0–160, showing that the corresponding weights are largely pruned (e.g., up to 70%).]
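The "fixed shift layers" variant discussed above amounts to freezing the shift-layer parameters at their initial values and training only the remaining layers. The generic PyTorch sketch below shows one way to do such freezing; ShiftLayer is a hypothetical stand-in module, not the authors' code.

```python
import torch
import torch.nn as nn

class ShiftLayer(nn.Module):
    """Stand-in for a layer whose weights are meant to be signed powers of two."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Float simulation; in hardware a power-of-two multiply is a bit-shift.
        return x @ self.weight.t()

def freeze_shift_layers(model: nn.Module) -> None:
    """Fix every ShiftLayer at its (random) initial values so that only the
    remaining layers, e.g. the add layers, receive gradient updates."""
    for module in model.modules():
        if isinstance(module, ShiftLayer):
            for p in module.parameters():
                p.requires_grad_(False)

if __name__ == "__main__":
    net = nn.Sequential(ShiftLayer(16, 16), nn.ReLU(), nn.Linear(16, 10))
    freeze_shift_layers(net)
    trainable = [name for name, p in net.named_parameters() if p.requires_grad]
    print(trainable)  # only the nn.Linear parameters remain trainable
```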
Conclusion
  • The authors propose a multiplication-free ShiftAddNet for efficient DNN training and inference inspired by the well-known shift and add hardware expertise, and show that ShiftAddNet achieves improved expressiveness and parameter efficiency, solving the drawbacks of networks with merely shift and add operations.
  • ShiftAddNet enables more flexible control over different levels of granularity in network training than ConvNet. Interestingly, the authors find that fixing ShiftAddNet's shift layers leads to comparable or even better accuracy for over-parameterized networks on the considered IoT applications.
Tables
  • Table 1: Unit energy comparisons using ASIC & FPGA; a rough back-of-envelope unit-energy sketch follows this list
  • Table 2: Adaptation and fine-tuning results comparisons using ResNet-20 trained on CIFAR-10
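As a back-of-envelope companion to Table 1, the sketch below compares the energy of one multiply-accumulate against a shift-plus-add replacement. The per-operation energies are approximate 45 nm figures commonly cited from Horowitz's energy table [1], and the shift cost in particular is our assumption (taken to be about that of an integer add); they are not the paper's measured Table 1 values.

```python
# Rough per-operation energies in picojoules for a 45 nm process, approximated
# from the commonly cited Horowitz energy table [1]; treat these as ballpark
# assumptions rather than the paper's Table 1 numbers.
ENERGY_PJ = {
    "fp32_mult": 3.7,
    "fp32_add": 0.9,
    "int32_mult": 3.1,
    "int32_add": 0.1,
    "shift": 0.1,  # assumption: a shift costs roughly as much as an int add
}

def mac_energy(precision: str = "fp32") -> float:
    """Energy of one multiply-accumulate (one multiply plus one accumulate add)."""
    return ENERGY_PJ[f"{precision}_mult"] + ENERGY_PJ[f"{precision}_add"]

def shift_add_energy(precision: str = "fp32") -> float:
    """Energy if the multiply is replaced by a single bit-shift."""
    return ENERGY_PJ["shift"] + ENERGY_PJ[f"{precision}_add"]

if __name__ == "__main__":
    for prec in ("fp32", "int32"):
        saving = 1.0 - shift_add_energy(prec) / mac_energy(prec)
        print(f"{prec}: ~{saving:.0%} energy saved per MAC (ballpark estimate)")
```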
Related work
  • Multiplication-less DNNs. Shrinking the cost-dominant multiplications has been widely considered in many DNN designs for reducing the computational complexity [10, 11]: [10] decomposes the convolutions into separate depthwise and pointwise modules which require fewer multiplications; and [12, 13, 14] binarize the weights or activations to construct DNNs consisting of sign changes paired with much fewer multiplications. Another trend is to replace the multiplication operations with other, cheaper operations. Specifically, [3, 2] leverage spatial shift operations to shift feature maps, which need to be paired with pointwise convolutions to aggregate spatial information; [4] fully replaces multiplications with both bit-wise shift operations and sign changes; and [5, 15, 16] trade multiplications for cheaper additions and develop a special backpropagation scheme for effectively training the add-only networks.

    Hardware costs of basic operations. Compared to shift and add, multipliers can be very inefficient in hardware, as they incur high costs in terms of consumed energy/time and chip area. Shift and add operations can substitute for such multipliers. For example, they have been adopted for saving computing resources and can be easily and efficiently performed by a digital processor [17]. This hardware idea has been adopted to accelerate multilayer perceptrons (MLPs) in digital processors [8]. We are here motivated by such hardware expertise to fully replace multiplications in modern DNNs with merely shift and add operations, aiming to solve the drawbacks of existing shift-only or add-only replacement methods and to boost network efficiency over multiplication-based DNNs.
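To make the shift-and-add substitution for multipliers concrete, here is a small, self-contained example (ours, not code from the paper or [17]) of the classical decomposition: multiplying by an integer constant is rewritten as a sum of bit-shifted copies of the input, one per set bit of the constant.

```python
def shift_add_multiply(x: int, k: int) -> int:
    """Multiply x by a non-negative integer constant k using only bit-shifts and
    additions: x*k equals the sum of (x << i) over the set bits i of k."""
    result = 0
    i = 0
    while k >> i:
        if (k >> i) & 1:
            result += x << i
        i += 1
    return result

if __name__ == "__main__":
    # x * 10 = (x << 3) + (x << 1): two shifts and one add instead of a multiply.
    assert shift_add_multiply(7, 10) == 70
    assert all(shift_add_multiply(a, b) == a * b
               for a in range(-8, 9) for b in range(0, 33))
    print("shift-and-add multiplication matches the built-in multiply")
```

The assertions confirm the decomposition matches ordinary multiplication on small integers; in hardware, each shifted term is essentially free wiring, so only the additions cost meaningful energy, which is the intuition the paper builds on.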
Funding
  • AdderNets [5] achieves a 1.37% higher accuracy than that of DeepShift [4] at a cost of similar or even lower FLOPs on ResNet-18 with the ImageNet dataset
  • Baselines: We evaluate the proposed ShiftAddNet over two SOTA multiplication-less networks, including AdderNet [5] and DeepShift (use DeepShift (PS) by default) [4], and also compare it to the multiplication-based ConvNet [45] under a comparable energy cost (± 30% more than AdderNet (FP32))
  • Meanwhile, ShiftAddNet achieves 2.41% ∼ 16.1% higher accuracies while requiring 34.1% ∼ 70.9% less energy costs, as compared to DeepShift [4]
  • We can see that ShiftAddNet always achieves a better accuracy over the two SOTA multiplication-less networks
  • ShiftAddNet with fixed shift layers also surpasses the generic ShiftAddNet in two aspects: first, it always demands lower energy costs (25.2% ∼ 40.9% less) to achieve a comparable or even better accuracy; and second, it can even achieve better accuracy and better robustness to quantization (up to 10.8% improvement for 8-bit fixed-point training) than the generic ShiftAddNet with learnable shift layers, when evaluated with VGG19-small on CIFAR-100.
Study subjects and analysis
datasets: 4
ShiftAddNet over SOTA on classification. The results on four datasets and two DNNs in Fig. 2 (a), (b), (d), and (e) show that ShiftAddNet consistently outperforms all competitors in terms of the measured energy cost while improving the task accuracies. Specifically, with full-precision floating-point (FP32) training, ShiftAddNet even surpasses both the multiplication-based ConvNet and AdderNet: when training ResNet-20 on CIFAR-10, ShiftAddNet reduces the training energy costs by 33.7% and 44.6% compared to AdderNet and ConvNet [45], respectively, outperforming the SOTA multiplication-based ConvNet and thus validating our Hypothesis (2) in Section 3.1. ShiftAddNet also demonstrates notably improved robustness to quantization compared to AdderNet: a quantized ShiftAddNet with an 8-bit fixed-point representation reduces energy costs by 65.1% ∼ 75.0% relative to the reported AdderNet results (with floating-point precision, denoted FP32) while offering comparable accuracies (-1.79% ∼ +0.18%), and achieves a considerably higher accuracy (7.2% ∼ 37.1%) than the quantized AdderNet (FIX32/8) while consuming comparable or even less energy (-25.2% ∼ 25.2%).
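The FIX8/FIX32 comparisons above rely on fixed-point quantization of weights and activations. The sketch below is a generic, symmetric fixed-point quantizer written as our own simplification; it is not the paper's exact quantization scheme and serves only to make the FP32 versus 8-bit fixed-point contrast concrete.

```python
import numpy as np

def quantize_fixed_point(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Symmetric fixed-point quantization: scale onto the integer grid
    [-(2^(b-1)-1), 2^(b-1)-1], round, then rescale back (simulated in float)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1000).astype(np.float32)
    w8 = quantize_fixed_point(w, num_bits=8)
    print("max abs quantization error at 8 bits:", float(np.max(np.abs(w - w8))))
```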

Reference
  • Mark Horowitz. Energy table for 45nm process. In Stanford VLSI wiki. 2014.
  • Bichen Wu, Alvin Wan, Xiangyu Yue, Peter Jin, Sicheng Zhao, Noah Golmant, Amir Gholaminejad, Joseph Gonzalez, and Kurt Keutzer. Shift: A zero flop, zero parameter alternative to spatial convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9127–9135, 2018.
  • Weijie Chen, Di Xie, Yuan Zhang, and Shiliang Pu. All you need is a few shifts: Designing efficient convolutional neural networks for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7241–7250, 2019.
  • Mostafa Elhoushi, Farhan Shafiq, Ye Tian, Joey Yiwei Li, and Zihao Chen. Deepshift: Towards multiplication-less neural networks. arXiv preprint arXiv:1905.13298, 2019.
  • Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, and Chang Xu. Addernet: Do we really need multiplications in deep learning? The IEEE Conference on Computer Vision and Pattern Recognition, 2020.
  • Ping Xue and Bede Liu. Adaptive equalizer using finite-bit power-of-two quantizer. IEEE transactions on acoustics, speech, and signal processing, 34(6):1603–1611, 1986.
  • Y. Lin, S. Zhang, and N. R. Shanbhag. Variation-tolerant architectures for convolutional neural networks in the near threshold voltage regime. In 2016 IEEE International Workshop on Signal Processing Systems (SiPS), pages 17–22, 2016.
  • Michele Marchesi, Gianni Orlandi, Francesco Piazza, and Aurelio Uncini. Fast neural networks without multipliers. IEEE transactions on Neural Networks, 4(1):53–62, 1993.
  • Getting Started Guide. Zynq-7000 all programmable soc zc706 evaluation kit (ise design suite 14.7). 2012.
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Y. Lin, C. Sakr, Y. Kim, and N. Shanbhag. PredictiveNet: An energy-efficient convolutional neural network via zero prediction. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–4, 2017.
  • Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. Neural networks with few multiplications. arXiv preprint arXiv:1510.03009, 2015.
  • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.
  • Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, and Roeland Nusselder. Latent weights do not exist: Rethinking binarized neural network optimization. In Advances in Neural Information Processing Systems 32, pages 7533–7544. 2019.
  • Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chunjing Xu, and DaCheng Tao. Addersr: Towards energy efficient image super-resolution. arXiv preprint arXiv:2009.08891, 2020.
  • Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, and Yunhe Wang. Kernel based progressive distillation for adder neural networks. arXiv preprint arXiv:2009.13044, 2020.
  • Jose-Luis Sanchez-Romero, Antonio Jimeno-Morenilla, Rafael Molina-Carmona, and Jose Perez-Martinez. An approach to the application of shift-and-add algorithms on engineering and industrial processes. Mathematical and Computer Modelling, 57(7-8):1800–1806, 2013.
  • Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, and Zhangyang Wang. E2-train: Training state-of-the-art cnns with over 80% energy savings. In Advances in Neural Information Processing Systems, pages 5138–5150, 2019.
  • Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G Baraniuk, Zhangyang Wang, and Yingyan Lin. Drawing early-bird tickets: Toward more efficient training of deep networks. In International Conference on Learning Representations, 2019.
  • Zhaohui Yang, Yunhe Wang, Chuanjian Liu, Hanting Chen, Chunjing Xu, Boxin Shi, Chao Xu, and Chang Xu. Legonet: Efficient convolutional neural networks with lego filters. In International Conference on Machine Learning, pages 7005–7014, 2019.
  • Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1580–1589, 2020.
  • Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, and Dale Schuurmans. Go wide, then narrow: Efficient training of deep thin networks. arXiv preprint arXiv:2007.00811, 2020.
  • Hanting Chen, Yunhe Wang, Han Shu, Yehui Tang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, and Chang Xu. Frequency domain compact 3d convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1641–1650, 2020.
  • Weiyang Liu, Rongmei Lin, Zhen Liu, James M Rehg, Li Xiong, and Le Song. Orthogonal over-parameterized training. arXiv preprint arXiv:2004.04690, 2020.
  • Felix Juefei-Xu, Vishnu Naresh Boddeti, and Marios Savvides. Local binary convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 19–28, 2017.
  • Xilinx Inc. Xilinx zynq-7000 soc zc706 evaluation kit. https://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html. (Accessed on 09/30/2020).
  • Or Sharir and Amnon Shashua. On the expressive power of overlapping architectures of deep learning. In International Conference on Learning Representations, 2018.
  • Huasong Zhong, Xianggen Liu, Yihui He, and Yuchun Ma. Shift-based primitives for efficient convolutional neural networks. arXiv preprint arXiv:1809.08458, 2018.
  • Arman Afrasiyabi, Diaa Badawi, Baris Nasir, Ozan Yildi, Fatios T Yarman Vural, and A Enis Çetin. Non-euclidean vector product for neural networks. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6862–6866. IEEE, 2018.
  • Chen Wang, Jianfei Yang, Lihua Xie, and Junsong Yuan. Kervolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • Haobin Dou and Xihong Wu. Coarse-to-fine trained multi-scale convolutional neural networks for image classification. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2015.
  • Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, and Yingyan Lin. Smartexchange: Trading higher-cost memory storage/access for lower-cost computation. arXiv preprint arXiv:2005.03403, 2020.
  • Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, and Yingyan Lin. Timely: Pushing data movements and interfaces in pim accelerators towards local and in time domain, 2020.
  • B. Murmann. Mixed-signal computing for deep neural network inference. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 1–11, 2020.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • Oresti Banos, Rafael Garcia, Juan A Holgado-Terriza, Miguel Damas, Hector Pomares, Ignacio Rojas, Alejandro Saez, and Claudia Villalonga. mhealthdroid: a novel framework for agile development of mobile health applications. In International workshop on ambient assisted living, pages 91–98.
  • J. Tan, L. Niu, J. K. Adams, V. Boominathan, J. T. Robinson, R. G. Baraniuk, and A. Veeraraghavan. Face detection and verification using lensless cameras. IEEE Trans. Comput. Imag., 5(2):180–194, June 2019.
  • Mi Zhang and Alexander A Sawchuk. Usc-had: a daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pages 1036–1043, 2012.
  • Nicolas Gourier, Daniela Hall, and James L Crowley. Estimating face orientation from robust detection of salient facial structures. In FG Net workshop on visual observation of deictic gestures, volume 6, page 7. FGnet (IST–2000–26434) Cambridge, UK, 2004.
  • Vivek Boominathan, Jesse Adams, Jacob Robinson, and Ashok Veeraraghavan. Phlatcam: Designed phase-mask based thin lensless camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
  • Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. In International Conference on Learning Representations, 2019.
  • Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In International Conference on Automatic Face and Gesture Recognition, 2018.
  • Wenchao Jiang and Zhaozheng Yin. Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM international conference on Multimedia, pages 1307–1310, 2015.
  • Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.
  • A. Beaumont-Smith, N. Burgess, S. Lefrere, and C. C. Lim. Reduced latency ieee floating-point standard adder architectures. In Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336), pages 35–42, 1999.
  • Yukuan Yang, Lei Deng, Shuang Wu, Tianyi Yan, Yuan Xie, and Guoqi Li. Training high-performance and large-scale deep neural networks with full 8-bit integers. Neural Networks, 2020.
  • Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. Scalable methods for 8-bit training of neural networks. In Advances in neural information processing systems, pages 5145–5153, 2018.
  • Laurens Van der Maaten. t-distributed stochastic neighbor embedding (t-sne), 2014.
  • Chaojian Li, Tianlong Chen, Haoran You, Zhangyang Wang, and Yingyan Lin. HALO: Hardware-aware learning to optimize. In The 16th European Conference on Computer Vision (ECCV 2020), 2020.
  • Yang You, Zhao Zhang, James Demmel, Kurt Keutzer, and Cho-Jui Hsieh. Imagenet training in 24 minutes. arXiv preprint arXiv:1709.05011, 2017.
  • Emma Strubell, Ananya Ganesh, and Andrew McCallum. Energy and policy considerations for deep learning in nlp. arXiv preprint arXiv:1906.02243, 2019.
Author
Yongan Zhang
Chaojian Li
Sicheng Li
Zihao Liu