Adaptive Quantization for Deep Neural Network

National Conference on Artificial Intelligence (AAAI), 2018.


Abstract:

In recent years, Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large memory consumption, which may not be affordable for mobile platforms. Deep model quantization can be u…

Introduction
Highlights
  • Deep neural networks (DNNs) have achieved significant success in various machine learning applications, including image classification (Krizhevsky, Sutskever, and Hinton 2012; Simonyan and Zisserman 2014; Szegedy et al 2015), image retrieval (Hoang et al 2017; Do, Doan, and Cheung 2016), and natural language processing (Deng, Hinton, and Kingsbury 2013)
  • While the DNNs are powerful for various tasks, the increasing computational and memory costs make it difficult to apply on mobile platforms, considering the limited storage space, computation power, energy supply of mobile devices (Han, Mao, and Dally 2015), and the real-time processing requirements of mobile applications
  • Inspired by the analysis in (Fawzi, Moosavi-Dezfooli, and Frossard 2016), we propose a method to measure the effect of parameter quantization errors in individual layers on the overall model prediction accuracy (a minimal illustration of this idea follows after this list)
  • We propose an efficient approach to optimize layer-wise bit-width for parameter quantization
  • We show that the proposed approach is more general and accurate than previous quantization optimization approaches
  • Experimental results show that our method outperforms previous works, and achieves 20-40% higher compression rate than signal-to-quantization-noise ratio (SQNR)-based methods and equal bit-width quantization
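
To make the per-layer measurement above concrete, here is a minimal sketch, not the authors' MatConvNet implementation: it uniformly quantizes the weights of one layer at a time and records the resulting drop in validation accuracy. The evaluate_fn callback and the dictionary of per-layer weight arrays are hypothetical placeholders for a real model and test set.

    # Minimal sketch (assumption-laden, not the paper's code): quantize one layer at a
    # time and measure how much validation accuracy is lost. evaluate_fn(weights) is a
    # hypothetical callback that runs the model with the given weights and returns accuracy.
    import numpy as np

    def uniform_quantize(w, bits):
        """Uniformly quantize a weight array to 2**bits levels over its value range."""
        lo, hi = float(w.min()), float(w.max())
        levels = 2 ** bits - 1
        step = (hi - lo) / levels if levels > 0 and hi > lo else 1.0
        return lo + np.round((w - lo) / step) * step

    def per_layer_accuracy_drop(weights, evaluate_fn, bits):
        """Return {layer name: accuracy lost when only that layer is quantized}."""
        baseline = evaluate_fn(weights)           # accuracy with full-precision weights
        drops = {}
        for name, w in weights.items():
            perturbed = dict(weights)             # shallow copy; swap out a single layer
            perturbed[name] = uniform_quantize(w, bits)
            drops[name] = baseline - evaluate_fn(perturbed)
        return drops
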
Results
  • The authors show empirical results that validate the assumptions in previous sections, and evaluate the proposed bit-width optimization approach.

    All code is implemented using MatConvNet (Vedaldi and Lenc 2015).
  • To validate the effectiveness of the proposed accuracy estimation method, the authors conduct several experiments.
  • These experiments validate the relationship between the estimated and measured accuracy, as well as the linearity and additivity of the measurement (an illustrative additivity check is sketched after this list).
  • The quantized model is tested on the validation set of ImageNet (Krizhevsky, Sutskever, and Hinton 2012), which contains 50,000 images in 1,000 classes.
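
The additivity claim above can be checked with a small experiment, sketched here under the same hypothetical evaluate_fn and uniform_quantize helpers as in the earlier sketch: if the degradation measure is additive, the sum of per-layer accuracy drops roughly matches the drop observed when all layers are quantized at once.

    # Illustrative additivity check, not the authors' experiment code.
    def check_additivity(weights, evaluate_fn, uniform_quantize, bits):
        baseline = evaluate_fn(weights)
        # Accuracy drop from quantizing each layer in isolation.
        individual = {}
        for name, w in weights.items():
            perturbed = dict(weights)
            perturbed[name] = uniform_quantize(w, bits)
            individual[name] = baseline - evaluate_fn(perturbed)
        # Accuracy drop from quantizing every layer simultaneously.
        all_quantized = {name: uniform_quantize(w, bits) for name, w in weights.items()}
        joint = baseline - evaluate_fn(all_quantized)
        # If the measurement is approximately additive, these two values are close.
        return sum(individual.values()), joint
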
Conclusion
  • Parameter quantization is an important process to reduce the computation and memory costs of DNNs, and to deploy complex DNNs on mobile devices.
  • The authors propose an efficient approach to optimize layer-wise bit-width for parameter quantization.
  • The authors show that the proposed approach is more general and accurate than previous quantization optimization approaches.
  • Experimental results show that the method outperforms previous works, and achieves 20-40% higher compression rate than SQNR-based methods and equal bit-width quantization.
  • The authors will consider combining the method with fine-tuning and other model compression methods to achieve better model compression results
Summary
  • Introduction:

    Deep neural networks (DNNs) have achieved significant success in various machine learning applications, including image classification (Krizhevsky, Sutskever, and Hinton 2012; Simonyan and Zisserman 2014; Szegedy et al 2015), image retrieval (Hoang et al 2017; Do, Doan, and Cheung 2016), and natural language processing (Deng, Hinton, and Kingsbury 2013)
  • These achievements come with increasing computational and memory cost, as the neural networks are becoming deeper (He et al 2016), and contain more filters per single layer (Zeiler and Fergus 2014).
  • It is worth noting that model pruning and parameter quantization can be applied at the same time, without interfering with each other (Han, Mao, and Dally 2015); the authors can apply both approaches to achieve higher compression rates
  • Objectives:

    The goal of the paper is to find the optimal quantization configuration for compressing a DNN model.
  • Following the discussion of the optimization problem in Eq. (14), the goal is to constrain Eq. (20) to a small value while minimizing the model size (a greedy allocation in this spirit is sketched below)
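
Since Eq. (14) and Eq. (20) are not reproduced on this page, the sketch below only illustrates the general idea under the additivity assumption: greedily lower per-layer bit-widths while an additive estimate of accuracy degradation stays within a budget. degradation_estimate(name, bits) and layer_sizes are hypothetical stand-ins for the paper's per-layer measurement and parameter counts; the greedy rule is not the paper's actual solution to the optimization problem.

    # Hypothetical greedy bit-width allocation under an additive degradation budget.
    def allocate_bit_widths(layer_sizes, degradation_estimate, budget, max_bits=8, min_bits=2):
        bits = {name: max_bits for name in layer_sizes}     # start every layer at high precision
        total_deg = sum(degradation_estimate(n, b) for n, b in bits.items())
        while True:
            best, best_cost, best_delta = None, None, None
            for name in layer_sizes:
                if bits[name] <= min_bits:
                    continue
                # Extra estimated degradation if this layer drops one more bit.
                delta = (degradation_estimate(name, bits[name] - 1)
                         - degradation_estimate(name, bits[name]))
                cost = delta / layer_sizes[name]            # damage per parameter-bit saved
                if total_deg + delta <= budget and (best_cost is None or cost < best_cost):
                    best, best_cost, best_delta = name, cost, delta
            if best is None:                                # no feasible reduction remains
                return bits
            bits[best] -= 1
            total_deg += best_delta
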
Related work
  • Parameter quantization has been widely used for DNN model compression (Gupta et al 2015; Han, Mao, and Dally 2015; Wu et al 2016). The work in (Gupta et al 2015) limits the bit-width of DNN models for both training and testing, and proposes a stochastic rounding scheme to improve model training performance under low bit-width. The authors in (Han, Mao, and Dally 2015) use k-means to train the quantization centroids, and use these centroids to quantize the parameters. The authors in (Wu et al 2016) separate the parameter vectors into sub-vectors, and find a sub-codebook for each sub-vector for quantization. In these works, all (or a majority of) layers are quantized with the same bit-width. However, as the layers in a DNN have various structures, these layers may have different properties with respect to quantization. It is possible to achieve better compression results by optimizing the quantization bit-width for each layer.
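
As an illustration of the codebook-style quantization discussed above (in the spirit of Han, Mao, and Dally 2015, not their exact procedure), the sketch below clusters a layer's weights into 2**bits centroids with k-means and stores only centroid indices plus a small codebook; scikit-learn's KMeans is used for brevity.

    # Sketch of per-layer codebook quantization, illustrative only.
    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_quantize(weights, bits):
        """Return (indices, codebook) such that codebook[indices] approximates weights."""
        flat = weights.reshape(-1, 1)
        km = KMeans(n_clusters=2 ** bits, n_init=10, random_state=0).fit(flat)
        codebook = km.cluster_centers_.ravel()      # 2**bits shared centroid values
        indices = km.labels_.astype(np.uint8)       # only `bits` bits of storage per weight
        return indices.reshape(weights.shape), codebook

    # Example: quantize a random 3x3x64x64 convolution kernel to 4 bits (16 centroids).
    w = np.random.randn(3, 3, 64, 64).astype(np.float32)
    idx, codebook = kmeans_quantize(w, bits=4)
    w_hat = codebook[idx]                           # de-quantized weights used at inference time
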
Funding
  • Our new quantization algorithm outperforms previous quantization optimization methods, and achieves 20-40% higher compression rate compared to equal bit-width quantization at the same model prediction accuracy
  • Our method consistently outperforms the recent state of the art, i.e., the SQNR-based method (Lin, Talathi, and Annapureddy 2016), on different models, and achieves 20-40% higher compression rate compared to equal bit-width quantization
  • We set the accuracy degradation to be roughly half of the original accuracy (57%), i.e., 28%
  • Experimental results show that our method outperforms previous works, and achieves 20-40% higher compression rate than SQNR-based methods and equal bit-width quantization
References
  • Anwar, S.; Hwang, K.; and Sung, W. 2015. Fixed point optimization of deep convolutional neural networks for object recognition. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1131– 1135.
  • Deng, L.; Hinton, G.; and Kingsbury, B. 2013. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 8599–8603.
  • Do, T.-T.; Doan, A.-D.; and Cheung, N.-M. 2016. Learning to hash with binary deep neural network. In European Conference on Computer Vision (ECCV), 219–234. Springer.
  • Fawzi, A.; Moosavi-Dezfooli, S.-M.; and Frossard, P. 2016. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems (NIPS). 1632–1640.
  • Figurnov, M.; Ibraimova, A.; Vetrov, D. P.; and Kohli, P. 2016. PerforatedCNNs: Acceleration through elimination of redundant convolutions. In Advances in Neural Information Processing Systems (NIPS), 947–955.
  • Gray, R. M., and Neuhoff, D. L. 1998. Quantization. IEEE Transactions on Information Theory (TIT) 44(6):2325–2383.
  • Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; and Narayanan, P. 2015. Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 1737–1746.
  • Han, S.; Liu, X.; Mao, H.; Pu, J.; Pedram, A.; Horowitz, M. A.; and Dally, W. J. 2016. Eie: efficient inference engine on compressed deep neural network. In Proceedings of the IEEE International Symposium on Computer Architecture (ISCA), 243–254.
  • Han, S.; Mao, H.; and Dally, W. J. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 770–778.
  • Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
  • Hoang, T.; Do, T.-T.; Tan, D.-K. L.; and Cheung, N.-M. 2017. Selective deep convolutional features for image retrieval. arXiv preprint arXiv:1707.00809.
  • Hwang, K., and Sung, W. 2014. Fixed-point feedforward deep neural network design using weights +1, 0, and -1. In 2014 IEEE Workshop on Signal Processing Systems (SiPS), 1–6.
  • Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS), 1097–1105.
  • Lin, D.; Talathi, S.; and Annapureddy, S. 2016. Fixed point quantization of deep convolutional networks. In International Conference on Machine Learning (ICML), 2849– 2858.
  • Pang, T.; Du, C.; and Zhu, J. 2017. Robust deep learning via reverse cross-entropy training and thresholding test. arXiv preprint arXiv:1706.00633.
  • Romero, A.; Ballas, N.; Kahou, S. E.; Chassang, A.; Gatta, C.; and Bengio, Y. 2014. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550.
  • Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556.
  • Sun, F.; Lin, J.; and Wang, Z. 2016. Intra-layer nonuniform quantization of convolutional neural network. In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP), 1–5.
  • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 1–9.
  • Vedaldi, A., and Lenc, K. 2015. MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the ACM International Conference on Multimedia.
  • Wu, J.; Leng, C.; Wang, Y.; Hu, Q.; and Cheng, J. 2016. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4820–4828.
  • You, Y. 2010. Audio Coding: Theory and Applications. Springer Science & Business Media.
  • Zeiler, M. D., and Fergus, R. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision (ECCV), 818–833. Springer.
  • Zhou, Y.; Do, T. T.; Zheng, H.; Cheung, N. M.; and Fang, L. 2016. Computation and memory efficient image segmentation. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) PP(99):1–1.