AdaBits: Neural Network Quantization with Adaptive Bit-Widths

Liao Zhenyu

CVPR, pp. 2143-2153, 2019.


Abstract:

Deep neural networks with adaptive configurations have gained increasing attention due to the instant and flexible deployment of these models on platforms with different resource budgets. In this paper, we investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model. We first ex...

Introduction
  • Recent developments in deep learning enable the deployment of deep neural networks across a wide range of platforms with different resource constraints.
  • To serve applications under all these scenarios with drastically different requirements, different models tailored for different resource budgets can be devised either manually [14, 15, 16, 37] or automatically through neural architecture search [39, 56, 57].
  • This strategy yields optimal trade-offs for a fixed combination of constraints, but it is not economical, because it requires time-consuming training and benchmarking for each of these models, which prohibits instant adaptation to different scenarios.
  • Inspired by this line of work, [3] integrates adaptation of depth, width, and kernel size altogether, and achieves better trade-offs between performance and efficiency through progressive training; [48] adopts the same strategy with scaling-up factors, but uses a simultaneous training algorithm to achieve improved predictive accuracy.
Highlights
  • Recent developments in deep learning enable the deployment of deep neural networks across a wide range of platforms with different resource constraints.
  • Our results show that adaptive bit-width is an additional option for adaptive models, which can further improve the trade-off between efficiency and accuracy for deep neural networks.
  • Since our joint training approach is general and can be combined with any quantization algorithm based on quantization-aware training, we believe similar results can be achieved by combining other quantization approaches with our AdaBits algorithm.
  • We investigate the possibility of adaptively configuring bit-widths for deep neural networks.
  • The final AdaBits approach achieves accuracies similar to those of models quantized individually at different bit-widths, for a wide range of models including MobileNet V1/V2 and ResNet50 on the ImageNet dataset.
  • This new kind of adaptive model widens the choices for designing dynamic models that can instantly adapt to different hardware platforms and resource constraints.
Methods
  • The authors evaluate the AdaBits algorithm on the ImageNet classification task and compare the resulting models with those quantized individually at different bit-widths.
  • To examine the proposed methods, the authors quantize several representative models with adaptive bit-widths, including MobileNet V1/V2 and ResNet50, and evaluate them on the ImageNet dataset using the AdaBits algorithm.
  • For the first and last layers, weights are quantized with a bit-width of 8 [6], while the input to the last layer is quantized with the same precision as the other layers.
  • Biases in the last fully-connected layer(s) and the batch-normalization layers are not quantized (a minimal sketch of this layer-wise policy is given below).
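    The layer-wise policy above can be made concrete with a short sketch. Below is a minimal, hedged example in PyTorch assuming a DoReFa-style uniform quantizer; quantize_k, quantize_weight, and QuantConv2d are illustrative names rather than the authors' code, and the SAT-specific scale adjustment is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def quantize_k(x, k):
        """Uniform k-bit quantization of a tensor in [0, 1], with a
        straight-through estimator (STE) for the rounding step."""
        n = float(2 ** k - 1)
        xq = torch.round(x * n) / n
        return x + (xq - x).detach()  # forward: xq; backward: identity gradient

    def quantize_weight(w, k):
        """DoReFa-style k-bit weight quantization to the range [-1, 1]."""
        t = torch.tanh(w)
        t = t / (2 * t.abs().max()) + 0.5  # map weights into [0, 1]
        return 2 * quantize_k(t, k) - 1    # map quantized values back to [-1, 1]

    class QuantConv2d(nn.Conv2d):
        """Convolution whose weight bit-width can be switched at run time."""
        def __init__(self, *args, bit=4, **kwargs):
            super().__init__(*args, **kwargs)
            self.bit = bit  # current bit-width; changed externally to switch precision

        def forward(self, x):
            wq = quantize_weight(self.weight, self.bit)
            # Bias (if present) is left in full precision, as described above.
            return F.conv2d(x, wq, self.bias, self.stride,
                            self.padding, self.dilation, self.groups)

    Under the policy described above, the first and last layers would keep bit = 8 for their weights regardless of the active bit-width, and batch-normalization parameters stay in full precision.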
Results
  • Combining AdaBits with other quantization methods is another direction for future work.
  • Among the numerous algorithms for neural network quantization, the authors select only the state-of-the-art SAT algorithm to validate the effectiveness of adaptive bit-widths.
  • Since the joint training approach is general and can be combined with any quantization algorithm based on quantization-aware training, the authors believe similar results can be achieved by combining other quantization approaches with the AdaBits algorithm (a sketch of one possible joint-training step is shown below).
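    As a rough illustration of what such joint training could look like, here is a hedged sketch of a single training step that accumulates gradients over all candidate bit-widths before one optimizer update; BIT_WIDTHS and set_bit are illustrative assumptions, not the authors' exact procedure.

    BIT_WIDTHS = [8, 6, 4, 2]  # example set of candidate bit-widths

    def set_bit(model, bit):
        """Switch every quantized layer (any module exposing a `bit` attribute)."""
        for m in model.modules():
            if hasattr(m, "bit"):
                m.bit = bit

    def joint_train_step(model, images, labels, criterion, optimizer):
        """One joint-training step: losses from all bit-widths share one update."""
        optimizer.zero_grad()
        for bit in BIT_WIDTHS:
            set_bit(model, bit)
            loss = criterion(model(images), labels)
            loss.backward()  # gradients accumulate across bit-widths
        optimizer.step()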
Conclusion
  • Discussion and Future Work: The authors' approach for adaptive bit-width indicates that the bit-width of quantized models is an additional degree of freedom, besides channel number, depth, kernel size, and resolution, for adaptive models.
  • The final AdaBits approach achieves accuracies similar to those of models quantized individually at different bit-widths, for a wide range of models including MobileNet V1/V2 and ResNet50 on the ImageNet dataset.
  • This new kind of adaptive model widens the choices for designing dynamic models that can instantly adapt to different hardware platforms and resource constraints.
Tables
  • Table1: Direct adaptation of models trained at 2 and 4 bits, evaluated at different bit-widths, with and without batch-norm calibration. Results are top-1 validation accuracy (%) of ResNet50 on ImageNet
  • Table2: Results of progressive quantization with ascending/descending bit-widths of ResNet50 on ImageNet. Results are top-1 validation accuracy (%)
  • Table3: Results of Vanilla AdaBits with MobileNet V1 on ImageNet with four bit-widths. Results are top-1 validation accuracy (%)
  • Table4: Comparison between individual quantization and AdaBits quantization, in terms of top-1 validation accuracy (%), for MobileNet V1/V2 and ResNet50 on ImageNet. Note that we use two quantization schemes to compare our AdaBits with the SAT baseline models, where “original” denotes the original DoReFa scheme and “modified” denotes the modified scheme in Eq (3), which enables producing weights for lower bit-widths from the 8-bit model (a hedged sketch of this idea follows below). “FP” denotes that the full-precision model is needed to recover weights at different bit-widths
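    For intuition on the "modified" scheme mentioned in the Table 4 caption, the following hedged sketch shows one way lower-bit integer weight codes could be derived from stored 8-bit integer codes by keeping only the most significant bits; low_bit_from_int8 is an illustrative name, and the exact nesting of quantization levels is defined by Eq (3) in the paper.

    import numpy as np

    def low_bit_from_int8(q8, k):
        """Derive k-bit unsigned integer codes from 8-bit unsigned codes
        by keeping the k most significant bits (illustrative bit truncation)."""
        assert 1 <= k <= 8
        return q8 >> (8 - k)

    # Example: the 8-bit code 183 (0b10110111) maps to the 4-bit code 11 (0b1011).
    q8 = np.array([183, 12, 255], dtype=np.uint8)
    print(low_bit_from_int8(q8, 4))  # -> [11  0 15]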
Related work
  • Neural Network Quantization Neural network quantization has been studied since the beginning of the recent deep learning era, including binarization [1, 7, 8, 36], quantization [20, 51, 54], and ensemble methods [55]. Initially, uniform-precision quantization was adopted for the whole network, where all layers share the same bit-width [17, 19, 28, 31, 32, 33, 46, 52]. Recent work employs neural architecture search methods for model quantization, implementing a mixed-precision strategy where different bit-widths are assigned to different layers or even channels [10, 26, 41, 42, 44]. [18] analyzes the problem of efficient training for neural network quantization and proposes a scale-adjusted training (SAT) technique, achieving state-of-the-art performance. However, the possibility of developing a single model applicable at different bit-widths is still not well examined, and it remains unclear how to achieve this goal.

    Neural Architecture Search Neural architecture search (NAS) has gained increasing popularity in recent studies [4, 21, 24, 27, 34, 43, 45, 56]. Specifically, the searching strategy has been adopted in other aspects of optimizing neural networks, such as automatic tuning of various training hyper-parameters, including the activation function [35] and data augmentation [9]. NAS algorithms also benefit other tasks, such as generative adversarial networks [11], object detection [5], and segmentation [23]. As mentioned above, neural architecture search for quantization is also actively studied in the recent literature. However, NAS is computationally expensive and usually requires time-consuming re-training or fine-tuning. Recent work has reduced the search time to a large extent through one-shot architecture search [2, 38]. However, the resulting models are still inflexible, prohibiting their application in adaptive scenarios. Generally, conventional NAS methods are more suitable for optimizing a single model under specific resource constraints.
Funding
  • Investigates a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model
  • Discovers that joint training is able to produce comparable performance on the adaptive model as individual models
  • Proposes a new technique named Switchable Clipping Level to further improve quantized models at the lowest bit-width
  • Demonstrates that bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for improved accuracy-efficiency trade-off as well as instant adaptation according to the platform constraints in real-world applications
  • Finds that an adaptive model produced by a joint quantization approach, with a key treatment of the clipping-level parameters, is able to achieve performance comparable to individual-precision models at several bit-widths (a hedged sketch of such switchable clipping levels is given below)
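    The following is a hedged sketch of how a switchable clipping level could be realized for a PACT-style activation quantizer, with one learnable clipping parameter per candidate bit-width; SwitchableClipQuant and its parameters are illustrative assumptions rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class SwitchableClipQuant(nn.Module):
        """PACT-style activation quantizer with a separate learnable clipping
        level (alpha) for each candidate bit-width."""
        def __init__(self, bit_widths=(8, 6, 4, 2), init_alpha=8.0):
            super().__init__()
            # one clipping level per bit-width, keyed by the bit-width as a string
            self.alpha = nn.ParameterDict(
                {str(b): nn.Parameter(torch.tensor(init_alpha)) for b in bit_widths})
            self.bit = bit_widths[0]  # active bit-width; switched externally

        def forward(self, x):
            alpha = self.alpha[str(self.bit)]
            n = float(2 ** self.bit - 1)
            x = torch.minimum(torch.relu(x), alpha)  # PACT clipping to [0, alpha]
            xn = x / alpha                           # normalize to [0, 1]
            xq = torch.round(xn * n) / n             # uniform quantization
            xq = xn + (xq - xn).detach()             # straight-through estimator
            return xq * alpha                        # rescale back to [0, alpha]

    Switching the active bit-width (e.g., setting bit = 4 on the module) selects the clipping level learned for that precision, which matches the intuition that low bit-widths call for a different clipping range than high ones.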
Reference
  • Yu Bai, Yu-Xiang Wang, and Edo Liberty. Proxquant: Quantized neural networks via proximal operators. arXiv preprint arXiv:1810.00861, 2018. 2
  • Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018. 2
  • Han Cai, Chuang Gan, and Song Han. Once for all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791, 2019. 1, 2
  • Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332, 2018. 2
  • Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Chunhong Pan, and Jian Sun. Detnas: Backbone search for object detection. arXiv preprint arXiv:1903.10979, 2019. 2
  • Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. Pact: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085, 2018. 2, 3, 6, 7
  • Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems, pages 3123–3131, 2015. 2
  • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016. 2
  • Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018. 2
  • Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Sicun Gao, and Hadi Esmaeilzadeh. Releq: an automatic reinforcement learning approach for deep quantization of neural networks. arXiv preprint arXiv:1811.01704, 2018. 2, 8
  • Xinyu Gong, Shiyu Chang, Yifan Jiang, and Zhangyang Wang. Autogan: Neural architecture search for generative adversarial networks. arXiv preprint arXiv:1908.03835, 2019. 2
  • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017. 7
  • Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 2019. 1
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 1
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016. 1
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 1
  • Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2704–2713, 2018. 2
  • Qing Jin, Linjie Yang, and Zhenyu Liao. Towards efficient training for neural network quantization. arXiv preprint arXiv:1912.10207, 2019. 1, 2, 3, 4, 5, 7, 8
  • Cong Leng, Zesheng Dou, Hao Li, Shenghuo Zhu, and Rong Jin. Extremely low bit neural network: Squeeze the last bit out with admm. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. 2
  • Fengfu Li, Bo Zhang, and Bin Liu. Ternary weight networks. arXiv preprint arXiv:1605.04711, 2016. 2
  • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, and Alan Yuille. Autonl: Neural architecture search for lightweight non-local networks in mobile vision. In submission, 2020. 2
  • Yingwei Li, Zhuotun Zhu, Yuyin Zhou, Yingda Xia, Wei Shen, Elliot K. Fishman, and Alan L. Yuille. Volumetric Medical Image Segmentation: A 3D Deep Coarse-to-Fine Framework and Its Adversarial Examples, pages 69–91. Springer International Publishing, Cham, 2019. 1
  • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L Yuille, and Li Fei-Fei. Autodeeplab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 82–92, 2019. 2
  • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), pages 19–34, 2018. 2
  • Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016. 7
  • Qian Lou, Lantao Liu, Minje Kim, and Lei Jiang. Autoqb: Automl for network quantization and binarization on mobile devices. arXiv preprint arXiv:1902.05690, 2019. 2, 8
  • Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, and Jianchao Yang. AtomNAS: Fine-grained end-to-end neural architecture search. In International Conference on Learning Representations, 2020. 2
  • Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, and Pradeep Dubey. Ternary neural networks with fine-grained quantization. arXiv preprint arXiv:1705.01462, 2017. 2
  • Xin Miao, Xin Yuan, Yunchen Pu, and Vassilis Athitsos. λ-net: Reconstruct hyperspectral images from a snapshot measurement. In IEEE/CVF Conference on Computer Vision (ICCV), volume 1, 2019. 8
  • Xin Miao, Xiantong Zhen, Xianglong Liu, Cheng Deng, Vassilis Athitsos, and Heng Huang. Direct shape regression networks for end-to-end face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5040–5049, 2018. 8
  • Asit Mishra and Debbie Marr. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. arXiv preprint arXiv:1711.05852, 2017. 2
  • Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook, and Debbie Marr. Wrpn: wide reduced-precision networks. arXiv preprint arXiv:1709.01134, 2017. 2
  • Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo. Weightedentropy-based quantization for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5456–5464, 2017. 2
  • Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018. 2
  • Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions, 2018. 2
  • Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525–542. Springer, 2016. 2
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018. 1
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-path nas: Designing hardware-efficient convnets in less than 4 hours. arXiv preprint arXiv:1904.02877, 2019. 2
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2820–2828, 2019. 1
  • Alex Teichman and Sebastian Thrun. Practical object recognition in autonomous driving and beyond. In Advanced Robotics and its Social Impacts, pages 35–38. IEEE, 2011. 1
  • Stefan Uhlich, Lukas Mauch, Kazuki Yoshiyama, Fabien Cardinaux, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, and Akira Nakamura. Differentiable quantization of deep neural networks. arXiv preprint arXiv:1905.11452, 2019. 2, 8
  • Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. Haq: Hardware-aware automated quantization with mixed precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8612–8620, 2019. 2, 8
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10734–10742, 2019. 2
  • Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, and Kurt Keutzer. Mixed precision quantization of convnets via differentiable neural architecture search. arXiv preprint arXiv:1812.00090, 2018. 2, 8
  • Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. Snas: stochastic neural architecture search. arXiv preprint arXiv:1812.09926, 2018. 2
  • Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, and Hongbin Zha. Alternating multi-bit quantization for recurrent neural networks. arXiv preprint arXiv:1802.00150, 2018. 2
  • Jiahui Yu and Thomas S. Huang. Network slimming by slimmable networks: Towards one-shot architecture search for channel numbers. CoRR, abs/1903.11728, 2019. 4, 8
  • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas S. Huang, Xiaodan Song, and Quoc V Le. Scaling up neural architecture search with big single-stage models. In submission, 2020. 1, 2
  • Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas Huang. Slimmable neural networks. arXiv preprint arXiv:1812.08928, 2018. 1, 2, 5
  • Lei Yue, Xin Miao, Pengbo Wang, Baochang Zhang, Xiantong Zhen, and Xianbin Cao. Attentional alignment networks. In BMVC, volume 2, page 7, 2018. 8
  • Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016. 2, 3
  • Shu-Chang Zhou, Yu-Zhi Wang, He Wen, Qin-Yao He, and Yu-Heng Zou. Balanced quantization: An effective and efficient approach to quantized neural networks. Journal of Computer Science and Technology, 32(4):667–682, 2017. 2
  • Yuyin Zhou, Yingwei Li, Zhishuai Zhang, Yan Wang, Angtian Wang, Elliot K Fishman, Alan L Yuille, and Seyoun Park. Hyper-pairing network for multi-phase pancreatic ductal adenocarcinoma segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 155–163. Springer, 2019. 1
  • Chenzhuo Zhu, Song Han, Huizi Mao, and William J Dally. Trained ternary quantization. arXiv preprint arXiv:1612.01064, 2016. 2
  • Shilin Zhu, Xin Dong, and Hao Su. Binary ensemble neural network: More bits per network or more networks per bit? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4923–4932, 2019. 2
  • Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016. 1, 2
  • Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018. 1