Resolution Adaptive Networks for Efficient Inference
CVPR, pp. 2366-2375, 2020.
Abstract:
Recently, adaptive inference is gaining increasing attention due to its high computational efficiency. Different from existing works, which mainly exploit architecture redundancy for adaptive network design, in this paper we focus on the spatial redundancy of input samples and propose a novel Resolution Adaptive Network (RANet). Our motiv…
Introduction
- Advances in computer hardware have enabled the training of very deep convolutional neural networks (CNNs), such as ResNet [8] and DenseNet [14], but the high computational cost of deep CNNs is still unaffordable in many applications.
- It has been shown that the intrinsic classification difficulty for different samples varies drastically: some of them can be correctly classified by smaller models with fewer layers or channels, while some may need larger networks [24, 37, 12, 36]
- Many works exploiting this fact have been proposed recently.
- Multi-Scale Dense Network (MSDNet) [12] allows some samples to exit early at auxiliary classifiers, conditioned on their prediction confidence.
Highlights
- Advances in computer hardware have enabled the training of very deep convolutional neural networks (CNNs), such as ResNet [8] and DenseNet [14], but the high computational cost of deep CNNs is still unaffordable in many applications
- As the spatial redundancy of input images has been demonstrated in recent work [4], this paper proposes a novel adaptive learning model which exploits both the structural redundancy of a neural network and the spatial redundancy of input samples
- Multi-Scale Dense Network substantially outperforms other baseline models, and Resolution Adaptive Network is superior to Multi-Scale Dense Network, especially when the computational budget is low
- On CIFAR-10 (CIFAR-100), the accuracies of different classifiers for Resolution Adaptive Network are over 1% (2%–5%) higher than those of Multi-Scale Dense Network when the computational budget ranges from 0.1 × 10^8 to 0.5 × 10^8 FLOPs
- The accuracies of Resolution Adaptive Network are 2% and 5% higher than those of Multi-Scale Dense Network on CIFAR-10 and CIFAR-100, respectively
- We propose a novel resolution-adaptive neural network based on a multi-scale dense connection architecture, which we refer to as the Resolution Adaptive Network (RANet)
Methods
- The authors first introduce the idea of adaptive inference, then demonstrate the overall architecture and the network details of the proposed RANet.
- The authors use the highest confidence of the softmax output as the decision basis, which means that the final output will be the prediction of the first classifier whose largest softmax output is greater than a given threshold.
- This can be represented by $k^{*} = \min\{\, k \mid \max_{c} p_{c}^{k} \geq \epsilon \,\}$, where $p_{c}^{k}$ denotes the softmax probability of class $c$ at the $k$-th classifier and $\epsilon$ is the given confidence threshold.
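To make the exit rule concrete, here is a minimal sketch in PyTorch-style Python. It assumes a list of per-exit classifiers and a single-sample input batch; the function and variable names are illustrative and not taken from the authors' code.

```python
import torch

def early_exit_predict(x, exit_classifiers, threshold):
    """Return the prediction of the first exit whose top softmax probability
    reaches `threshold`, i.e., k* = min{k | max_c p_k^c >= eps}.
    Assumes `x` is a single-sample batch and `exit_classifiers` is a list of
    modules producing class logits (illustrative names, not the authors' API)."""
    probs = None
    for k, clf in enumerate(exit_classifiers):
        logits = clf(x)                        # logits from the k-th exit
        probs = torch.softmax(logits, dim=1)   # p_k over classes
        conf, pred = probs.max(dim=1)          # max_c p_k^c and its argmax
        if conf.item() >= threshold:
            return int(pred.item()), k         # confident enough: exit at k
    # no exit reached the threshold: fall back to the last classifier
    return int(probs.argmax(dim=1).item()), len(exit_classifiers) - 1
```

In the budgeted batch classification setting, the threshold would be tuned (e.g., on a validation set) so that the average per-sample cost meets the given computational budget.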
Results
- The authors report classification accuracies of all individual classifiers in the model and other baselines.
- MSDNet substantially outperforms other baseline models, and RANet is superior to MSDNet, especially when the computational budget is low.
- The authors plot the classification accuracy of each MSDNet and RANet in a gray and a light-yellow curve, respectively.
- The results on the two CIFAR datasets show that RANets consistently outperform MSDNets and other baseline models across all budgets.
- Even though the model and MSDNet show close performance on CIFAR-10 when the computational budget ranges from 0.2 × 10^8 to 0.3 × 10^8 FLOPs, the classification accuracies of RANets are consistently higher (by about 1%) than those of MSDNets on CIFAR-100.
Conclusion
- The authors proposed a novel resolution adaptive neural network based on a multi-scale dense connection architecture, which the authors refer to as RANet.
- Samples with high prediction confidence exit early from the network, and larger-scale features with finer details are only further utilized for those non-typical images that receive unreliable predictions in the earlier sub-networks.
- This resolution adaptation mechanism and the depth adaptation in each sub-network of RANet guarantee its high computational efficiency.
- On three image classification benchmarks, the experiments demonstrate the effectiveness of the proposed RANet in both the anytime prediction setting and the budgeted batch classification setting
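The resolution-adaptation idea can likewise be sketched as a cascade that first classifies a down-sampled input and only escalates to higher resolutions when the prediction is not confident. The sketch below is a simplified illustration under our own assumptions (independent sub-networks and bilinear down-sampling); the actual RANet reuses coarse-scale features in later sub-networks through dense connections rather than recomputing everything from raw pixels.

```python
import torch
import torch.nn.functional as F

def resolution_adaptive_predict(image, sub_networks, scales, threshold):
    """Conceptual sketch: run cheap low-resolution sub-networks first and only
    spend compute on higher resolutions for hard samples.
    `sub_networks`, `scales`, and `threshold` are illustrative parameters,
    e.g., scales = (0.25, 0.5, 1.0); `image` is a single-sample batch."""
    pred = None
    for net, scale in zip(sub_networks, scales):
        x = F.interpolate(image, scale_factor=scale, mode='bilinear',
                          align_corners=False)   # down-sample the input
        probs = torch.softmax(net(x), dim=1)
        conf, pred = probs.max(dim=1)
        if conf.item() >= threshold:             # confident enough: stop here
            return int(pred.item())
    return int(pred.item())                      # otherwise use the last sub-network
```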
Related work
- Efficient inference for deep networks. Many previous works explore variants of deep networks to speed up inference. One direct solution is designing lightweight models, e.g., MobileNet [10, 31], ShuffleNet [42, 27] and CondenseNet [13]. Other lines of research focus on pruning redundant network connections [20, 22, 26] or quantizing network weights [15, 29, 17]. Moreover, knowledge distillation [9] has been proposed to train a small (student) network that mimics the outputs of a deeper and/or wider (teacher) network.
The aforementioned approaches can be seen as static model acceleration techniques, which consistently infer all input samples with the whole network. In contrast, adaptive networks can strategically allocate appropriate computational resources to classify each input image based on its complexity. This research direction has been gaining increasing attention in recent years. The most intuitive implementation is to ensemble multiple models and selectively execute a subset of them in a cascading [2] or mixture [32, 30] fashion. Recent works also propose to adaptively skip layers or blocks [7, 37, 39, 40], or to dynamically select channels [24, 3, 1] at inference time. Auxiliary predictors can also be attached at different locations of a deep network to allow "easy" examples to exit early [35, 12, 11, 23]. Furthermore, dynamically activating parts of a multi-branch network [36] provides an alternative way to achieve adaptive inference.
Funding
- This work is supported by grants from the Institute for Guo Qiang of Tsinghua University, the National Natural Science Foundation of China (No. 61906106), and the Beijing Academy of Artificial Intelligence (BAAI)
References
- Babak Ehteshami Bejnordi, Tijmen Blankevoort, and Max Welling. Batch-shaped channel gated networks. CoRR, abs/1907.06627, 2019. 2
- Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. Adaptive neural networks for efficient inference. In ICML, 2017. 2
- Shaofeng Cai, Gang Chen, Beng Chin Ooi, and Jinyang Gao. Model slicing for supporting complex analytics with elastic inference cost and resource constraints. In PVLDB, 2019. 2
- Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, and Jiashi Feng. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. In ICCV, 2019. 2, 3, 8
- Ting-Wu Chin, Ruizhou Ding, and Diana Marculescu. Adascale: Towards real-time video object detection using adaptive scaling. In Systems and Machine Learning Conference, 2019. 3
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009. 5
- Michael Figurnov, Maxwell D Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, and Ruslan Salakhutdinov. Spatially adaptive computation time for residual networks. In CVPR, 2017.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016. 1, 2, 6
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In NeurIPS (Workshop), 2015. 2
- Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. 1, 2, 3
- Hanzhang Hu, Debadeepta Dey, Martial Hebert, and J Andrew Bagnell. Learning anytime predictions in neural networks via adaptive loss balancing. In AAAI, 2019. 2
- Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q Weinberger. Multi-scale dense networks for resource efficient image classification. In ICLR, 2018. 1, 2, 3, 5, 6, 8
- Gao Huang, Shichen Liu, Laurens Van der Maaten, and Kilian Q Weinberger. Condensenet: An efficient densenet using learned group convolutions. In CVPR, 2018. 1, 2
- Gao Huang, Zhuang Liu, Geoff Pleiss, Laurens Van Der Maaten, and Kilian Weinberger. Convolutional networks with dense connectivity. IEEE trans. on PAMI, 2019. 1, 2, 4, 5
- Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In NeurIPS, 2016. 1, 2
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015. 4
- Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR, 2018. 1, 2
- Tsung-Wei Ke, Michael Maire, and Stella X Yu. Multigrid neural architectures. In CVPR, 2017. 2
- Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009. 5
- Yann LeCun, John S Denker, and Sara A Solla. Optimal brain damage. In NeurIPS, 1990. 1, 2
- Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. In AISTATS, 2015. 6
- Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. In ICLR, 2017. 1, 2
- Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, and Gao Huang. Improved techniques for training adaptive deep networks. In ICCV, 2019. 2, 6
- Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. Runtime neural pruning. In NeurIPS, 2017. 1, 2
- Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017. 2
- Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In ICCV, 2017. 1, 2
- Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In ECCV, 2018. 2
- Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, pages 807– 814, 2010. 4
- Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In ECCV, 2016. 1, 2
- Adria Ruiz and Jakob Verbeek. Adaptative inference cost with convolutional neural mixture models. In ICCV, 2019. 2
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR, 2018. 1, 2
- Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In ICLR, 2017. 2
- Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. In CVPR, 2019. 2
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015. 6
- Surat Teerapittayanon, Bradley McDanel, and Hsiang-Tsung Kung. Branchynet: Fast inference via early exiting from deep neural networks. In ICPR, 2016. 2
- Ravi Teja Mullapudi, William R Mark, Noam Shazeer, and Kayvon Fatahalian. Hydranets: Specialized dynamic architectures for efficient inference. In CVPR, 2018. 1, 2
- Andreas Veit and Serge Belongie. Convolutional networks with adaptive inference graphs. In ECCV, 2018. 1, 2
- Tom Veniat and Ludovic Denoyer. Learning time/memoryefficient deep architectures with budgeted super networks. In CVPR, 2018. 2
- Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E Gonzalez. Skipnet: Learning dynamic routing in convolutional networks. In ECCV, 2018. 2
- Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S Davis, Kristen Grauman, and Rogerio Feris. Blockdrop: Dynamic inference paths in residual networks. In CVPR, 2018. 2
- Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016. 6
- Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018. 1, 2
- Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, and Jiaya Jia. Icnet for real-time semantic segmentation on high-resolution images. In ECCV, 2018. 2