Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets

NeurIPS 2020, (2020): 19353-19364


Abstract

To obtain excellent deep neural architectures, a series of techniques are carefully designed in EfficientNets. The giant formula for simultaneously enlarging the resolution, depth and width provides us a Rubik's cube for neural networks, so that we can find networks with high efficiency and excellent performance by twisting the three dimensions...

Introduction
  • Deep convolutional neural networks (CNNs) have achieved great success in many visual tasks, such as image recognition [22, 14, 10], object detection [37, 29, 9], and super-resolution [21, 47, 41].
  • ResNet [14] provides models with different numbers of layers, and MobileNet [17, 39] changes the number of channels and the image resolution for different FLOPs. Most existing works only scale one of the three dimensions – resolution, depth, and width.
Highlights
  • We study the model Rubik’s cube for shrinking deep neural networks
  • Based on a series of observations, we find that the original giant formula in EfficientNet is unsuitable for generating smaller neural architectures
  • We explore a series of TinyNets by utilizing the tiny formula to twist the three dimensions
  • The tiny formula in this work is summarized from observations on smaller models
  • TinyNet-E achieves a 59.9% Top-1 accuracy with only 24M FLOPs, being 1.9% higher than the previous best MobileNetV3 with similar FLOPs
  • These smaller models can be further enlarged to obtain higher performance with some new rules beyond the giant formula in EfficientNets, which will be investigated in future work
Methods
  • The authors apply their tiny formula for the model Rubik’s cube to shrink EfficientNet-B0 and ResNet-50 (a minimal shrinking sketch follows after this list).
  • On ImageNet-100, 500 training images are randomly sampled for each class, and the corresponding 5,000 images are used as the validation set.
  • The tiny formula obtained on ImageNet-100 can be well transferred to other datasets, as demonstrated in the NAS literature [62, 28, 50].
  • The authors evaluate the tiny formula on the large-scale ImageNet-1000 dataset to verify its generalization.
  • Several competitive NAS-based models are included for comparison.
  • The authors' TinyNet-E achieves 59.9% Top-1 accuracy with 24M FLOPs, which is 1.9% higher than the previous best MobileNetV3 Small 0.5× [16] with similar computational cost.
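  • A minimal shrinking sketch, assuming a simplified block-based stage configuration; the BASE_B0 stage list, the rounding helper and the multiplier values below are illustrative assumptions, not the authors' implementation:

    import math

    # Illustrative EfficientNet-B0-like stage configuration: (output_channels, num_repeats).
    # These numbers are a simplified stand-in, not the exact B0 definition.
    BASE_B0 = [(16, 1), (24, 2), (40, 2), (80, 3), (112, 3), (192, 4), (320, 1)]
    BASE_RESOLUTION = 224

    def round_channels(c, divisor=8):
        """Round a channel count to a multiple of `divisor`, as is common practice."""
        new_c = max(divisor, int(c + divisor / 2) // divisor * divisor)
        if new_c < 0.9 * c:  # avoid rounding down by more than ~10%
            new_c += divisor
        return new_c

    def shrink_config(r, d, w):
        """Apply a (resolution, depth, width) multiplier triple to the base configuration."""
        resolution = int(round(BASE_RESOLUTION * r))
        stages = [(round_channels(c * w), max(1, math.ceil(n * d))) for c, n in BASE_B0]
        return resolution, stages

    # Example: a hypothetical triple with r, d, w < 1 yields a smaller model.
    print(shrink_config(r=0.6, d=0.7, w=0.8))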
Results
  • As stated in the sections above, the authors randomly sample a number of models with different resolution, depth and width (a minimal sampling sketch follows after this list).
  • Resolution, depth and width are randomly sampled from the ranges 0.35 ≤ r ≤ 2.8, 0.35 ≤ d ≤ 2.8 and 0.35 ≤ w ≤ 2.8.
  • A sampled model with 318M FLOPs achieves 79.7% accuracy, while EfficientNet-B0 with 387M FLOPs only achieves 78.8%.
  • These observations indicate the necessity of designing a more effective model shrinking method.
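  • A minimal sampling sketch for the procedure above, assuming FLOPs scale roughly as d · w² · r² (the EfficientNet-style approximation); the sampling ranges come from the text, while the baseline-budget filter and sample count are assumptions:

    import random

    BASE_FLOPS = 387e6  # EfficientNet-B0 FLOPs reported in the text (387M)

    def estimated_flops(r, d, w, base_flops=BASE_FLOPS):
        # Approximation: FLOPs scale linearly with depth d, quadratically with width w and resolution r.
        return base_flops * d * (w ** 2) * (r ** 2)

    def sample_models(num_samples=100, seed=0):
        """Randomly sample (r, d, w) in [0.35, 2.8] and keep models within the baseline FLOPs budget."""
        rng = random.Random(seed)
        kept = []
        for _ in range(num_samples):
            r, d, w = (rng.uniform(0.35, 2.8) for _ in range(3))
            flops = estimated_flops(r, d, w)
            if flops <= BASE_FLOPS:  # assumption: only keep models no larger than the baseline
                kept.append((round(r, 2), round(d, 2), round(w, 2), int(flops / 1e6)))
        return kept

    for candidate in sample_models(num_samples=20):
        print(candidate)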
Conclusion
  • The authors study the model Rubik’s cube for shrinking deep neural networks.
  • Based on a series of observations, the authors find that the original giant formula in EfficientNet is unsuitable for generating smaller neural architectures.
  • To this end, the authors thoroughly analyze the importance of resolution, depth and width w.r.t. the performance of portable deep networks.
  • These smaller models can be further enlarged to obtain higher performance with some new rules beyond the giant formula in EfficientNets, which will be investigated in future work.
Objectives
  • This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
  • For a given baseline neural network with FLOPs C0 and a FLOPs constraint of c · C0, where 0 < c < 1 is the reduction factor, the goal is to provide the optimal values of the three dimensions (r, d, w) for shrinking the model (a worked example follows after this list).
  • For the given FLOPs budget c · C0, the goal is to calculate the optimal combination of (r, d, w) for building models with high performance.
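  • A worked example of the constraint, under the assumption that FLOPs scale roughly as r² · d · w², so a budget of c · C0 translates into r² · d · w² ≈ c; the candidate triples and the target c = 0.25 below are made up for illustration:

    def flops_ratio(r, d, w):
        # Assumed EfficientNet-style scaling: FLOPs ∝ r^2 * d * w^2.
        return (r ** 2) * d * (w ** 2)

    c = 0.25  # hypothetical reduction factor: keep about 25% of the baseline FLOPs C0
    candidates = [
        (0.50, 1.00, 1.00),  # shrink resolution only
        (1.00, 0.25, 1.00),  # shrink depth only
        (1.00, 1.00, 0.50),  # shrink width only
        (0.70, 0.70, 0.85),  # shrink all three dimensions jointly
    ]
    for r, d, w in candidates:
        print(f"r={r}, d={d}, w={w} -> FLOPs ratio ~ {flops_ratio(r, d, w):.2f}")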
Tables
  • Table1: TinyNet performance on ImageNet-100. All the models are shrunken from the EfficientNet-B0 baseline. †Shrinking B0 to the minimum depth results in 173M FLOPs (>100M)
  • Table2: Performance of shrunken ResNet on ImageNet-100
  • Table3: Comparison of state-of-the-art small networks over classification accuracy, the number of weights and FLOPs on the ImageNet-1000 dataset. “-” means no reported result is available
  • Table4: Inference latency comparison
  • Table5: Results on the MS COCO dataset
  • Table6: GhostNet results on the ImageNet dataset
  • Table7: GhostNet-A architecture. #exp denotes the expansion ratio, #out the number of output channels, SE whether the SE module (reduction ratio 10) is used, and #repeat the number of repetitions
Related work
  • Here we revisit the existing model compression methods for shrinking neural networks, and discuss the resolution, depth and width of CNNs.

    Model Compression. Model compression aims to reduce the computation, energy and storage cost, and can be categorized into four main parts: pruning, low-bit quantization, low-rank factorization and knowledge distillation. Pruning [13, 24, 40, 26, 46, 48, 25] is used to reduce the redundant parameters in neural networks that are insensitive to the model performance. For example, [24] uses the ℓ1-norm to calculate the importance of each filter and prunes the unimportant ones accordingly (a minimal sketch of this idea follows below). ThiNet [31] prunes filters based on statistics computed from their next layers. Low-bit quantization [18, 61, 36, 30, 8, 12, 19] represents weights or activations in neural networks using low-bit values. DoReFa-Net [61] trains neural networks with both low-bit weights and activations. BinaryNet [18] and XNOR-Net [36] quantize each neuron into only 1 bit and learn the binary weights or activations directly during model training. Low-rank factorization methods try to estimate the informative parameters using matrix/tensor decomposition [20, 6, 57, 54, 58]. Low-rank factorization achieves some advances in model compression, but it involves complex decomposition operations and is thus computationally expensive. Knowledge distillation [15, 38, 56, 52] attempts to teach a compact model, also called the student model, with knowledge distilled from a large teacher network. The common limitation of these compression methods is that their performance is usually upper bounded by the given pretrained models.
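    To make the ℓ1-norm filter-pruning idea of [24] concrete, the sketch below ranks the output filters of a single convolution by the ℓ1-norm of their weights and keeps only the highest-ranked ones; the layer, the pruning ratio and the way surviving filters are copied are illustrative assumptions, not the exact procedure of [24].

    import torch
    import torch.nn as nn

    def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
        """Importance of each output filter = L1 norm of its weights."""
        weight = conv.weight.detach()  # shape: (out_channels, in_channels, kH, kW)
        return weight.abs().view(weight.size(0), -1).sum(dim=1)

    def keep_indices(conv: nn.Conv2d, prune_ratio: float = 0.5) -> torch.Tensor:
        """Indices of the filters kept after dropping the lowest-scoring ones."""
        scores = l1_filter_scores(conv)
        num_keep = max(1, int(scores.numel() * (1.0 - prune_ratio)))
        return torch.argsort(scores, descending=True)[:num_keep].sort().values

    # Toy example: prune half of the filters of a single 3x3 convolution.
    conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
    idx = keep_indices(conv, prune_ratio=0.5)
    pruned = nn.Conv2d(16, idx.numel(), kernel_size=3, padding=1)
    pruned.weight.data = conv.weight.data[idx].clone()  # copy the surviving filters
    pruned.bias.data = conv.bias.data[idx].clone()
    print(f"kept {idx.numel()} of {conv.out_channels} filters")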
Funding
  • Our TinyNet-E achieves a 59.9% Top-1 accuracy with only 24M FLOPs, which is about 1.9% higher than that of the previous best MobileNetV3 Small 0.5× [16] with similar computational cost
  • The FLOPs of these models are less than or equal to that of the baseline. As shown in Figure 1, the performance of the best models is about 2.5% higher than that of models obtained using the inverted giant formula of EfficientNet (green line) across different FLOPs
  • TinyNet-A + RA achieves 77.7% Top-1 accuracy, which is 0.9% higher than vanilla TinyNet-A
References
  • Mindspore. https://www.mindspore.cn/, 2020.
  • Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. In ICLR, 2019.
  • Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In CVPR Workshops, 2020.
  • Kalyanmoy Deb and Himanshu Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part i: solving problems with box constraints. IEEE transactions on evolutionary computation, 18(4):577–601, 2013.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
  • Emily L Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In NeurIPS, pages 1269–1277, 2014.
  • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • Jiaxin Gu, Ce Li, Baochang Zhang, Jungong Han, Xianbin Cao, Jianzhuang Liu, and David Doermann. Projection convolutional neural networks for 1-bit cnns via discrete back propagation. In AAAI, 2019.
  • Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, and Chang Xu. Hit-detector: Hierarchical trinity architecture search for object detection. In CVPR, 2020.
  • Kai Han, Jianyuan Guo, Chao Zhang, and Mingjian Zhu. Attribute-aware attention model for fine-grained representation learning. In ACM MM, 2018.
  • Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. Ghostnet: More features from cheap operations. In CVPR, 2020.
  • Kai Han, Yunhe Wang, Yixing Xu, Chunjing Xu, Enhua Wu, and Chang Xu. Training binary neural networks through learning with noisy supervision. In ICML, 2020.
  • Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In ICLR, 2016.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mobilenetv3. In ICCV, 2019.
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In NeurIPS, pages 4107–4115, 2016.
  • Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR, pages 2704–2713, 2018.
  • Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up convolutional neural networks with low rank expansions. In BMVC, 2014.
  • Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, pages 1097–1105, 2012.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. In ICLR, 2017.
  • Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, and Ling Shao. Hrank: Filter pruning using high-rank feature map. In CVPR, 2020.
  • Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, and David Doermann. Towards optimal structured cnn pruning via generative adversarial learning. In CVPR, 2019.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV. Springer, 2014.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. In ICLR, 2019.
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In ECCV, pages 21–37, 2016.
  • Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wei Liu, and Kwang-Ting Cheng. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm. In ECCV, 2018.
  • Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. Thinet: A filter level pruning method for deep neural network compression. In ICCV, pages 5058–5066, 2017.
  • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In ECCV, 2018.
  • William S Meisel. Tradeoff decision in multiple criteria decision making. Multiple criteria decision making, pages 461–476, 1973.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
  • Carl Edward Rasmussen and Christopher KI Williams. Gaussian processes for machine learning (adaptive computation and machine learning). 2005.
  • Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In ECCV, pages 525–542, 2016.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
  • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. In ICLR, 2015.
  • Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR, pages 4510–4520, 2018.
  • Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, and Chang Xu. Coevolutionary compression for unpaired image translation. In ICCV, 2019.
  • Dehua Song, Chang Xu, Xu Jia, Yiyi Chen, Chunjing Xu, and Yunhe Wang. Efficient residual dense block search for image super-resolution. In AAAI, 2020.
  • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, 2017.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. In CVPR, pages 2820–2828, 2019.
  • Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.
  • Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, and Chang Xu. A semi-supervised assessor of neural architectures. In CVPR, 2020.
  • Yehui Tang, Shan You, Chang Xu, Jin Han, Chen Qian, Boxin Shi, Chao Xu, and Changshui Zhang. Reborn filters: Pruning convolutional neural networks with limited data. In AAAI, 2020.
  • Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single image super-resolution: Methods and results. In CVPR workshops, pages 114–125, 2017.
  • Haotao Wang, Shupeng Gui, Haichuan Yang, Ji Liu, and Zhangyang Wang. Gan slimming: All-in-one gan compression by a unified optimization framework. In ECCV, 2020.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR, pages 10734–10742, 2019.
  • Qizhe Xie, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Self-training with noisy student improves imagenet classification. In CVPR, 2020.
  • Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, Dacheng Tao, and Chang Xu. Positive-unlabeled compression on the cloud. In NeurIPS, 2019.
  • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, and Chang Xu. Cars: Continuous evolution for efficient neural architecture search. In CVPR, 2020.
  • Zhaohui Yang, Yunhe Wang, Chuanjian Liu, Hanting Chen, Chunjing Xu, Boxin Shi, Chao Xu, and Chang Xu. Legonet: Efficient convolutional neural networks with lego filters. In ICML, 2019.
  • Shan You, Tao Huang, Mingmin Yang, Fei Wang, Chen Qian, and Changshui Zhang. Greedynas: Towards fast one-shot nas with greedy supernet. In CVPR, 2020.
  • Shan You, Chang Xu, Chao Xu, and Dacheng Tao. Learning from multiple teacher networks. In SIGKDD, 2017.
  • Xiyu Yu, Tongliang Liu, Xinchao Wang, and Dacheng Tao. On compressing deep models by low rank and sparse decomposition. In CVPR, 2017.
  • Qiulin Zhang, Zhuqing Jiang, Qishuo Lu, Jia’nan Han, Zhengxin Zeng, Shang-Hua Gao, and Aidong Men. Split to be slim: An overlooked redundancy in vanilla convolution. In IJCAI, 2020.
  • Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. CVPR, 2018.
  • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
  • Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.
  • Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
Author
Qiulin Zhang
Wei Zhang
Tong Zhang