AtomNAS: Fine-Grained End-to-End Neural Architecture Search

ICLR 2020


Abstract:

Search space design is very critical to neural architecture search (NAS) algorithms. We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms. This search space allows a mix of operations by composing different types of atomic blocks, while th…
Introduction
  • Neural Architecture Search (NAS) has become the mainstream approach for discovering efficient and powerful network structures (Zoph & Le, 2017; Pham et al., 2018; Tan et al., 2019; Liu et al., 2019a).
  • The design of search spaces is critical for NAS algorithms, and different choices have been explored.
  • The search spaces proposed so far generally offer only a small set of choices for each block.
  • The building block searched within a supernet should therefore be as small as possible in order to generate the most diversified model structures; the rough count below illustrates how quickly the space grows with finer granularity.
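As a back-of-the-envelope illustration of this point (the channel count and kernel choices below are generic assumptions, not figures from the paper): letting every intermediate channel choose its own operation, or be pruned entirely, yields vastly more reachable structures than picking one operation per block.

```latex
% Illustrative count only: C intermediate channels, K candidate kernel sizes.
% Fine-grained (per-channel choice, incl. pruning) vs. coarse (one choice per block):
(K+1)^{C} \quad\text{vs.}\quad K,
\qquad\text{e.g. } (3+1)^{96} \approx 6\times10^{57} \quad\text{vs.}\quad 3.
```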
Highlights
  • Human-designed neural networks are already being surpassed by machine-designed ones.
  • The design of search spaces is critical for Neural Architecture Search (NAS) algorithms, and different choices have been explored.
  • For the efficient exploration of the new search space, we propose a NAS framework named AtomNAS which applies network pruning techniques to architecture search.
  • We propose a dynamic network shrinkage technique which removes atomic blocks with negligible importance on the fly and greatly reduces the run time of AtomNAS.
  • We revisit the common structure, i.e., two convolutions joined by a channel-wise operation, and reformulate it as an ensemble of atomic blocks; a code sketch of this reformulation follows this list.
  • For efficiently exploring the huge fine-grained search space, we propose an end-to-end framework named AtomNAS, which conducts architecture search and network training jointly.
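Below is a minimal PyTorch sketch of that reformulation, not the authors' released code: a 1×1 convolution, a channel-wise (depthwise) operation, and a second 1×1 convolution are rewritten as a sum of small branches, one per kernel size, and the branch widths (the `widths` argument below) are exactly what the fine-grained search decides. Class names and default widths are illustrative.

```python
import torch
import torch.nn as nn


class AtomicBranch(nn.Module):
    """A group of atomic blocks sharing one depthwise kernel size:
    1x1 expand -> channel-wise conv -> 1x1 project."""

    def __init__(self, c_in, c_out, width, kernel_size, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, width, 1, bias=False),             # expand to `width` channels
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size, stride,
                      padding=kernel_size // 2, groups=width,  # channel-wise (depthwise) op
                      bias=False),
            nn.BatchNorm2d(width),                             # its scales act as importances
            nn.ReLU(inplace=True),
            nn.Conv2d(width, c_out, 1, bias=False),            # project back to c_out channels
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        return self.body(x)


class MixedBlock(nn.Module):
    """Ensemble of atomic branches with different kernel sizes; outputs are summed,
    which matches one wide expand/project pair whose depthwise kernels are mixed."""

    def __init__(self, c_in, c_out, widths=((3, 32), (5, 16), (7, 16)), stride=1):
        super().__init__()
        self.branches = nn.ModuleList(
            AtomicBranch(c_in, c_out, w, k, stride) for k, w in widths if w > 0
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)


# Shape check: same output shape as a single 1x1 -> depthwise -> 1x1 block.
x = torch.randn(2, 24, 56, 56)
print(MixedBlock(24, 24)(x).shape)  # torch.Size([2, 24, 56, 56])
```

Searching then amounts to deciding how many channels each kernel size keeps (possibly zero), rather than picking a single kernel size for the whole block.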
Methods
  • EXPERIMENTS ON IMAGENET

    The authors apply AtomNAS to search for high-performance lightweight models on the ImageNet 2012 classification task (Deng et al., 2009).
  • The authors first pretrain AtomNAS models (without the Swish activation function (Ramachandran et al., 2018) and the Squeeze-and-Excitation (SE) module (Hu et al., 2018)) on ImageNet, use them as drop-in replacements for the backbone in the Mask R-CNN model (He et al., 2017a) by building the detection head on top of the last feature map, and finetune the models on the COCO dataset.
Results
  • The authors' method achieves state-of-the-art performance under several FLOPs configurations on ImageNet with a small search cost.
  • The authors' method achieves 75.9% top-1 accuracy on the ImageNet dataset at around 360M FLOPs, which is 0.9% higher than the state-of-the-art model (Stamoulis et al., 2019a).
  • With the proposed search space and AtomNAS, the authors achieve state-of-the-art performance on the ImageNet dataset under the mobile setting.
  • With models directly produced by AtomNAS, the method achieves a new state of the art under all FLOPs constraints.
Conclusion
  • The authors revisit the common structure, i.e., two convolutions joined by a channel-wise operation, and reformulate it as an ensemble of atomic blocks.
  • This perspective enables a much larger and more fine-grained search space.
  • For efficiently exploring the huge fine-grained search space, the authors propose an end-to-end framework named AtomNAS, which conducts architecture search and network training jointly; a rough sketch of the joint objective and the dynamic shrinkage step follows this list.
  • The searched networks achieve significantly better accuracy than previous state-of-the-art methods at a small extra cost.
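As a rough sketch of what that joint search-and-training looks like in code, under stated assumptions: the importance of a group of atomic blocks is read off the scale (gamma) of the BatchNorm following its channel-wise convolution, the L1 penalty on those scales is weighted by the blocks' compute cost, and blocks whose importance stays negligible are masked out during training. `model.atomic_branches()`, the penalty weight, and the threshold are hypothetical placeholders, not the paper's exact implementation (which tracks importances with a moving average before physically removing channels).

```python
import torch
import torch.nn.functional as F


def atomnas_style_loss(model, logits, targets, penalty=1e-4):
    """Cross-entropy plus a resource-weighted L1 penalty on BN scales.

    `model.atomic_branches()` is an assumed helper yielding, for each group of
    atomic blocks, its depthwise-conv BatchNorm and a per-channel FLOPs estimate;
    weighting |gamma| by cost makes the search drop expensive blocks first.
    """
    ce = F.cross_entropy(logits, targets)
    reg = logits.new_zeros(())
    for bn, flops_per_channel in model.atomic_branches():
        reg = reg + flops_per_channel * bn.weight.abs().sum()
    return ce + penalty * reg


@torch.no_grad()
def shrink(model, threshold=1e-3):
    """Dynamic shrinkage stand-in: silence atomic blocks whose importance is tiny.

    Masking the BN scale/bias zeroes the block's contribution; an actual
    implementation would remove the corresponding channels to save compute.
    """
    for bn, _ in model.atomic_branches():
        dead = bn.weight.abs() < threshold
        bn.weight[dead] = 0.0
        bn.bias[dead] = 0.0
```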
Tables
  • Table 1: Comparison with state-of-the-art methods on ImageNet under the mobile setting. † denotes methods using extra network modules such as the Swish activation and the Squeeze-and-Excitation module. ‡ denotes using extra data augmentation such as MixUp and AutoAugment. ∗ denotes models searched and trained simultaneously.
  • Table 2: Influence of awareness of the resource metric. The upper block uses equal penalties for all atomic blocks; the lower block uses our resource-aware atomic block selection.
  • Table 3: Influence of BN recalibration; an illustrative sketch of the recalibration step follows this list.
  • Table 4: Comparison with baseline backbones on COCO object detection and instance segmentation. Cls denotes the ImageNet top-1 accuracy; detect-mAP and seg-mAP denote the mean average precision for detection and instance segmentation on the COCO dataset. The results of the baseline models are from Stamoulis et al. (2019b). SinglePath+ (Stamoulis et al., 2019b) contains the SE module.
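For context on Table 3, the following is an illustrative sketch of what BN recalibration typically amounts to (the batch count and helper name are assumptions, not the paper's exact protocol): after atomic blocks are removed, the surviving BatchNorm layers' running statistics no longer match the shrunken network, so they are re-estimated by forwarding some training batches without any weight updates.

```python
import torch


@torch.no_grad()
def recalibrate_bn(model, loader, num_batches=100, device="cuda"):
    """Re-estimate BatchNorm running statistics after network shrinkage."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None              # None => cumulative moving average
    model.train()                          # BN only updates running stats in train mode
    for i, (images, _) in enumerate(loader):
        if i >= num_batches:
            break
        model(images.to(device))
    model.eval()
```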
Related work
  • 2.1 NEURAL ARCHITECTURE SEARCH

    Recently, there has been a growing interest in automated neural architecture design. Reinforcement learning based NAS methods (Zoph & Le, 2017; Tan et al., 2019; Tan & Le, 2019a;b) are usually computationally intensive, which hampers their use under a limited computational budget. To accelerate the search procedure, ENAS (Pham et al., 2018) represents the search space as a directed acyclic graph and aims to find the optimal subgraph within the large supergraph; a parameter-sharing strategy among subgraphs is proposed to significantly increase the search efficiency. The same idea of optimizing subgraphs within a supergraph is also adopted by Liu et al. (2019a); Jin et al. (2019); Xu et al. (2020); Wu et al. (2019); Guo et al. (2019); Cai et al. (2019). Stamoulis et al. (2019a) and Yu et al. (2020) further share the parameters of different paths within a block using a super-kernel representation. A prominent disadvantage of the above methods is that their coarse search spaces only support selecting one out of a small set of choices (e.g., selecting one kernel size from {3, 5, 7}). MixNet (Tan & Le, 2019b) tries to benefit from mixed operations by using a predefined set of mixed operations {{3}, {3, 5}, {3, 5, 7}, {3, 5, 7, 9}}, where the channels are distributed equally among the kernel sizes.
Funding
  • Proposes a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.
  • Proposes a resource-aware architecture search framework which automatically assigns the computational resources for each operation by jointly considering the performance and the computational cost.
  • Our method achieves state-of-the-art performance under several FLOPs configurations on ImageNet with a small search cost.
  • Our method achieves 75.9% top-1 accuracy on the ImageNet dataset at around 360M FLOPs, which is 0.9% higher than the state-of-the-art model.
  • By further incorporating additional modules, our method achieves 77.6% top-1 accuracy.
  • Achieves state-of-the-art performance on the ImageNet dataset under the mobile setting.
Reference
  • Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR, 2019.
  • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019a.
  • Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. CoRR, abs/1904.12760, 2019b.
  • Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, and Ruijun Xu. ScarletNAS: Bridging the gap between scalability and fairness in neural architecture search. CoRR, abs/1908.06022, 2019a.
  • Xiangxiang Chu, Bo Zhang, Ruijun Xu, and Jixiang Li. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. CoRR, abs/1907.01845, 2019b.
  • Ekin Dogus Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. AutoAugment: Learning augmentation policies from data. CoRR, abs/1805.09501, 2018.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248–255, 2009.
  • Jiemin Fang, Yuzhu Sun, Qian Zhang, Yuan Li, Wenyu Liu, and Xinggang Wang. Densely connected search space for more flexible neural architecture search. CoRR, abs/1906.09607, 2019.
  • Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. MorphNet: Fast & simple resource-constrained structure learning of deep networks. In CVPR, pp. 1586–1595, 2018.
  • Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. Single path one-shot neural architecture search with uniform sampling. CoRR, abs/1904.00420, 2019.
  • Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.
  • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross B. Girshick. Mask R-CNN. In ICCV, pp. 2980–2988, 2017a.
  • Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In ICCV, pp. 1398–1406, 2017b.
  • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. Searching for MobileNetV3. CoRR, abs/1905.02244, 2019.
  • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
  • Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, pp. 7132–7141, 2018.
  • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, and Jiashi Feng. RC-DARTS: Resource constrained differentiable architecture search. arXiv preprint arXiv:1912.12814, 2019.
  • Hanwen Liang, Shifeng Zhang, Jiacheng Sun, Xingqiu He, Weiran Huang, Kechen Zhuang, and Zhenguo Li. DARTS+: Improved differentiable architecture search with early stopping, 2019.
  • Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, pp. 740–755, 2014.
  • Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In ICLR, 2019a.
  • Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In ICCV, pp. 2755–2763, 2017.
  • Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. In ICLR, 2019b.
  • Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. ThiNet: A filter level pruning method for deep neural network compression. In ICCV, pp. 5068–5076, 2017.
  • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, pp. 122–138, 2018.
  • Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In ICML, pp. 4092–4101, 2018.
  • Prajit Ramachandran, Barret Zoph, and Quoc V. Le. Searching for activation functions. In ICLR Workshop Track, 2018.
  • Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, pp. 4510–4520, 2018.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-Path NAS: Designing hardware-efficient ConvNets in less than 4 hours. CoRR, abs/1904.02877, 2019a.
  • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, and Diana Marculescu. Single-Path Mobile AutoML: Efficient ConvNet design and NAS hyperparameter optimization. CoRR, abs/1907.00959, 2019b.
  • Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML, pp. 6105–6114, 2019a.
  • Mingxing Tan and Quoc V. Le. MixConv: Mixed depthwise convolutional kernels. CoRR, abs/1907.09595, 2019b.
  • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In CVPR, pp. 2820–2828, 2019.
  • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In CVPR, pp. 10734–10742, 2019.
  • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong. PC-DARTS: Partial channel connections for memory-efficient architecture search. In ICLR, 2020.
  • Jianbo Ye, Xin Lu, Zhe Lin, and James Z. Wang. Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In ICLR, 2018.
  • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, and Quoc Le. Scaling up neural architecture search with big single-stage models, 2020. URL https://openreview.net/forum?id=HJe7unNFDH.
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In ICLR, 2018.
  • Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In ICLR, 2017.