Learning Transferable Architectures for Scalable Image Recognition

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Cited by: 1631 | DOI: https://doi.org/10.1109/cvpr.2018.00907

Abstract:

Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset.

Introduction
  • Developing neural network image classification models often requires significant architecture engineering.
  • Searching for the best cell structure has two main benefits: it is much faster than searching for an entire network architecture and the cell itself is more likely to generalize to other problems.
  • In the experiments, this approach significantly accelerates the search for the best architectures using CIFAR-10 by a factor of 7× and learns architectures that successfully transfer to ImageNet
Highlights
  • Developing neural network image classification models often requires significant architecture engineering
  • The controller recurrent neural network (RNN) was trained using Proximal Policy Optimization (PPO) [51], employing a global workqueue system to generate a pool of child networks controlled by the RNN (a minimal sketch of this worker-pool pattern follows this list)
  • In the first set of experiments, we train several image classification systems operating on 299x299 or 331x331 resolution images, with experiments scaled in computational demand to create models that are roughly on par in computational cost with Inception-v2 [29], Inception-v3 [60], and PolyNet [69]
  • We demonstrate how to learn scalable, convolutional cells from data that transfer to multiple image classification tasks
  • The resulting architectures approach or exceed state-of-the-art performance on both the CIFAR-10 and ImageNet datasets with less computational demand than human-designed architectures [60, 29, 69]
  • Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS – a reduction of 28% in computational demand from the previous state-of-the-art model
  • We find that image features obtained from ImageNet classification, used in combination with the Faster-RCNN framework, achieve state-of-the-art object detection results
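
The workqueue mentioned above can be pictured as a producer/consumer loop: sampled child architectures go onto a shared queue, a pool of workers trains each one for a fixed budget, and the measured validation accuracies come back as rewards for the controller. The sketch below is only an illustration under simplifying assumptions: sample_architecture and train_child_network are hypothetical placeholders, and the paper's system is a distributed workqueue over many GPUs rather than a single-process thread pool.

    # Minimal sketch of a global workqueue for evaluating sampled child networks.
    # sample_architecture() and train_child_network() are hypothetical stand-ins
    # for the controller RNN and the real child-network training.
    import queue
    import random
    import threading

    work_q = queue.Queue()    # architectures waiting to be trained
    result_q = queue.Queue()  # (architecture, validation accuracy) pairs

    def sample_architecture():
        # A child network is encoded as a sequence of discrete choices
        # (here: one operation per block, chosen uniformly at random).
        ops = ["sep3x3", "sep5x5", "avgpool3x3", "maxpool3x3", "identity"]
        return [random.choice(ops) for _ in range(10)]

    def train_child_network(arch):
        # Placeholder: train the decoded architecture for a fixed budget and
        # return its validation accuracy (a random number for illustration).
        return random.random()

    def worker():
        while True:
            arch = work_q.get()
            if arch is None:  # sentinel: no more work
                break
            result_q.put((arch, train_child_network(arch)))
            work_q.task_done()

    workers = [threading.Thread(target=worker) for _ in range(4)]
    for t in workers:
        t.start()
    for _ in range(20):  # the controller proposes a pool of child networks
        work_q.put(sample_architecture())
    work_q.join()
    for t in workers:
        work_q.put(None)  # shut the workers down

    rewards = []
    while not result_q.empty():
        rewards.append(result_q.get()[1])
    print(f"mean child accuracy: {sum(rewards) / len(rewards):.3f}")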
Methods
  • The authors' work makes use of search methods to find good convolutional architectures on a dataset of interest.
  • The main search method the authors use in this work is the Neural Architecture Search (NAS) framework proposed by [71].
  • In NAS, a controller recurrent neural network (RNN) samples child networks with different architectures.
  • The resulting accuracies are used to update the controller so that the controller will generate better architectures over time.
  • The controller weights are updated with a policy-gradient method (see the sketch after this list)
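
To make the update rule concrete, the sketch below applies a REINFORCE-style policy gradient to a toy controller that makes one categorical choice per architecture slot. This is a simplification under stated assumptions: the paper trains an RNN controller with PPO [51], while the per-slot logits, learning rate, exponential-moving-average baseline, and random child_accuracy reward here are illustrative placeholders.

    # Toy REINFORCE-style update for an architecture-sampling controller.
    # Assumption: a per-slot categorical policy stands in for the RNN + PPO
    # setup used in the paper; child_accuracy() is a hypothetical placeholder.
    import numpy as np

    rng = np.random.default_rng(0)
    n_slots, n_ops = 10, 5               # 10 decisions, 5 candidate ops each
    logits = np.zeros((n_slots, n_ops))  # controller parameters
    baseline, lr, decay = 0.0, 0.1, 0.95

    def sample():
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        arch = [rng.choice(n_ops, p=p) for p in probs]
        return arch, probs

    def child_accuracy(arch):
        # In the real system this is the validation accuracy of the trained
        # child network; a random reward keeps the sketch self-contained.
        return rng.random()

    for step in range(100):
        arch, probs = sample()
        reward = child_accuracy(arch)
        baseline = decay * baseline + (1 - decay) * reward  # variance reduction
        advantage = reward - baseline
        for slot, op in enumerate(arch):
            grad = -probs[slot]          # d log softmax / d logits = onehot - probs
            grad[op] += 1.0
            logits[slot] += lr * advantage * grad

Higher child accuracy makes the sampled choices more likely in later samples, which is the mechanism by which the controller generates better architectures over time.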
Results
  • The authors describe the experiments with the method described above to learn convolutional cells.
  • In the first set of experiments, the authors train several image classification systems operating on 299x299 or 331x331 resolution images, with experiments scaled in computational demand to create models that are roughly on par in computational cost with Inception-v2 [29], Inception-v3 [60], and PolyNet [69]
  • The authors show that this family of models achieves state-of-the-art performance with fewer floating point operations and parameters than comparable architectures.
  • The authors' training setup on ImageNet is similar to [60]; see Appendix A for details
Conclusion
  • The authors demonstrate how to learn scalable, convolutional cells from data that transfer to multiple image classification tasks.
  • The key insight in the approach is to design a search space that decouples the complexity of an architecture from the depth of a network (illustrated by the sketch after this list)
  • This resulting search space permits identifying good architectures on a small dataset (i.e., CIFAR-10) and transferring the learned architecture to image classification tasks across a range of data and computational scales.
  • The resulting architectures approach or exceed state-of-the-art performance on both the CIFAR-10 and ImageNet datasets with less computational demand than human-designed architectures [60, 29, 69].
  • The authors demonstrate that the resulting learned architecture can perform ImageNet classification at reduced computational budgets, outperforming streamlined architectures targeted at mobile and embedded platforms [24, 70]
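
To illustrate how the search space decouples architectural complexity from network depth, the sketch below builds a stacking plan in which the same learned Normal cell is repeated N times between filter-doubling Reduction cells, so only N and the initial filter count F change when moving from a CIFAR-sized to an ImageNet-sized model. The build_nasnet_plan helper and the particular values of N and F are illustrative assumptions; the cell internals and the CIFAR/ImageNet stems are omitted.

    # Sketch: scale one learned cell to different budgets by changing only the
    # number of Normal-cell repeats N and the initial filter count F.
    # (Illustrative values; not the paper's exact configurations.)
    def build_nasnet_plan(num_normal_cells, num_filters, num_reductions=2):
        plan, filters = [], num_filters
        for block in range(num_reductions + 1):
            plan += [("normal_cell", filters)] * num_normal_cells
            if block < num_reductions:
                filters *= 2                              # Reduction cell doubles filters
                plan.append(("reduction_cell", filters))  # and halves spatial resolution
        return plan

    small = build_nasnet_plan(num_normal_cells=4, num_filters=44)   # CIFAR-sized stack
    large = build_nasnet_plan(num_normal_cells=6, num_filters=168)  # ImageNet-sized stack
    print(len(small), len(large))  # same cell definition, different depth and width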
Tables
  • Table 1: Performance of Neural Architecture Search and other state-of-the-art models on CIFAR-10. All results for NASNet are the mean accuracy across 5 runs
  • Table 2: Performance of architecture search and other published state-of-the-art models on ImageNet classification. Mult-Adds indicates the number of composite multiply-accumulate operations for a single image (see the sketch after this list). Note that the composite multiply-accumulate operations are calculated for the image size reported in the table. Model size for [25] is calculated from an open-source implementation
  • Table 3: Performance on ImageNet classification for a subset of models operating in a constrained computational setting, i.e., < 1.5 B multiply-accumulate operations per image. All models use 224x224 images. † indicates top-1 accuracy not reported in [59] but taken from an open-source implementation
  • Table 4: Object detection performance on COCO on mini-val and test-dev datasets across a variety of image featurizations. All results are with the Faster-RCNN object detection framework [47] from a single crop of an image. Top rows highlight mobile-optimized image featurizations, while bottom rows indicate computationally heavy image featurizations geared towards achieving the best results. All mini-val results employ the same 8K subset of validation images in [28]
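
As a concrete reading of the Mult-Adds column in Table 2, the multiply-accumulate count of a standard convolution is output height x output width x output channels x kernel height x kernel width x input channels, and a depthwise-separable convolution splits this into a depthwise term plus a pointwise term. The helper functions and the example layer below are illustrative assumptions, not values taken from the paper.

    # Back-of-the-envelope multiply-accumulate counts for one convolutional
    # layer (Table 2 reports this quantity per image, summed over all layers).
    def conv_mult_adds(h_out, w_out, c_in, c_out, k):
        return h_out * w_out * c_out * k * k * c_in

    def separable_conv_mult_adds(h_out, w_out, c_in, c_out, k):
        depthwise = h_out * w_out * c_in * k * k  # one k x k filter per input channel
        pointwise = h_out * w_out * c_in * c_out  # 1x1 channel-mixing convolution
        return depthwise + pointwise

    # Example: a 3x3 convolution on a 56x56 feature map going from 64 to 128 channels.
    print(conv_mult_adds(56, 56, 64, 128, 3))            # ~231 million mult-adds
    print(separable_conv_mult_adds(56, 56, 64, 128, 3))  # ~27 million mult-adds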
Related work
  • The proposed method is related to previous work in hyperparameter optimization [44, 4, 5, 54, 55, 6, 40] – especially recent approaches in designing architectures such as Neural Fabrics [48], DiffRNN [41], MetaQNN [3] and DeepArchitect [43]. A more flexible class of methods for designing architectures is evolutionary algorithms [65, 16, 57, 30, 46, 42, 67], yet they have not had as much success at large scale. Xie and Yuille [67] also transferred learned architectures from CIFAR-10 to ImageNet, but the performance of these models (top-1 accuracy 72.1%) is notably below the previous state-of-the-art (Table 2).

    The concept of having one neural network interact with a second neural network to aid the learning process, often called learning to learn or meta-learning [23, 49], has attracted much attention in recent years [1, 62, 14, 19, 35, 45, 15]. Most of these approaches have not been scaled to large problems like ImageNet. An exception is recent work focused on learning an optimizer for ImageNet classification that achieved notable improvements [64].
References
  • M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pages 3981–3989, 2016. 2
  • J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016. 12
  • B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. In International Conference on Learning Representations, 2016. 2
  • J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for hyper-parameter optimization. In Neural Information Processing Systems, 2011. 2
  • J. Bergstra and Y. Bengio. Random search for hyperparameter optimization. Journal of Machine Learning Research, 2012. 2, 8
  • J. Bergstra, D. Yamins, and D. D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. International Conference on Machine Learning, 2013. 2
  • J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous sgd. In International Conference on Learning Representations Workshop Track, 2016. 12
  • Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng. Dual path networks. arXiv preprint arXiv:1707.01629, 2017. 5, 7
  • F. Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2, 7
  • D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). In International Conference on Learning Representations, 2016. 11
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. FeiFei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009. 1, 12
  • T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017. 5, 6
  • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning, volume 32, pages 647–655, 2014. 6
  • Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016. 2
  • C. Finn, P. Abbeel, and S. Levine. Model-agnostic metalearning for fast adaptation of deep networks. In International Conference on Machine Learning, 2017. 2
  • D. Floreano, P. Durr, and C. Mattiussi. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 2008. 2
  • K. Fukushima. A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36:193–202, 1980. 1
  • X. Gastaldi. Shake-shake regularization of 3-branch residual networks. In International Conference on Learning Representations Workshop Track, 2017. 6, 12
  • D. Ha, A. Dai, and Q. V. Le. Hypernetworks. In International Conference on Learning Representations, 2017. 2
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1, 2, 3, 4
  • K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, 2016. 11
  • S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 1997. 2, 11
  • S. Hochreiter, A. Younger, and P. Conwell. Learning to learn using gradient descent. Artificial Neural Networks, pages 87–94, 2001. 2
  • A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 2, 5, 7, 8, 11
  • J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 2017. 5, 7
  • G. Huang, Z. Liu, and K. Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2017. 6
  • G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Weinberger. Deep networks with stochastic depth. In European Conference on Computer Vision, 2016. 11
  • J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In IEEE Conference on Computer Vision and Pattern Recognition, 2017. 6, 7, 8, 14
  • S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Learning Representations, 2015. 2, 5, 7, 8
  • R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Learning Representations, 2015. 2
  • A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 4, 11
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing System, 2012. 1, 3
  • G. Larsson, M. Maire, and G. Shakhnarovich. Fractalnet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648, 2016. 4, 11
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 1998. 1
  • K. Li and J. Malik. Learning to optimize neural nets. arXiv preprint arXiv:1703.00441, 2017. 2
  • T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 7
  • T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017. 7, 8
  • T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014. 7
  • I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations, 2017. 12
  • H. Mendoza, A. Klein, M. Feurer, J. T. Springenberg, and F. Hutter. Towards automatically-tuned neural networks. In Proceedings of the 2016 Workshop on Automatic Machine Learning, pages 58–65, 2016. 2
  • T. Miconi. Neural networks with differentiable structure. arXiv preprint arXiv:1606.06216, 2016. 2
  • R. Miikkulainen, J. Liang, E. Meyerson, A. Rawal, D. Fink, O. Francon, B. Raju, A. Navruzyan, N. Duffy, and B. Hodjat. Evolving deep neural networks. arXiv preprint arXiv:1703.00548, 2017. 2
  • R. Negrinho and G. Gordon. DeepArchitect: Automatically designing and training deep architectures. arXiv preprint arXiv:1704.08792, 2017. 2
  • N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox. A highthroughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Computational Biology, 5(11):e1000579, 2009. 2
  • S. Ravi and H. Larochelle. Optimization as a model for fewshot learning. In International Conference on Learning Representations, 2017. 2
  • E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, Q. Le, and A. Kurakin. Large-scale evolution of image classifiers. In International Conference on Machine Learning, 2017. 2
  • S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015. 2, 6, 7
  • S. Saxena and J. Verbeek. Convolutional neural fabrics. In Advances in Neural Information Processing Systems, 2016. 2
  • T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 2010. 2
  • F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015. 8
  • J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 4, 11
  • A. Shrivastava, R. Sukthankar, J. Malik, and A. Gupta. Beyond skip connections: Top-down modulation for object detection. arXiv preprint arXiv:1612.06851, 2016. 7, 8
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015. 1, 2, 3, 4
  • J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Neural Information Processing Systems, 2012. 2
  • J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. Patwary, M. Ali, R. P. Adams, et al. Scalable Bayesian optimization using deep neural networks. In International Conference on Machine Learning, 2015. 2
  • N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014. 11
  • K. O. Stanley, D. B. D’Ambrosio, and J. Gauci. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 2009. 2
  • C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In International Conference on Learning Representations Workshop Track, 2016. 1, 2, 3, 4, 7
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1, 2, 3, 4, 7
  • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1, 2, 3, 4, 5, 7, 8, 12
  • D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016. 12
  • J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016. 2
  • T. Weyand, I. Kostrikov, and J. Philbin. Planet-photo geolocation with convolutional neural networks. In European Conference on Computer Vision, 2016. 8
  • O. Wichrowska, N. Maheswaranathan, M. W. Hoffman, S. G. Colmenarejo, M. Denil, N. de Freitas, and J. Sohl-Dickstein. Learned optimizers that scale and generalize. arXiv preprint arXiv:1703.04813, 2017. 2
  • D. Wierstra, F. J. Gomez, and J. Schmidhuber. Modeling systems with internal state using evolino. In The Genetic and Evolutionary Computation Conference, 2005. 2
  • R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Machine Learning, 1992. 11
  • L. Xie and A. Yuille. Genetic CNN. arXiv preprint arXiv:1703.01513, 2017. 2
  • S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1, 2, 7
  • X. Zhang, Z. Li, C. C. Loy, and D. Lin. Polynet: A pursuit of structural diversity in very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 5, 7, 8, 11
  • X. Zhang, X. Zhou, M. Lin, and J. Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083, 2017. 2, 5, 7, 8
  • B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017. 1, 2, 4, 6, 11