Designing Network Design Spaces

CVPR, pp. 10425-10433, 2020.

Keywords:
deep neural network, neural architecture search, empirical distribution function, wide range, efficient convolutional neural network

Abstract:

In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall proce...

Introduction
  • Deep convolutional neural networks are the engine of visual recognition. Over the past several years better architectures have resulted in considerable progress in a wide range of visual recognition tasks.
  • Examples include LeNet [15], AlexNet [13], VGG [26], and ResNet [8]
  • This body of work advanced both the effectiveness of neural networks as well as the understanding of network design.
  • The above sequence of works demonstrated the importance of convolution, network and data size, depth, and residuals, respectively.
  • The outcome of these works is not just particular network instantiations, but also design principles that can be generalized and applied to numerous settings
Highlights
  • Deep convolutional neural networks are the engine of visual recognition
  • We present a new network design paradigm that combines the advantages of manual design and neural architecture search
  • We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design
  • Our results suggest that designing network design spaces is a promising avenue for future research
Methods
  • Design Space Design

    The authors' goal is to design better networks for visual recognition. Rather than designing or searching for a single best model under specific settings, the authors study the behavior of populations of models.
  • The authors aim to discover general design principles that can apply to and improve an entire model population
  • Such design principles can provide insights into network design and are more likely to generalize to new settings.
  • The core insight from [21] is that the authors can sample models from a design space, giving rise to a model distribution, and turn to tools from classical statistics to analyze the design space
  • The authors note that this differs from architecture search, where the goal is to find the single best model from the space.
  • The 5-stage results show that the regular structure of RegNet can generalize to more stages, even though the 5-stage AnyNetXA has even more degrees of freedom
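The sampling-and-statistics methodology above can be sketched in a few lines: draw many configurations from a design space, train and evaluate each, and compare design spaces via the empirical distribution function (EDF) of their errors. The design-space parameters, ranges, and error values below are illustrative placeholders, not the paper's actual configuration.

```python
import random

def sample_config(rng):
    """Draw one model configuration uniformly from a toy design space.
    The parameter names and ranges here are illustrative placeholders."""
    return {
        "depth": rng.randint(12, 28),
        "width": rng.choice([64, 128, 256, 512]),
        "bottleneck": rng.choice([1, 2, 4]),
    }

def edf(errors, thresholds):
    """Empirical distribution function: for each threshold e, the
    fraction of sampled models whose error falls below e."""
    n = len(errors)
    return [sum(err < e for err in errors) / n for e in thresholds]

rng = random.Random(0)
configs = [sample_config(rng) for _ in range(500)]

# In the real workflow each config would be trained and evaluated;
# here we stand in synthetic errors, one per sampled model.
errors = [20 + rng.random() * 15 for _ in configs]

# Comparing whole EDF curves of two design spaces, rather than their
# single best models, is what distinguishes this analysis from
# architecture search.
curve = edf(errors, thresholds=[22, 25, 30, 35])
print(curve)
```

A higher EDF curve means a larger fraction of the population reaches low error, which is the population-level quality signal the authors use to refine design spaces step by step.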
Results
  • Results are shown in Figure 18 and Table 4.
  • At low flops, EFFICIENTNET outperforms REGNETY; at intermediate flops, REGNETY outperforms EFFICIENTNET; and at higher flops, both REGNETX and REGNETY perform better.
  • The authors observe that for EFFICIENTNET, activations scale linearly with flops, whereas for REGNETs activations scale with the square root of flops.
  • This leads to slow GPU training and inference times for EFFICIENTNET.
  • E.g., REGNETX-8000 is 5× faster than EFFICIENTNET-B5, while having lower error
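The contrast in scaling behavior can be made concrete with a toy calculation. The proportionality constants below are arbitrary; only the growth rates (linear vs. square root) come from the observation above.

```python
import math

def efficientnet_like_acts(flops, c=1.0):
    # Assumed linear scaling: activations grow in proportion to flops.
    return c * flops

def regnet_like_acts(flops, c=1.0):
    # Assumed square-root scaling: activations grow with sqrt(flops).
    return c * math.sqrt(flops)

base = 1.0
for scale in [1, 4, 16, 64]:
    lin = efficientnet_like_acts(base * scale) / efficientnet_like_acts(base)
    sqr = regnet_like_acts(base * scale) / regnet_like_acts(base)
    print(f"{scale:>3}x flops -> {lin:5.1f}x vs {sqr:4.1f}x activations")
```

Under these assumptions, a 64x increase in flops multiplies activations 64x for the linear-scaling model but only 8x for the square-root-scaling one; since activation memory traffic heavily influences GPU runtime, this is consistent with the faster training and inference reported for the REGNET models.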
Conclusion
  • The authors present a new network design paradigm. Their results suggest that designing network design spaces is a promising avenue for future research.
Tables
  • Table1: Design space summary. See text for details
  • Table2: Mobile regime. We compare existing models using originally reported errors to RegNet models trained in a basic setup. Our simple RegNet models achieve surprisingly good results given the effort focused on this regime in the past few years
  • Table3: RESNE(X)T comparisons. (a) Grouped by activations, REGNETX show considerable gains (note that for each group GPU inference and training times are similar). (b) REGNETX models outperform RESNE(X)T models under fixed flops as well
  • Table4: EFFICIENTNET comparisons using our standard training schedule. Under comparable training settings, REGNETY outperforms EFFICIENTNET for most flop regimes. Moreover, REGNET models are considerably faster, e.g., REGNETX-F8000 is about 5× faster than EFFICIENTNET-B5. Note that originally reported errors for EFFICIENTNET (shown grayed out), are much lower but use longer and enhanced training schedules, see Table 7
  • Table5: RESNE(X)T comparisons on ImageNetV2
  • Table6: EFFICIENTNET comparisons on ImageNetV2
  • Table7: Training enhancements to EFFICIENTNET-B0. Our EFFICIENTNET-B0 reproduction with DropPath [14] and a 250 epoch training schedule (third row) achieves results slightly inferior to the original results (bottom row), which additionally used RMSProp [30], AutoAugment [2], etc. Without these enhancements to the training setup, results are ∼2% lower (top row), highlighting the importance of carefully controlling the training setup
Related work
  • Manual network design. The introduction of AlexNet [13] catapulted network design into a thriving research area. In the following years, improved network designs were proposed; examples include VGG [26], Inception [27, 28], ResNet [8], ResNeXt [31], DenseNet [11], and MobileNet [9, 25]. The design process behind these networks was largely manual and focused on discovering new design choices that improve accuracy, e.g., the use of deeper models or residuals. We likewise share the goal of discovering new design principles. In fact, our methodology is analogous to manual design but performed at the design space level.
Funding
  • Explores the structure aspect of network design and arrives at a low-dimensional design space consisting of simple, regular networks that the authors call RegNet
  • Proposes to design network design spaces, where a design space is a parametrized set of possible model architectures
  • Presents a new network design paradigm that combines the advantages of manual design and NAS
  • Shows that the RegNet design space generalizes to larger compute regimes, schedule lengths, and network block types
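The "simple, regular networks" above come from the quantized linear width rule that gives RegNet its low-dimensional parameterization: a linear target width per block, snapped to a geometric grid. The sketch below follows that rule as described in the paper; the specific parameter values are illustrative, not a published RegNet instance.

```python
import math

def regnet_widths(w0, wa, wm, depth, q=8):
    """Quantized linear width rule of the RegNet design space (sketch).
    u_j = w0 + wa * j is a linearly growing per-block target width;
    each target is snapped to w0 * wm**s for an integer s, then
    rounded to a multiple of q channels."""
    widths = []
    for j in range(depth):
        u = w0 + wa * j                              # linear target width
        s = round(math.log(u / w0) / math.log(wm))   # quantization exponent
        w = w0 * wm ** s                             # quantized width
        widths.append(int(round(w / q) * q))         # multiple of q channels
    return widths

# Example parameters (illustrative only).
widths = regnet_widths(w0=24, wa=36.0, wm=2.5, depth=16)
print(widths)

# Consecutive blocks that share a width form a stage, so a handful of
# scalar parameters determines the full per-stage width/depth structure.
stages = sorted(set(widths))
print(stages)
```

Because only a few scalars (w0, wa, wm, depth) generate an entire network, the design space is low-dimensional, which is what makes population-level analysis of it tractable.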
Reference
  • F. Chollet. Xception: Deep learning with depthwise separable convolutions. In CVPR, 2017.
  • E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning augmentation policies from data. arXiv:1805.09501, 2018.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552, 2017.
  • B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994.
  • P. Goyal, P. Dollar, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017.
  • K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV, 2015.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
  • J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In CVPR, 2018.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
  • S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • G. Larsson, M. Maire, and G. Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. In ICLR, 2017.
  • Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.
  • C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
  • C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In ECCV, 2018.
  • H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. In ICLR, 2019.
  • N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, 2018.
  • H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. In ICML, 2018.
  • I. Radosavovic, J. Johnson, S. Xie, W.-Y. Lo, and P. Dollar. On network design spaces for visual recognition. In ICCV, 2019.
  • P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. arXiv:1710.05941, 2017.
  • E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In AAAI, 2019.
  • B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do ImageNet classifiers generalize to ImageNet? arXiv:1902.10811, 2019.
  • M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
  • M. Tan and Q. V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.
  • T. Tieleman and G. Hinton. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. Coursera: Neural Networks for Machine Learning, 2012.
  • S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
  • S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016.
  • X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
  • B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. In ICLR, 2017.
  • B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In CVPR, 2018.