Are Labels Necessary for Neural Architecture Search?

Keywords:
NAS algorithm, differentiable architecture search, rotation prediction [12], neural network architecture, architecture search

Abstract:

Existing neural network architectures in computer vision --- whether designed by humans or by machines --- were typically found using both images and their associated labels. In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels? To answer this question, we first …

Introduction
  • Neural architecture search (NAS) has emerged as a research problem of searching for architectures that perform well on target data and tasks.
  • A key mystery surrounding NAS is what factors contribute to the success of the search.
  • Using the target data and tasks during the search will result in the least domain gap, and this is the strategy adopted in early NAS attempts [35,26].
  • Researchers [36] started to utilize the transferability of architectures, which enabled the search to be performed on different data and labels (e.g., CIFAR-10) than the target (e.g., ImageNet).
  • In other words, existing NAS approaches perform search in the supervised learning regime
Highlights
  • Neural architecture search (NAS) has emerged as a research problem of searching for architectures that perform well on target data and tasks
  • The goal of this paper is to provide an answer to the question asked in the title: are labels necessary for neural architecture search? To formalize this question, we define a new setup called Unsupervised Neural Architecture Search (UnNAS)
  • The UnNAS algorithm variants with Rotation prediction [12] (Rot), Colorization [34] (Color), jigsaw puzzles [21] (Jigsaw) objectives all perform very well, closely approaching the results obtained by the supervised counterpart
  • In sample-based experiments, by randomly sampling a large number of architectures, we discover the phenomenon that the architecture rankings produced with and without labels are highly correlated
  • In search-based experiments, by making minimal modifications to a well-established NAS algorithm, DARTS [19], we show that the architectures learned without accessing labels perform competitively, both relative to their supervised counterparts and in terms of absolute performance
  • The findings in this paper indicate that labels are not necessary for neural architecture search
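The pretext tasks named above replace human annotations with labels derived from the images themselves. As a minimal illustrative sketch (not the paper's implementation), the rotation-prediction pretext [12] turns each image into four training examples whose 4-way "labels" come for free from the applied transformation:

```python
def rot90(img):
    """Rotate a 2D grid (a stand-in for an image) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotation_examples(img):
    """Self-supervised examples for rotation prediction [12]: each image
    yields four (rotated_image, rotation_class) pairs, with
    rotation_class in {0, 1, 2, 3} for {0, 90, 180, 270} degrees.
    No human annotation is needed -- the 'label' is the rotation applied."""
    examples, cur = [], img
    for k in range(4):
        examples.append((cur, k))
        cur = rot90(cur)
    return examples

# A tiny toy "image":
img = [[1, 2],
       [3, 4]]
pairs = rotation_examples(img)
```

A network trained to predict `rotation_class` from the rotated input provides the search signal; colorization [34] and jigsaw puzzles [21] manufacture their labels from the data in the same spirit.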
Methods
  • As Section 3 describes, an architecture discovered in an unsupervised fashion will be evaluated by its performance in a supervised setting.
  • In the sample-based experiments, each network is trained and evaluated individually; the downside is that only a small, random subset of the search space can be considered.
  • In the search-based experiments, the focus is on finding a top architecture from the entire search space; the downside is that the training dynamics during the search phase do not exactly match those of the evaluation phase.
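The sample-based comparison reduces to a rank correlation: train each sampled architecture once with labels and once on a pretext task, then correlate the two accuracy rankings. A minimal sketch of that statistic (Spearman's rho [28]), computed by hand here on made-up accuracy numbers that are not the paper's data:

```python
def ranks(xs):
    """Rank of each value (0 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-architecture accuracies (illustrative numbers only):
supervised_acc = [71.2, 68.5, 74.0, 69.9, 72.3]
pretext_acc    = [55.1, 52.0, 58.4, 53.7, 56.0]
rho = spearman(supervised_acc, pretext_acc)
# These two rankings agree exactly, so rho is (numerically) 1.
```

A rho near 1 means the pretext task orders architectures nearly the same way the supervised task does, which is exactly the phenomenon the sample-based experiments report.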
Results
  • High rank correlation between supervised accuracy and pretext accuracy on the same dataset.
  • The UnNAS algorithm variants with Rot, Color, Jigsaw objectives all perform very well, closely approaching the results obtained by the supervised counterpart.
  • This suggests that it might be desirable to perform architecture search on the target dataset directly, as observed in other work [3].
Conclusion
  • The authors challenge the common practice in neural architecture search and ask: are labels really needed to successfully perform NAS? They approach this question with two sets of experiments: sample-based and search-based.
  • In search-based experiments, by making minimal modifications to a well-established NAS algorithm, DARTS [19], the authors show that the architectures learned without accessing labels perform competitively, both relative to their supervised counterparts and in terms of absolute performance.
  • In both experiments, the observations are consistent and robust across various datasets, tasks, and/or search spaces.
  • UnNAS could be especially beneficial to the many applications where data constantly comes in at large volume but labeling is costly
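The search-based modification is conceptually tiny: DARTS [19] relaxes the discrete choice among candidate operations into a softmax-weighted mixture, and the gradient that updates the architecture weights can come from any differentiable loss. A toy, scalar-valued sketch of that relaxation (illustrative only, not the released UnNAS code):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops):
    """DARTS-style continuous relaxation [19]: instead of picking one
    candidate operation, output a softmax(alpha)-weighted sum of all of
    them. The architecture parameters `alphas` are updated by gradient
    descent on whatever loss drives the search -- in UnNAS, a pretext
    loss (rotation, colorization, jigsaw) replaces the supervised one."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Three toy candidate "operations" on a scalar input:
ops = [lambda x: x,        # identity / skip connection
       lambda x: 2.0 * x,  # stand-in for a conv-like op
       lambda x: 0.0]      # the "zero" (no connection) op
```

With uniform `alphas` the mixture is a plain average of the candidates; as one alpha grows, the mixture approaches that single operation, which is how a discrete architecture is read off after the search ends.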
Tables
  • Table 1: ImageNet-1K classification results of the architectures searched by NAS and UnNAS algorithms. Rows in gray correspond to invalid UnNAS configurations where the search and evaluation datasets are the same. † is our training result of the DARTS architecture released in [19]
  • Table 2: Cityscapes semantic segmentation results of the architectures searched by NAS and UnNAS algorithms. These are trained from scratch: there is no fine-tuning from an ImageNet checkpoint. Rows in gray correspond to an illegitimate setup where the search dataset is the same as the evaluation dataset. † is our training result of the DARTS architecture released in [19]
Related work
  • Neural Architecture Search. Research on the NAS problem involves designing the search space [36,33] and the search algorithm [35,25]. There are special focuses on reducing the overall time cost of the search process [18,23,19], or on extending to a larger variety of tasks [4,17,11,27]. Existing works on NAS all use human-annotated labels during the search phase. Our work is orthogonal to existing NAS research, and advances in the existing NAS literature may also be applicable in our unsupervised setup.

    Architecture Transferability. In early NAS attempts [35,26], the search phase and the evaluation phase typically operate on the same dataset and task. Later, researchers realized that it is possible to relax this constraint. In these situations, the dataset and task used in the search phase are typically referred to as the proxy to the target dataset and task, reflecting a notion of architecture transferability. [36] demonstrated that CIFAR-10 classification is a good proxy for ImageNet classification. [18] measured the rank correlation between these two tasks using a small number of architectures. [15] studied the transferability of 16 architectures (together with trained weights) between more supervised tasks. Part of our work studies architecture transferability at a larger scale, across supervised and unsupervised tasks.
Findings
  • On ImageNet, our UnNAS-DARTS architectures can comfortably outperform this baseline by up to 1% classification accuracy
Study subjects and analysis
  • Search datasets: 3. NAS and UnNAS results are robust across a large variety of datasets and tasks. The three search datasets that we consider are of different nature: for example, IN22K is 10 times larger than IN1K, and Cityscapes images have a markedly different distribution than those in ImageNet.

Reference
  • [1] Google AutoML Vision API Tutorial (2019, accessed Nov 14, 2019), https://cloud.google.com/vision/automl/docs/tutorial
  • [2] Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb), 281–305 (2012)
  • [3] Cai, H., Zhu, L., Han, S.: ProxylessNAS: Direct neural architecture search on target task and hardware. In: ICLR (2019)
  • [4] Chen, L.C., Collins, M., Zhu, Y., Papandreou, G., Zoph, B., Schroff, F., Adam, H., Shlens, J.: Searching for efficient multi-scale architectures for dense image prediction. In: NeurIPS (2018)
  • [5] Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587 (2017)
  • [6] Chen, X., Xie, L., Wu, J., Tian, Q.: Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In: ICCV (2019)
  • [7] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
  • [8] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR (2009)
  • [9] Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: ICCV (2015)
  • [10] Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. In: NeurIPS (2014)
  • [11] Ghiasi, G., Lin, T.Y., Le, Q.V.: NAS-FPN: Learning scalable feature pyramid architecture for object detection. In: CVPR (2019)
  • [12] Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. In: ICLR (2018)
  • [13] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR (2020)
  • [14] Huber, P.J.: Robust statistics. Springer (2011)
  • [15] Kornblith, S., Shlens, J., Le, Q.V.: Do better ImageNet models transfer better? In: CVPR (2019)
  • [16] Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep., Citeseer (2009)
  • [17] Liu, C., Chen, L.C., Schroff, F., Adam, H., Hua, W., Yuille, A.L., Fei-Fei, L.: Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In: CVPR (2019)
  • [18] Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive neural architecture search. In: ECCV (2018)
  • [19] Liu, H., Simonyan, K., Yang, Y.: DARTS: Differentiable architecture search. In: ICLR (2019)
  • [20] Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. In: ICLR (2017)
  • [21] Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: ECCV (2016)
  • [22] Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv:1807.03748 (2018)
  • [23] Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: ICML (2018)
  • [24] Radosavovic, I., Johnson, J., Xie, S., Lo, W.Y., Dollár, P.: On network design spaces for visual recognition. In: ICCV (2019)
  • [25] Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: AAAI (2019)
  • [26] Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Le, Q.V., Kurakin, A.: Large-scale evolution of image classifiers. In: ICML (2017)
  • [27] So, D.R., Liang, C., Le, Q.V.: The evolved transformer. In: ICML (2019)
  • [28] Spearman, C.: The proof and measurement of association between two things. The American Journal of Psychology (1904)
  • [29] Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. arXiv:1906.05849 (2019)
  • [30] Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV (2015)
  • [31] Wu, Z., Xiong, Y., Yu, S., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR (2018)
  • [32] Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G.J., Tian, Q., Xiong, H.: PC-DARTS: Partial channel connections for memory-efficient differentiable architecture search. In: ICLR (2020)
  • [33] Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-Bench-101: Towards reproducible neural architecture search. In: ICML (2019)
  • [34] Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: ECCV (2016)
  • [35] Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: ICLR (2017)
  • [36] Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: CVPR (2018)