NADS: Neural Architecture Distribution Search for Uncertainty Awareness

Randy Ardywibowo
Shahin Boluki

ICML, pp. 356-366, 2020.


Abstract:

Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when the testing data come from a distribution different from the training data. It is therefore important for ML systems in critical applications to accurately quantify their predictive uncertainty and screen out these anomalous inputs. However, existing OoD…

Highlights
  • Detecting anomalous data is crucial for safely applying machine learning in autonomous systems for critical applications and for AI safety (Amodei et al., 2016)
  • Unlike Neural Architecture Search (NAS) for common learning tasks, specifying a model and an objective to optimize for uncertainty estimation and outlier detection is not straightforward
  • We developed a novel neural architecture distribution search (NADS) formulation to identify a random ensemble of architectures that perform well on a given task
  • Instead of seeking to maximize the likelihood of in-distribution data, which may cause OoD samples to be mistakenly assigned higher likelihood, we developed a search algorithm to optimize the Widely Applicable Information Criterion (WAIC) score, a Bayesian-adjusted estimate of the data entropy
  • Using this formulation, we have identified several key features of good uncertainty quantification architectures: a simple structure in the shallower layers, information-preserving operations, and a larger, more expressive structure with skip connections in the deeper layers to ensure optimization stability
  • We perform multiple OoD detection experiments and observe that our Neural Architecture Distribution Search (NADS) performs favorably, with up to 57% improvement in accuracy over state-of-the-art methods across 15 different testing configurations
  • Using the architecture distribution learned by NADS, we constructed an ensemble of models to estimate the data entropy using the WAIC score
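For reference, the WAIC score mentioned in these highlights has a standard form (Watanabe, 2013; Gelman et al., 2014); in an ensemble setting it is estimated per input x from the ensemble members' log-likelihoods, roughly as follows (a sketch of the standard definition, not necessarily the paper's exact estimator):

```latex
\mathrm{WAIC}(x) = \mathbb{E}_{\theta}\left[\log p(x \mid \theta)\right] - \mathrm{Var}_{\theta}\left[\log p(x \mid \theta)\right]
```

where the expectation and variance are taken over the ensemble members θ; inputs with low scores (low average likelihood, or high disagreement between members) are flagged as likely OoD.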
Results
  • The authors applied the architecture search on five datasets: CelebA (Liu et al.), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun).
  • The authors used the Adam optimizer with a fixed learning rate of 1 × 10⁻⁵ and a batch size of 4 for 10,000 iterations.
  • The authors trained the proposed method on four in-distribution datasets D_in: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun).
  • The authors compare the ensemble search method against a traditional ensemble method that uses a single Glow (Kingma & Dhariwal, 2018) architecture trained multiple times with different random initializations.
Conclusion
  • Unlike NAS for common learning tasks, specifying a model and an objective to optimize for uncertainty estimation and outlier detection is not straightforward.
  • Instead of seeking to maximize the likelihood of in-distribution data, which may cause OoD samples to be mistakenly assigned higher likelihood, the authors developed a search algorithm to optimize the WAIC score, a Bayesian-adjusted estimate of the data entropy
  • Using this formulation, the authors have identified several key features of good uncertainty quantification architectures: a simple structure in the shallower layers, information-preserving operations, and a larger, more expressive structure with skip connections in the deeper layers to ensure optimization stability.
  • NADS, as a new uncertainty-aware architecture search strategy, enables the model uncertainty quantification that is critical for more robust and generalizable deep learning, a crucial step in safely applying deep learning to healthcare, autonomous driving, and disaster response.
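The WAIC-based ensemble scoring described above can be sketched in a few lines. This is a minimal illustration assuming the per-model log-likelihoods from an ensemble are already available; `waic_score` and `detect_ood` are hypothetical helper names for this sketch, not the authors' code:

```python
import numpy as np

def waic_score(log_likelihoods):
    """WAIC-style score for one input, given per-model log-likelihoods.

    log_likelihoods: array of shape (n_models,) holding log p(x | theta_i)
    from an ensemble of density models. The mean rewards inputs the
    ensemble assigns high likelihood; the variance penalizes inputs the
    models disagree on, which is typical of OoD data.
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    return ll.mean() - ll.var()

def detect_ood(scores, threshold):
    """Flag inputs whose WAIC score falls below a chosen threshold."""
    return np.asarray(scores) < threshold
```

An in-distribution input where the ensemble agrees on a high likelihood gets a high score, while an input the models disagree on is penalized by the variance term, e.g. `waic_score([-1.0, -1.1, -0.9])` is far larger than `waic_score([-1.0, -5.0, -9.0])`.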
Summary
  • Introduction:

    Detecting anomalous data is crucial for safely applying machine learning in autonomous systems for critical applications and for AI safety (Amodei et al., 2016).
  • Existing proxy tasks include leveraging shared parameters (Pham et al., 2018), predicting performance using a surrogate model (Liu et al., 2018a), and early stopping (Zoph et al., 2018; Chen et al., 2018).
  • Objectives:

    Unlike existing NAS methods, the aim is to derive an ensemble of deep models to improve model uncertainty quantification and OoD detection.
Tables
  • Table 1: OoD detection results on various evaluation setups. We compared our method with MSP (Baseline) (Hendrycks & Gimpel, 2016).
  • Table 2: OoD detection results on various training and testing experiments comparing our method with a baseline ensembling method that uses a fixed architecture trained multiple times with different random initializations.
  • Table 3: OoD detection results on various training and testing experiments comparing our method with ODIN (Liang et al., 2017).
Funding
  • The presented materials are based upon work supported by the National Science Foundation under Grants CCF-1553281, IIS-1812641, and CCF-1934904, and by the Defense Advanced Research Projects Agency under grant FA8750-18-2-0027.
References
  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mane, D. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
  • Ardywibowo, R., Huang, S., Gui, S., Xiao, C., Cheng, Y., Liu, J., and Qian, X. Switching-state dynamical modeling of daily behavioral data. Journal of Healthcare Informatics Research, 2(3):228–247, 2018.
  • Ardywibowo, R., Zhao, G., Wang, Z., Mortazavi, B., Huang, S., and Qian, X. Adaptive activity monitoring with uncertainty quantification in switching Gaussian process models. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 266–275, 2019.
  • Baker, B., Gupta, O., Naik, N., and Raskar, R. Designing neural network architectures using reinforcement learning. In International Conference on Learning Representations, 2017.
  • Boluki, S., Ardywibowo, R., Dadaneh, S. Z., Zhou, M., and Qian, X. Learnable Bernoulli dropout for Bayesian deep learning. arXiv preprint arXiv:2002.05155, 2020.
  • Chang, J., Zhang, X., Guo, Y., Meng, G., Xiang, S., and Pan, C. Differentiable architecture search with ensemble Gumbel-Softmax. arXiv preprint arXiv:1905.01786, 2019.
  • Chen, L.-C., Collins, M., Zhu, Y., Papandreou, G., Zoph, B., Schroff, F., Adam, H., and Shlens, J. Searching for efficient multi-scale architectures for dense image prediction. In Advances in Neural Information Processing Systems, pp. 8699–8710, 2018.
  • Choi, H. and Jang, E. Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.
  • Dadaneh, S. Z., Boluki, S., Yin, M., Zhou, M., and Qian, X. Pairwise supervised hashing with Bernoulli variational auto-encoder and self-control gradient estimator. arXiv preprint arXiv:2005.10477, 2020a.
  • Dadaneh, S. Z., Boluki, S., Zhou, M., and Qian, X. ARSM gradient estimator for supervised learning to rank. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3157–3161, 2020b.
  • Dinh, L., Sohl-Dickstein, J., and Bengio, S. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.
  • Elsken, T., Metzen, J. H., and Hutter, F. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377, 2018.
  • Gal, Y. and Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059, 2016.
  • Gal, Y., Hron, J., and Kendall, A. Concrete dropout. In Advances in Neural Information Processing Systems, pp. 3581–3590, 2017.
  • Gelman, A., Hwang, J., and Vehtari, A. Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6):997–1016, 2014.
  • Germain, M., Gregor, K., Murray, I., and Larochelle, H. MADE: Masked autoencoder for distribution estimation. In International Conference on Machine Learning, pp. 881–889, 2015.
  • Ghiasi, G., Lin, T.-Y., and Le, Q. V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7036–7045, 2019.
  • Gong, X., Chang, S., Jiang, Y., and Wang, Z. AutoGAN: Neural architecture search for generative adversarial networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2019.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
  • Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • Gumbel, E. J. Statistical theory of extreme values and some practical applications. NBS Applied Mathematics Series, 33, 1954.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Hendrycks, D. and Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2016.
  • Hendrycks, D., Lee, K., and Mazeika, M. Using pre-training can improve model robustness and uncertainty. arXiv preprint arXiv:1901.09960, 2019a.
  • Hendrycks, D., Mazeika, M., and Dietterich, T. Deep anomaly detection with outlier exposure. In International Conference on Learning Representations, 2019b.
  • Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. Bayesian model averaging: A tutorial. Statistical Science, pp. 382–401, 1999.
  • Jang, E., Gu, S., and Poole, B. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144, 2016.
  • Jin, H., Song, Q., and Hu, X. Auto-Keras: Efficient neural architecture search with network morphism. arXiv preprint arXiv:1806.10282, 2018.
  • Kendall, A. and Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems, pp. 5574–5584, 2017.
  • Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224, 2018.
  • Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Kingma, D. P., Salimans, T., and Welling, M. Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems, pp. 2575–2583, 2015.
  • Krizhevsky, A. et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • Lakshminarayanan, B., Pritzel, A., and Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pp. 6402–6413, 2017.
  • LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
  • Lee, K., Lee, H., Lee, K., and Shin, J. Training confidence-calibrated classifiers for detecting out-of-distribution samples. In International Conference on Learning Representations, 2018.
  • Liang, S., Li, Y., and Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690, 2017.
  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., and Murphy, K. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 19–34. Springer, 2018a.
  • Liu, C., Chen, L.-C., Schroff, F., Adam, H., Hua, W., Yuille, A. L., and Fei-Fei, L. Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 82–92, 2019.
  • Liu, H., Simonyan, K., and Yang, Y. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018b.
  • Lowe, D. G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, pp. 1150–1157, 1999.
  • Maddison, C. J., Tarlow, D., and Minka, T. A* sampling. In Advances in Neural Information Processing Systems, pp. 3086–3094, 2014.
  • Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. Do deep generative models know what they don't know? In International Conference on Learning Representations, 2019a.
  • Nalisnick, E., Matsukawa, A., Teh, Y. W., and Lakshminarayanan, B. Detecting out-of-distribution inputs to deep generative models using a test for typicality. arXiv preprint arXiv:1906.02994, 2019b.
  • Negrinho, R. and Gordon, G. DeepArchitect: Automatically designing and training deep architectures. arXiv preprint arXiv:1704.08792, 2017.
  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. Reading digits in natural images with unsupervised feature learning. 2011.
  • Nguyen, A., Yosinski, J., and Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436, 2015.
  • NHTSA. Tesla crash preliminary evaluation report. Technical report, U.S. Department of Transportation, National Highway Traffic Safety Administration, Jan 2017.
  • Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
  • Pham, H., Guan, M., Zoph, B., Le, Q., and Dean, J. Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning, pp. 4092–4101, 2018.
  • Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 4780–4789, 2019.
  • Watanabe, S. A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14(Mar):867–897, 2013.
  • Xie, S., Zheng, H., Liu, C., and Lin, L. SNAS: Stochastic neural architecture search. arXiv preprint arXiv:1812.09926, 2018.
  • Yin, M., Yue, Y., and Zhou, M. ARSM: Augment-reinforce-swap-merge estimator for gradient backpropagation through categorical variables. In International Conference on Machine Learning, pp. 7095–7104, 2019.
  • Zhong, Z., Yan, J., Wu, W., Shao, J., and Liu, C.-L. Practical block-wise neural network architecture generation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. IEEE, 2018.
  • Zoph, B. and Le, Q. V. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
  • Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. Learning transferable architectures for scalable image recognition. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8697–8710. IEEE, 2018.