Why Normalizing Flows Fail to Detect Out-of-Distribution Data

NeurIPS 2020, pages 20578-20589


Abstract

Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often surprisingly fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection.
Introduction
  • Normalizing flows [39, 9, 10] seem to be ideal candidates for out-of-distribution detection, since they are simple generative models that provide an exact likelihood (see the change-of-variables formula below).
  • In Figure 1(b, c), the authors show that the coupling layers of RealNVP transform in-distribution ImageNet images in the same way as OOD CelebA images.
  • The authors show that by changing the architectural details of the coupling layers, they can encourage flows to learn transformations specific to the target data, improving OOD detection.
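The exact likelihood mentioned above comes from the change-of-variables formula: for an invertible map f with latent representation z = f(x) and a simple base density p_Z (typically a standard Gaussian),

    \log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|

Coupling layers are constructed so that this Jacobian is triangular, which makes the log-determinant cheap to compute; in the affine case it is simply the sum of the predicted log-scales.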
Highlights
  • Normalizing flows [39, 9, 10] seem to be ideal candidates for out-of-distribution detection, since they are simple generative models that provide an exact likelihood
  • We briefly introduce normalizing flows based on coupling layers
  • In performing OOD detection, the biases of normalizing flows can be more of a curse than a blessing
  • We have shown that flows tend to learn representations that achieve high likelihood through generic graphical features and local pixel correlations, rather than discovering semantic structure that would be specific to the training distribution
  • To provide insights into prior results [e.g., 27, 7, 28, 37, 45, 36], part of our discussion has focused on an in-depth exploration of the popular class of normalizing flows based on affine coupling layers (a minimal coupling-layer sketch follows this list)
  • We hypothesize that many of our conclusions about coupling layers extend at a high level to other types of normalizing flows [e.g., 3, 6, 12, 20, 13, 30, 38, 17, 8]
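For concreteness, here is a minimal sketch of one affine coupling layer in the spirit of RealNVP [10]. It assumes flattened inputs and a small fully connected network for the scale and shift; the class name, layer sizes, and the tanh-bounded log-scale are illustrative choices, not the architecture used in the paper's experiments.

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """One affine coupling layer: masked coordinates pass through unchanged
        and predict an elementwise scale and shift for the remaining coordinates."""

        def __init__(self, dim, mask, hidden=256):
            super().__init__()
            self.register_buffer("mask", mask)  # 1 = pass through, 0 = transform
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * dim),
            )

        def forward(self, x):
            x_fixed = x * self.mask
            log_s, t = self.net(x_fixed).chunk(2, dim=-1)
            log_s = torch.tanh(log_s) * (1 - self.mask)   # bounded log-scale for stability
            t = t * (1 - self.mask)
            z = x_fixed + (1 - self.mask) * (x * torch.exp(log_s) + t)
            log_det = log_s.sum(dim=-1)                   # triangular Jacobian
            return z, log_det

Stacking such layers with alternating masks, plus a simple base density, yields an exact log-likelihood via the change-of-variables formula above.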
Results
  • The authors show that OOD detection is improved when flows are trained on high-level features which contain semantic information extracted from image datasets.
  • Recent works have shown that normalizing flows, among other deep generative models, can assign higher likelihood to out-of-distribution data [27, 7].
  • Song et al [37] showed that normalizing flows with batch normalization layers in train mode assign much lower likelihood to out-of-distribution images than they do in evaluation mode, while for in-distribution data the difference is not significant.
  • Leveraging local pixel correlations (Section 6.1): In Figure 3(a, b), the authors visualize intermediate coupling-layer activations of a small RealNVP model with two coupling layers and checkerboard masks, trained on FashionMNIST.
  • In Figure 3(c, d), the authors visualize the coupling layers for a 3-layer RealNVP with horizontal masks on in-distribution (FashionMNIST) and OOD (MNIST) data (both mask types are sketched after this list).
  • Simpler images (e.g., SVHN compared to CIFAR-10) and image backgrounds often contain large patches of the same color, which makes it easy to predict masked pixels from their neighbours and to encode and decode the information via coupling-layer co-adaptation.
  • The authors' observations in Sections 5 and 6 suggest that normalizing flows are biased towards learning transformations that increase likelihood simultaneously for all structured images.
  • For the checkerboard mask, the flow assigns higher likelihood to the simpler OOD datasets (SVHN for CelebA and MNIST for FashionMNIST).
  • While the proposed modifications do not completely resolve the issue of OOD data receiving higher likelihood, the experiments support the observations in Section 6: by preventing the flows from leveraging local color correlations and coupling-layer co-adaptation, the authors improve the relative likelihood ranking of in-distribution data.
  • Out-of-distribution detection using image embeddings (Section 8): In Section 4, the authors argued that in order to detect OOD data the model has to assign likelihood based on high-level semantic features of the data, which the flows fail to do when trained on images.
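As a rough illustration of the two mask types referred to above, the following sketch builds a checkerboard mask (each pixel is predicted from its immediate neighbours) and a horizontal mask (one half of the image predicts the other). It assumes single-channel images and is not the paper's exact multi-scale setup.

    import numpy as np

    def checkerboard_mask(h, w):
        """Alternating 0/1 pattern: every masked pixel has unmasked 4-neighbours."""
        ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        return ((ii + jj) % 2).astype(np.float32)

    def horizontal_mask(h, w):
        """Top half passes through unchanged and predicts the bottom half."""
        mask = np.zeros((h, w), dtype=np.float32)
        mask[: h // 2] = 1.0
        return mask

With the checkerboard mask, masked pixels can be predicted almost entirely from the colors of their neighbours, which is exactly the local-correlation shortcut described in the Results above.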
Conclusion
  • Normalizing flows can detect OOD images when trained on high-level semantic representations instead of raw pixels.
  • The authors have shown that flows tend to learn representations that achieve high likelihood through generic graphical features and local pixel correlations, rather than discovering semantic structure that would be specific to the training distribution.
  • A full study of these other types of flows is a promising direction for future work.
Tables
  • Table 1: Baseline AUROC. AUROC scores on OOD detection for RealNVP and Glow models trained on various image datasets. Flows consistently assign higher likelihoods to the OOD datasets except when trained on MNIST and SVHN. The AUROC scores for RealNVP and Glow are close
  • Table 2: Image embedding and UCI AUROC. (a): AUROC scores on OOD detection for a RealNVP model trained on image embeddings extracted from EfficientNet. The model is trained on one of the embedding datasets while the remaining two are considered OOD. The models consistently assign higher likelihood to in-distribution data, and in particular the AUROC scores are significantly better than those of flows trained on the original images (see Table 1). (b): AUROC scores on OOD detection for RealNVP trained on one class of the HEPMASS and MiniBooNE datasets while the other class is treated as OOD data (the AUROC computation itself is sketched below)
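To make the AUROC numbers in these tables concrete, here is a minimal sketch of how such a score can be computed from flow log-likelihoods, with in-distribution data as the positive class. The helper name and the hypothetical flow.log_prob call are assumptions, not the authors' evaluation code.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def ood_auroc(loglik_in, loglik_ood):
        """AUROC for likelihood-based OOD detection.

        loglik_in  -- flow log-likelihoods on held-out in-distribution inputs
        loglik_ood -- flow log-likelihoods on OOD inputs
        0.5 means the likelihoods do not separate the two sets; values well
        below 0.5 mean the flow assigns higher likelihood to the OOD data.
        """
        scores = np.concatenate([loglik_in, loglik_ood])
        labels = np.concatenate([np.ones_like(loglik_in), np.zeros_like(loglik_ood)])
        return roc_auc_score(labels, scores)

    # Hypothetical usage: ood_auroc(flow.log_prob(emb_in), flow.log_prob(emb_ood))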
Related work
  • Recent works have shown that normalizing flows, among other deep generative models, can assign higher likelihood to out-of-distribution data [27, 7]. The work on OOD detection with deep generative models falls into two distinct categories. In group anomaly detection (GAD), the task is to label a batch of n > 1 datapoints as in- or out-of-distribution. Point anomaly detection (PAD) involves the more challenging task of labelling single points as out-of-distribution.

    Group anomaly detection. Nalisnick et al. [28] introduce the typicality test, which distinguishes between a high-density set and a typical set of the distribution induced by a model (a minimal sketch follows this paragraph). However, the typicality test cannot detect OOD data if the flow assigns it a likelihood distribution similar to that of the in-distribution data. Song et al. [37] showed that out-of-distribution datasets have lower likelihoods when batch normalization statistics are computed from the current batch instead of accumulated over the training set, and proposed a test based on this observation. Zhang et al. [45] introduce a GAD algorithm based on measuring correlations of the flow's latent representations corresponding to the input batch. The main limitation of GAD methods is that, for most practical applications, the assumption that the data comes in batches of inputs that are all in-distribution or all OOD is not realistic.
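For reference, a minimal sketch of the typicality test described above: a batch is flagged as OOD when its average negative log-likelihood deviates too far from an entropy estimate obtained from training data. Using the mean training NLL as the entropy estimate and a fixed threshold eps are simplifying assumptions on top of Nalisnick et al. [28].

    import numpy as np

    def typicality_test(nll_batch, train_nll_mean, eps):
        """Group anomaly detection via typicality.

        nll_batch      -- negative log-likelihoods of a batch of M inputs
        train_nll_mean -- mean negative log-likelihood on training data,
                          used here as an estimate of the model entropy
        eps            -- threshold, e.g. calibrated on held-out in-distribution batches
        Returns True if the batch is flagged as out-of-distribution.
        """
        return abs(np.mean(nll_batch) - train_nll_mean) > eps

Being a batch-level test, this says nothing about a single input, which is why point anomaly detection remains the harder setting.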
Funding
  • PK, PI, and AGW are supported by an Amazon Research Award, Amazon Machine Learning Research Award, Facebook Research, NSF I-DISRE 193471, NIH R01 DA048764-01A1, NSF IIS-1910266, and NSF 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science
References
  • Andrei Atanov, Alexandra Volokhova, Arsenii Ashukha, Ivan Sosnovik, and Dmitry Vetrov. Semi-conditional normalizing flows for semi-supervised learning. arXiv preprint arXiv:1905.00505, 2019.
  • Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. Parameterized machine learning for high-energy physics. arXiv preprint arXiv:1601.07913, 2016.
  • Jens Behrmann, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. arXiv preprint arXiv:1811.00995, 2018.
  • Apratim Bhattacharyya, Shweta Mahajan, Mario Fritz, Bernt Schiele, and Stefan Roth. Normalizing flows with multi-scale autoregressive priors. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 8415–8424, 2020.
  • Jianfei Chen, Cheng Lu, Biqi Chenli, Jun Zhu, and Tian Tian. Vflow: More expressive generative flows with variational data augmentation. arXiv preprint arXiv:2002.09741, 2020.
  • Ricky TQ Chen, Jens Behrmann, David Duvenaud, and Jörn-Henrik Jacobsen. Residual flows for invertible generative modeling. arXiv preprint arXiv:1906.02735, 2019.
  • Hyunsun Choi, Eric Jang, and Alexander A Alemi. Waic, but why? generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.
  • Nicola De Cao, Ivan Titov, and Wilker Aziz. Block neural autoregressive flow. arXiv preprint arXiv:1904.04676, 2019.
  • Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014.
  • Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.
  • Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. In Advances in Neural Information Processing Systems, pages 7509–7520, 2019.
  • Marc Finzi, Pavel Izmailov, Wesley Maddox, Polina Kirichenko, and Andrew Gordon Wilson. Invertible convolutional networks. In Workshop on Invertible Neural Nets and Normalizing Flows, International Conference on Machine Learning, 2019.
  • Will Grathwohl, Ricky TQ Chen, Jesse Betterncourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.
  • Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. arXiv preprint arXiv:1812.04606, 2018.
  • Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. arXiv preprint arXiv:1902.00275, 2019.
  • Emiel Hoogeboom, Rianne van den Berg, and Max Welling. Emerging convolutions for generative normalizing flows. arXiv preprint arXiv:1901.11137, 2019.
  • Chin-Wei Huang, David Krueger, Alexandre Lacoste, and Aaron Courville. Neural autoregressive flows. arXiv preprint arXiv:1804.00779, 2018.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • Pavel Izmailov, Polina Kirichenko, Marc Finzi, and Andrew Gordon Wilson. Semisupervised learning with normalizing flows. In International Conference on Machine Learning, 2020.
  • Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, and Daniel Duckworth. Invertible convolutional flow. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 5635–5645. Curran Associates, Inc., 2019. URL http://papers.nips.cc/paper/8801-invertible-convolutional-flow.pdf.
  • Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, and Sungroh Yoon. Flowavenet: A generative flow for raw audio. arXiv preprint arXiv:1811.02155, 2018.
  • Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1×1 convolutions. In Advances in Neural Information Processing Systems, pages 10215–10224, 2018.
  • Ivan Kobyzev, Simon Prince, and Marcus A Brubaker. Normalizing flows: Introduction and ideas. arXiv preprint arXiv:1908.09257, 2019.
  • Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  • Xuezhe Ma, Xiang Kong, Shanghang Zhang, and Eduard Hovy. Macow: Masked convolutional generative flow. In Advances in Neural Information Processing Systems, pages 5891–5900, 2019.
  • Tom M Mitchell. The need for biases in learning generalizations. Department of Computer Science, Laboratory for Computer Science Research..., 1980.
  • Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? arXiv preprint arXiv:1810.09136, 2018.
  • Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, and Balaji Lakshminarayanan. Detecting out-of-distribution inputs to deep generative models using a test for typicality. arXiv preprint arXiv:1906.02994, 2019.
  • Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
  • George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems, pages 2338–2347, 2017.
  • George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762, 2019.
  • Ryan Prenger, Rafael Valle, and Bryan Catanzaro. Waveglow: A flow-based generative network for speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3617–3621. IEEE, 2019.
  • Jie Ren, Peter J Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark Depristo, Joshua Dillon, and Balaji Lakshminarayanan. Likelihood ratios for out-of-distribution detection. In Advances in Neural Information Processing Systems, pages 14680–14691, 2019.
  • Byron P Roe, Hai-Jun Yang, Ji Zhu, Yong Liu, Ion Stancu, and Gordon McGregor. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 543(2-3):577–584, 2005.
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115 (3):211–252, 2015.
  • Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F Núñez, and Jordi Luque. Input complexity and out-of-distribution detection with likelihood-based generative models. arXiv preprint arXiv:1909.11480, 2019.
  • Jiaming Song, Yang Song, and Stefano Ermon. Unsupervised out-of-distribution detection with batch normalization. arXiv preprint arXiv:1910.09115, 2019.
  • Yang Song, Chenlin Meng, and Stefano Ermon. Mintnet: Building invertible neural networks with masked convolutions. In Advances in Neural Information Processing Systems, pages 11002–11012, 2019.
  • Esteban G Tabak and Cristina V Turner. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2):145–164, 2013.
  • Mingxing Tan and Quoc V Le. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
  • Lucas Theis, Aäron van den Oord, and Matthias Bethge. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844, 2015.
  • Antonio Torralba, Rob Fergus, and William T Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE transactions on pattern analysis and machine intelligence, 30(11):1958–1970, 2008.
  • Benigno Uria, Iain Murray, and Hugo Larochelle. Rnade: The real-valued neural autoregressive density-estimator. In Advances in Neural Information Processing Systems, pages 2175–2183, 2013.
  • Andrew Gordon Wilson and Pavel Izmailov. Bayesian deep learning and a probabilistic perspective of generalization. arXiv preprint arXiv:2002.08791, 2020.
  • Yufeng Zhang, Wanwei Liu, Zhenbang Chen, Ji Wang, Zhiming Liu, Kenli Li, Hongmei Wei, and Zuoning Chen. Out-of-distribution detection with distance guarantee in deep generative models. arXiv preprint arXiv:2002.03328, 2020.