Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples

NeurIPS 2020

Abstract

Among existing uncertainty estimation approaches, Dirichlet Prior Network (DPN) distinctly models different predictive uncertainty types. However, for in-domain examples with high data uncertainties among multiple classes, even a DPN model often produces indistinguishable representations from the out-of-distribution (OOD) examples, comp...

Introduction
  • Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3].
  • When these intelligent systems fail, they do not provide any explanation or warning.
  • Distributional uncertainty, or dataset shift, arises due to the distributional mismatch between the training and test examples, that is, the test data is out-of-distribution (OOD) [6, 5].
Highlights
  • Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3]
  • Predictive uncertainty estimation has emerged as an important research direction to inform users about possible wrong predictions and to allow them to react in an informed manner, improving the reliability of these systems
  • The existing formulation for Dirichlet Prior Network (DPN) models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples
  • We have proposed a novel loss function for DPN models that maximizes the representation gap between in-domain and OOD examples
  • Experiments on benchmark datasets demonstrate that our proposed approach effectively distinguishes distributional uncertainty from other uncertainty types and outperforms existing OOD detection models
  • We address a shortcoming of the existing techniques and propose a novel solution to improve the detection of anomalous out-of-distribution examples for a classification model
Methods
  • Uncertainty measures evaluated: maximum class probability (Max.P), entropy of the predictive distribution (Ent.), and differential entropy of the predicted Dirichlet distribution (D-Ent.)
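These measures can all be derived from the concentration parameters α predicted by a DPN. Below is a minimal sketch, assuming NumPy/SciPy and the standard closed-form differential entropy of a Dirichlet distribution; the function name and interface are illustrative, not the authors' code.

```python
import numpy as np
from scipy.special import digamma, gammaln

def dpn_uncertainty_measures(alphas):
    """Max.P, Ent., and D-Ent. from Dirichlet concentration parameters (shape (K,))."""
    alphas = np.asarray(alphas, dtype=float)
    alpha0 = alphas.sum()                               # Dirichlet precision
    probs = alphas / alpha0                             # expected categorical distribution

    max_p = probs.max()                                 # Max.P: confidence of the prediction
    entropy = -np.sum(probs * np.log(probs + 1e-12))    # Ent.: total uncertainty

    # D-Ent.: differential entropy of Dir(alpha),
    # log B(alpha) + (alpha0 - K) * psi(alpha0) - sum_c (alpha_c - 1) * psi(alpha_c)
    K = alphas.size
    log_beta = gammaln(alphas).sum() - gammaln(alpha0)
    d_ent = log_beta + (alpha0 - K) * digamma(alpha0) - np.sum((alphas - 1.0) * digamma(alphas))
    return max_p, entropy, d_ent

# Example with small concentrations (alphas < 1), which yield a multi-modal Dirichlet
# with its mass near the corners of the simplex.
print(dpn_uncertainty_measures([0.2, 0.2, 0.2]))
```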
Results
  • The authors visualize the uncertainty measures of different data points for DPNrev, along with the DPN+ and DPN− models, in Figures 7(b), 7(c) and 7(d), respectively.
  • The authors present the results for the entropy of the categorical posterior distribution, H[p(ωc|x∗, D)], for different data points.
  • This is a total uncertainty measure, as it is derived from the expected predictive categorical distribution p(ωc|x∗, D), i.e., by marginalizing over μ and θ in Eq. 2.
Conclusion
  • The existing formulation for DPN models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples.
  • Despite the remarkable success of deep neural network (DNN)-based models in various real-world applications, they often produce incorrect predictions without providing any warning to the users.
  • This raises the question of how much the authors can trust these models, and whether it is safe to use them for sensitive real-world applications such as medical diagnosis, self-driving cars, or financial decision-making.
Summary
  • Introduction:

    Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3].
  • When these intelligent systems fail, they do not provide any explanation or warning.
  • Distributional uncertainty, or dataset shift, arises due to the distributional mismatch between the training and test examples, that is, the test data is out-of-distribution (OOD) [6, 5].
  • Objectives:

    The authors aim to robustly identify the source of uncertainty in the prediction of a DNN based model for classification tasks.
  • Methods:

    Uncertainty measures evaluated: maximum class probability (Max.P), entropy of the predictive distribution (Ent.), and differential entropy of the predicted Dirichlet distribution (D-Ent.)
  • Results:

    The authors visualize the uncertainty measures of different data points for DPNrev, along with the DPN+ and DPN− models, in Figures 7(b), 7(c) and 7(d), respectively.
  • The authors present the results for the entropy of the categorical posterior distribution, H[p(ωc|x∗, D)], for different data points.
  • This is a total uncertainty measure, as it is derived from the expected predictive categorical distribution p(ωc|x∗, D), i.e., by marginalizing over μ and θ in Eq. 2.
  • Conclusion:

    The existing formulation for DPN models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples.
  • Despite the remarkable success of deep neural network (DNN)-based models in various real-world applications, they often produce incorrect predictions without providing any warning to the users.
  • This raises the question of how much the authors can trust these models, and whether it is safe to use them for sensitive real-world applications such as medical diagnosis, self-driving cars, or financial decision-making.
Tables
  • Table1: AUROC scores for OOD detection (mean ± standard deviation of 3 runs); a minimal sketch of how such scores are computed follows this list. Refer to Table
  • Table2: AUROC scores for misclassified image detection. Refer to Table 9 (Appendix) for additional results using AUPR scores along with in-domain classification accuracy
  • Table3: KL-divergence scores from the distribution of uncertainty values of misclassified and correctly predicted examples to the OOD examples. See Table 10 (Appendix) for additional results
  • Table4: AUROC scores for OOD image detection of our DPN− models using different values of λin and λout for C100 classification task. We report (mean ± standard deviation) values of three runs
  • Table5: AUROC scores for OOD image detection for DPN models using the RKL loss function [16] with different choices of hyper-parameters for the C100 classification task. We report (mean ± standard deviation) values of three runs
  • Table6: OOD image detection results of the binary classifiers compared to our DPN− models for the C10 and C100 classification tasks. We report (mean ± standard deviation) values of three runs
  • Table7: Details of Train and Test Datasets used for the different classifiers
  • Table8: Root mean square (RMS)
  • Table9: Classification accuracy and misclassified image detection. Here, we report the (mean ± standard deviation) of 3 runs for each framework. Note that AUPR may not be an ideal metric for comparison, as it depends on the base rates, i.e., the number of misclassified examples vs. correctly classified predictions. That is, AUPR scores are comparable when the models achieve similar classification accuracy. Our DPN− models achieve comparable performance for misclassified image detection using the AUROC metric
  • Table10: KL-divergence scores from the distribution of uncertainty values of misclassified and correctly predicted examples to the OOD examples. Higher scores are desirable, as they indicate a greater gap between in-domain and OOD examples. We report the (mean ± standard deviation) of 3 runs for each framework
  • Table11: Results of OOD detection for C10. We report (mean ± standard deviation) of three different models. A description of these OOD datasets is provided in Appendix B.2.1
  • Table12: Results of OOD image detection for C100. We report (mean ± standard deviation) of three different models. A description of these OOD datasets is provided in Appendix B.2.1
  • Table13: Results of OOD image detection for TIM. We report (mean ± standard deviation) of three different models. A description of these OOD datasets is provided in Appendix B.2.1
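The AUROC scores in the tables above are threshold-free detection metrics computed from per-example uncertainty scores. A minimal sketch (not the authors' evaluation code), assuming scikit-learn, higher uncertainty for OOD inputs, and OOD treated as the positive class:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(in_scores, ood_scores):
    """AUROC for OOD detection from per-example uncertainty scores (OOD = positive class)."""
    scores = np.concatenate([in_scores, ood_scores])
    labels = np.concatenate([np.zeros(len(in_scores)),   # in-domain examples
                             np.ones(len(ood_scores))])  # OOD examples
    return roc_auc_score(labels, scores)

# Hypothetical differential-entropy scores: OOD examples score higher on average.
print(ood_auroc(np.random.normal(-3.0, 1.0, 1000),
                np.random.normal(0.0, 1.0, 1000)))
```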
Related work
  • For Bayesian neural networks, the predictive uncertainty of a classification model is expressed in terms of data and model uncertainty [4]. Let Din = {(x_i, y_i)}_{i=1}^N ∼ Pin(x, y), where x and y denote the images and their corresponding class labels, sampled from an underlying probability distribution Pin(x, y). Given an input x∗, the data uncertainty p(ωc|x∗, θ) is the posterior distribution over class labels given the model parameters θ, while the model uncertainty p(θ|Din) is the posterior distribution over the parameters given the data Din. Hence, the predictive uncertainty is given as: p(ωc|x∗, Din) = ∫ p(ωc|x∗, θ) p(θ|Din) dθ (1)

    where ωc is the representation for class c. We use the standard abbreviation for p(y = ωc|x∗, Din) as p(ωc|x∗, Din).

    However, the true posterior p(θ|Din) is intractable. Hence, we need approximations such as Monte-Carlo methods.
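Since the integral in Eq. 1 has no closed form, it is commonly approximated by averaging predictions over sampled parameter settings (e.g., Monte-Carlo dropout or an ensemble). The sketch below illustrates this under that assumption; `sample_model_probs` is a hypothetical callback that returns p(ωc|x∗, θ) for one sampled θ.

```python
import numpy as np

def predictive_distribution(x, sample_model_probs, num_samples=20):
    """Monte-Carlo approximation of Eq. 1:
    p(w_c | x, D_in) ≈ (1/S) * sum_s p(w_c | x, theta_s), with theta_s ~ p(theta | D_in).
    """
    samples = np.stack([sample_model_probs(x) for _ in range(num_samples)])
    return samples.mean(axis=0)  # expected categorical distribution over classes
```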
Funding
  • This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-001)
Study subjects and analysis
However, the choice of λin = 0 and λout < 0 does not enforce these properties (see the ablation study in Appendix A.1). The overall loss function in Eqn. 12 requires training samples from both the in-domain distribution and OOD. Here, we select a different real-world dataset as our OOD training examples.
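For illustration, the sketch below shows how an objective with separate in-domain and OOD terms weighted by λin and λout could be assembled in PyTorch. It is a hedged approximation, not a reproduction of Eqn. 12: the regularizer `precision_reg` (here, the mean sigmoid of the logits) is a placeholder for the paper's actual term on the predicted Dirichlet concentrations.

```python
import torch
import torch.nn.functional as F

def combined_dpn_loss(logits_in, labels_in, logits_out, lambda_in=1.0, lambda_out=-1.0):
    """Sketch of a joint in-domain/OOD objective with weights lambda_in and lambda_out."""
    # Placeholder regularizer on the predicted concentrations (an assumption, not Eqn. 12).
    precision_reg = lambda z: torch.sigmoid(z).mean()

    # In-domain batch: standard cross-entropy, plus a lambda_in-weighted reward for
    # larger concentrations when lambda_in > 0.
    loss_in = F.cross_entropy(logits_in, labels_in) - lambda_in * precision_reg(logits_in)

    # OOD batch: match a uniform categorical distribution; with lambda_out < 0 the second
    # term pushes concentrations down, widening the gap to in-domain representations.
    uniform = torch.full_like(logits_out, 1.0 / logits_out.size(1))
    loss_out = F.kl_div(F.log_softmax(logits_out, dim=1), uniform,
                        reduction="batchmean") - lambda_out * precision_reg(logits_out)

    return loss_in + loss_out
```

In this sketch, a positive λin rewards larger concentrations for in-domain inputs, while a negative λout penalizes them for OOD inputs.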

References
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
  • Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 2017.
  • Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.
  • Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. In NeurIPS, 2018.
  • JQ Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. The MIT Press, 2009.
  • Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, et al. End to end learning for self-driving cars. arXiv preprint, 2016.
  • Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.
  • Marcos Lopez de Prado. Advances in financial machine learning. John Wiley & Sons, 2018.
  • Jose Miguel Hernandez-Lobato and Ryan Adams. Probabilistic backpropagation for scalable learning of Bayesian neural networks. In ICML, 2015.
  • Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016.
  • Christos Louizos and Max Welling. Multiplicative normalizing flows for variational Bayesian neural networks. In ICML, 2017.
  • Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In NeurIPS, 2017.
  • Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. In ICLR, 2018.
  • Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. In ICLR, 2019.
  • Andrey Malinin and Mark Gales. Reverse KL-divergence training of prior networks: Improved uncertainty and adversarial robustness. In NeurIPS, 2019.
  • Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient Langevin dynamics. In ICML, 2011.
  • Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In NeurIPS, 2018.
  • Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint, 2014.
  • Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In ICLR, 2018.
  • Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS, 2018.
  • Terrance DeVries and Graham W Taylor. Learning confidence for out-of-distribution detection in neural networks. arXiv preprint, 2018.
  • Gabi Shalev, Yossi Adi, and Joseph Keshet. Out-of-distribution detection using multiple semantic label representations. In NeurIPS, 2018.
  • Matthias Hein, Maksym Andriushchenko, and Julian Bitterwolf. Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In CVPR, 2019.
  • Alexander Meinke and Matthias Hein. Towards neural networks that provably know when they don't know. In ICLR, 2020.
  • Julian Bitterwolf, Alexander Meinke, and Matthias Hein. Provable worst case guarantees for the detection of out-of-distribution data. arXiv preprint, 2020.
  • Andrey Malinin. Uncertainty estimation in deep learning with application to spoken language assessment. Doctoral thesis, 2019.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Citeseer, 2009.
  • Fei-Fei Li, Andrej Karpathy, and Justin Johnson. Tiny ImageNet visual recognition challenge. Stanford University CS231N, 2017.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In ICLR, 2017.
  • Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.
  • Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint, 2015.
  • M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In CVPR, 2014.
  • Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In ICML, 2017.
  • John A Swets. Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Psychology Press, 2014.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
  • Khanh Nguyen and Brendan O'Connor. Posterior calibration and exploratory analysis for natural language processing models. 2015.
  • 2. Section B.1 provides the implementation details for our experiments on synthetic datasets. Section B.2 provides the experimental setup, implementation details of our models and the competitive models, along with the description of the OOD test datasets for our experiments on the benchmark image classification datasets. Section B.3 presents a comparative study of the confidence calibration performance of different models.
  • 3. The expressions for the differential entropy and mutual information of a Dirichlet distribution, and the KL divergence between two Gaussian distributions, are provided in Section C.
  • 4. The extended results (mean ± standard deviation of 3 models) for the benchmark image classification datasets are provided in Section D, from Table 9 to Table 13.
Author
Jay Nandy