# Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples

NeurIPS 2020


Abstract

Among existing uncertainty estimation approaches, Dirichlet Prior Network (DPN) distinctly models different predictive uncertainty types. However, for in-domain examples with high data uncertainties among multiple classes, even a DPN model often produces indistinguishable representations from the out-of-distribution (OOD) examples, comp…


Introduction

- Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3].
- When these intelligent systems fail, they do not provide any explanation or warning.
- Distributional uncertainty, or dataset shift, arises due to the distributional mismatch between the training and test examples, that is, the test data is out-of-distribution (OOD) [6, 5].

Highlights

- Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3]
- Predictive uncertainty estimation has emerged as an important research direction to inform users about possible wrong predictions and to allow them to react in an informed manner, improving the reliability of these systems
- The existing formulation for Dirichlet Prior Network (DPN) models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples
- We have proposed a novel loss function for DPN models that maximizes the representation gap between in-domain and OOD examples
- Experiments on benchmark datasets demonstrate that our proposed approach effectively distinguishes distributional uncertainty from other uncertainty types and outperforms existing OOD detection models
- We address a shortcoming of the existing techniques and propose a novel solution to improve the detection of anomalous out-of-distribution examples for a classification model

Methods

- The uncertainty measures compared for each model are maximum probability (Max.P), entropy of the predictive distribution (Ent.), and differential entropy of the Dirichlet distribution (D-Ent)
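The abbreviations Max.P, Ent., and D-Ent denote the three uncertainty measures reported throughout the paper. As an illustrative sketch only (normalization details may differ from the authors' implementation), all three can be computed from the concentration parameters α of a predicted Dirichlet distribution; the `_digamma` helper is an assumption, a standard recurrence-plus-asymptotic-series approximation, since the standard library has no digamma:

```python
import math

def _digamma(x):
    """Digamma approximation via recurrence plus asymptotic series (adequate for x > 0)."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def uncertainty_measures(alpha):
    """Max.P, Ent., and D-Ent from the concentration parameters alpha of Dir(alpha)."""
    alpha0 = sum(alpha)
    p = [a / alpha0 for a in alpha]              # expected categorical distribution E[mu]
    max_p = max(p)                               # Max.P: confidence of the predicted class
    ent = -sum(pc * math.log(pc) for pc in p)    # Ent.: entropy of E[mu] (total uncertainty)
    # D-Ent: differential entropy of Dir(alpha); higher for flatter Dirichlets
    K = len(alpha)
    log_B = sum(math.lgamma(a) for a in alpha) - math.lgamma(alpha0)
    d_ent = (log_B
             + (alpha0 - K) * _digamma(alpha0)
             - sum((a - 1.0) * _digamma(a) for a in alpha))
    return max_p, ent, d_ent
```

For a DPN, a sharp in-domain prediction (e.g. α = (100, 1, 1)) should yield high Max.P and low entropies, while a flat Dirichlet (α = (1, 1, 1)), as desired for OOD inputs, yields higher differential entropy.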

Results

- The authors visualize the uncertainty measures of different data points for DPNrev along with the DPN+ and DPN− models in Figures 7(b), 7(c) and 7(d), respectively.
- The authors present the results for the entropy of the categorical posterior distributions, H[p(ωc|x∗, D)], for different data points.
- This is a total uncertainty measure, as it is derived from the expected predictive categorical distribution p(ωc|x∗, D), i.e., by marginalizing over μ and θ in Eq. 2.
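The total uncertainty H[p(ωc|x∗, D)] above splits into expected data uncertainty plus a distributional (mutual-information) part. A minimal Monte-Carlo sketch of that decomposition for a Dirichlet, sampling μ ~ Dir(α) via normalized Gamma draws (this is a sampling illustration, not the authors' closed-form computation; `uncertainty_decomposition` is a hypothetical helper name):

```python
import math
import random

def uncertainty_decomposition(alpha, samples=20000, seed=0):
    """Estimate total uncertainty H[E[mu]], expected data uncertainty E[H[mu]],
    and their difference (mutual information, the distributional part)
    by sampling mu ~ Dir(alpha) as normalized independent Gamma draws."""
    rng = random.Random(seed)
    alpha0 = sum(alpha)
    p = [a / alpha0 for a in alpha]
    total = -sum(pc * math.log(pc) for pc in p)      # entropy of the expected categorical
    exp_data = 0.0
    for _ in range(samples):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        s = sum(g)
        mu = [x / s for x in g]
        exp_data += -sum(m * math.log(m) for m in mu if m > 0)
    exp_data /= samples
    return total, exp_data, total - exp_data         # MI >= 0 up to MC noise
```

A flat Dirichlet keeps a sizeable distributional component, while a concentrated one (large α) drives the mutual information toward zero even when the total entropy stays high.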

Conclusion

- The existing formulation for DPN models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples.
- Despite the remarkable success of deep neural network (DNN)-based models in various real-world applications, they often produce incorrect predictions without providing any warning to the users.
- This raises the question of how much the authors can trust these models and whether it is safe to use them for sensitive real-world applications such as medical diagnosis, self-driving cars, or financial decisions.

Summary

## Introduction:

- Deep neural network (DNN) based models have achieved remarkable success in addressing various real-world tasks [1, 2, 3].
- When these intelligent systems fail, they do not provide any explanation or warning.
- Distributional uncertainty, or dataset shift, arises due to the distributional mismatch between the training and test examples, that is, the test data is out-of-distribution (OOD) [6, 5].

## Objectives:

- The authors aim to robustly identify the source of uncertainty in the prediction of a DNN-based model for classification tasks.

## Methods:

- The uncertainty measures compared for each model are maximum probability (Max.P), entropy (Ent.), and differential entropy (D-Ent).

## Results:

- The authors visualize the uncertainty measures of different data points for DPNrev along with the DPN+ and DPN− models in Figures 7(b), 7(c) and 7(d), respectively.
- The authors present the results for the entropy of the categorical posterior distributions, H[p(ωc|x∗, D)], for different data points.
- This is a total uncertainty measure, as it is derived from the expected predictive categorical distribution p(ωc|x∗, D), i.e., by marginalizing over μ and θ in Eq. 2.

## Conclusion:

- The existing formulation for DPN models often leads to indistinguishable representations between in-domain examples with high data uncertainty among multiple classes and OOD examples.
- Despite the remarkable success of deep neural network (DNN)-based models in various real-world applications, they often produce incorrect predictions without providing any warning to the users.
- This raises the question of how much the authors can trust these models and whether it is safe to use them for sensitive real-world applications such as medical diagnosis, self-driving cars, or financial decisions.

- Table1: AUROC scores for OOD detection (mean ± standard deviation of 3 runs). Refer to Table
- Table2: AUROC scores for misclassified image detection. Refer to Table 9 (Appendix) for additional results by using AUPR scores along with in-domain classification accuracy
- Table3: KL-divergence scores from the distribution of uncertainty values of misclassified and correctly predicted examples to the OOD examples. See Table 10 (Appendix) for additional results
- Table4: AUROC scores for OOD image detection of our DPN− models using different values of λin and λout for the C100 classification task. We report (mean ± standard deviation) values of three runs
- Table5: AUROC scores for OOD image detection for DPN models using the RKL loss function [16] with different choices of hyper-parameters for the C100 classification task. We report (mean ± standard deviation) values of three runs
- Table6: OOD image detection results of the binary classifiers compared to our DPN− models for the C10 and C100 classification tasks. We report (mean ± standard deviation) values of three runs
- Table7: Details of the train and test datasets used for the different classifiers
- Table8: Root mean square (RMS)
- Table9: Classification accuracy and misclassified image detection. Here, we report the (mean ± standard deviation) of 3 runs for each framework. Note that AUPR may not be an ideal metric for comparison, as it depends on the base rates, i.e., the number of misclassified examples vs. correctly classified predictions. That is, AUPR scores are comparable when the models achieve similar classification accuracy. Our DPN− models achieve comparable performance for misclassified image detection using the AUROC metric
- Table10: KL-divergence scores from the distribution of uncertainty values of misclassified and correctly predicted examples to the OOD examples. Higher scores are desirable, as they indicate a greater gap between in-domain and OOD examples. We report the (mean ± standard deviation) of 3 runs for each framework
- Table11: Results of OOD detection for C10. We report (mean ± standard deviation) of three different models. Description of these OOD datasets are provided in Appendix B.2.1
- Table12: Results of OOD image detection for C100. We report (mean ± standard deviation) of three different models. Description of these OOD datasets are provided in Appendix B.2.1
- Table13: Results of OOD image detection for TIM. We report (mean ± standard deviation) of three different models. Description of these OOD datasets are provided in Appendix B.2.1
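All of the OOD-detection tables above report AUROC, which in this setting reduces to the probability that a randomly chosen OOD example receives a higher uncertainty score than a randomly chosen in-domain example. A minimal rank-based sketch (the `ood_auroc` name is an assumption, and the O(n·m) pairwise loop is for clarity, not efficiency):

```python
def ood_auroc(in_scores, ood_scores):
    """AUROC for OOD detection as the Mann-Whitney U statistic:
    P(score of a random OOD example > score of a random in-domain example),
    counting ties as 1/2. Scores are uncertainty values (higher = more OOD-like)."""
    wins = 0.0
    for s_out in ood_scores:
        for s_in in in_scores:
            if s_out > s_in:
                wins += 1.0
            elif s_out == s_in:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))
```

A score of 1.0 means perfect separation, 0.5 is chance level, and values below 0.5 mean the detector ranks OOD examples as *less* uncertain than in-domain ones.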

Related work

- In the Bayesian neural network framework, the predictive uncertainty of a classification model is expressed in terms of data and model uncertainty [4]. Let Din = {(xi, yi)}, i = 1, …, N, where the images x and their corresponding class-labels y are sampled from an underlying probability distribution Pin(x, y). Given an input x∗, the data uncertainty, p(ωc|x∗, θ), is the posterior distribution over class labels given the model parameters θ, while the model uncertainty, p(θ|Din), is the posterior distribution over parameters given the data, Din. Hence, the predictive uncertainty is given as: p(ωc|x∗, Din) = ∫ p(ωc|x∗, θ) p(θ|Din) dθ (1)

where ωc is the representation for class c. We use the standard abbreviation for p(y = ωc|x∗, Din) as p(ωc|x∗, Din).

However, the true posterior p(θ|Din) is intractable. Hence, we need approximations such as Monte-Carlo sampling.
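In practice the intractable integral in Eq. 1 is replaced by an average over sampled parameters, e.g. by keeping dropout active at test time so that each forward pass implicitly draws a θ. A minimal sketch, where `forward_pass` is a hypothetical stochastic model call returning one categorical distribution:

```python
def mc_predictive(forward_pass, x, T=20):
    """Approximate p(omega_c | x*, Din) ≈ (1/T) Σ_t p(omega_c | x*, theta_t)
    by averaging T stochastic forward passes; each pass implicitly samples
    theta_t ~ p(theta | Din), e.g. via an active dropout mask."""
    acc = None
    for _ in range(T):
        probs = forward_pass(x)              # one categorical distribution over the classes
        if acc is None:
            acc = list(probs)
        else:
            acc = [a + p for a, p in zip(acc, probs)]
    return [a / T for a in acc]
```

The spread of the individual `probs` draws around this average is what separates model uncertainty from the data uncertainty carried by each single draw.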

Funding

- This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-GC-2019-001)

Study subjects and analysis

However, the choice of λin = 0 and λout < 0 does not enforce these properties (see the ablation study in Appendix A.1). The overall loss function in Eqn 12 requires training samples from both the in-domain distribution and OOD. Here, we select a different real-world dataset as our OOD training examples.
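Eqn 12 combines an in-domain term (weighted by λin) and an OOD term (weighted by λout < 0). As a hedged illustration only, and NOT the authors' exact Eqn 12, such a composite objective over mixed batches could be sketched as cross-entropy plus mean-logit regularizers (`composite_loss`, `lam_in`, and `lam_out` are hypothetical names):

```python
import math

def composite_loss(logits_in, labels_in, logits_out, lam_in=1.0, lam_out=-1.0):
    """Illustrative DPN-style objective (not the paper's exact Eqn 12):
    cross-entropy on in-domain batches plus mean-logit regularizers,
    weighted separately for in-domain (lam_in) and OOD (lam_out) examples.
    With lam_out < 0, the regularizer pushes OOD logits down, i.e. toward
    small Dirichlet concentrations and a flat output distribution."""
    def log_softmax(z):
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        return [v - lse for v in z]

    ce = -sum(log_softmax(z)[y] for z, y in zip(logits_in, labels_in)) / len(labels_in)
    mean_in = sum(sum(z) / len(z) for z in logits_in) / len(logits_in)
    mean_out = sum(sum(z) / len(z) for z in logits_out) / len(logits_out)
    return ce - lam_in * mean_in - lam_out * mean_out
```

Under this sketch, confident in-domain logits and strongly negative OOD logits both lower the loss, which is the qualitative behavior the λin/λout discussion above describes.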

Reference

- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine, 29, 2012.
- Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 2017.
- Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.
- Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. In NeurIPS, 2018.
- Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. The MIT Press, 2009.
- Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, et al. End to end learning for self-driving cars. arXiv preprint, 2016.
- Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.
- Marcos Lopez De Prado. Advances in financial machine learning. John Wiley & Sons, 2018.
- Jose Miguel Hernandez-Lobato and Ryan Adams. Probabilistic backpropagation for scalable learning of bayesian neural networks. In ICML, 2015.
- Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016.
- Christos Louizos and Max Welling. Multiplicative normalizing flows for variational Bayesian neural networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 2218–2227. JMLR.org, 2017.
- Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In NeurIPS, 2017.
- Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. In ICLR, 2018.
- Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. In ICLR, 2019.
- Andrey Malinin and Mark Gales. Reverse KL-divergence training of prior networks: Improved uncertainty and adversarial robustness. In NeurIPS, 2019.
- Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient langevin dynamics. In ICML, 2011.
- Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. In NeurIPS, 2018.
- Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv, 2014.
- Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In ICLR, 2018.
- Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS, 2018.
- Terrance DeVries and Graham W Taylor. Learning confidence for out-of-distribution detection in neural networks. arXiv, 2018.
- Gabi Shalev, Yossi Adi, and Joseph Keshet. Out-of-distribution detection using multiple semantic label representations. In NeurIPS, 2018.
- Matthias Hein, Maksym Andriushchenko, and Julian Bitterwolf. Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In CVPR, 2019.
- Alexander Meinke and Matthias Hein. Towards neural networks that provably know when they don’t know. In International Conference on Learning Representations, 2020.
- Julian Bitterwolf, Alexander Meinke, and Matthias Hein. Provable worst case guarantees for the detection of out-of-distribution data. arXiv preprint, 2020.
- Andrey Malinin. Uncertainty estimation in deep learning with application to spoken language assessment. In Doctoral thesis, 2019.
- Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Citeseer, 2009.
- Fei-Fei Li, Andrej Karpathy, and Justin Johnson. Tiny ImageNet visual recognition challenge. Stanford University CS231N, 2017.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition CVPR, 2009.
- Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. ICLR, 2017.
- Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.
- Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv, 2015.
- M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition CVPR, 2014.
- Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In ICML, 2017.
- John A Swets. Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Psychology Press, 2014.
- Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
- Khanh Nguyen and Brendan O’Connor. Posterior calibration and exploratory analysis for natural language processing models. In EMNLP, 2015.
Appendix

- 2. Section B.1 provides the implementation details for our experiments on synthetic datasets. Section B.2 provides the experimental setup, implementation details of our models and competitive models, along with a description of the OOD test datasets for our experiments on the benchmark image classification datasets. Section B.3 presents a comparative study of the confidence calibration performance of different models.
- 3. The expressions for differential entropy, mutual information of a Dirichlet distribution, and the KL Divergence between two Gaussian distributions are provided in Section C.
- 4. The extended results (mean ± standard deviation of 3 models) for the benchmark image classification datasets are provided in Section D from Table 9 to Table 13.
