Improving model calibration with accuracy versus uncertainty optimization

NeurIPS 2020

Abstract

Obtaining reliable and accurate quantification of uncertainty estimates from deep neural networks is important in safety-critical applications. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. Uncertainty calibration is a challenging problem […]

Introduction
  • Probabilistic deep neural networks (DNNs) enable quantification of principled uncertainty estimates, which are essential to understanding model predictions for reliable decision making in safety-critical applications [1].
  • Approximate Bayesian inference methods are promising, but they may fail to provide calibrated uncertainty in between separated regions of observations, as they tend to fit an approximation to a local mode and do not capture the complete true posterior [9, 15, 16, 32].
  • This may cause the model to be overconfident under distributional shift.
  • Existing calibration methods do not explicitly account for the quality of predictive uncertainty estimates, either while training the model or during post-hoc calibration.
Highlights
  • We compare the proposed methods with various high-performing non-Bayesian and Bayesian methods, including vanilla DNN (Vanilla), Temperature scaling (Temp scaling) [11], Deep ensembles (Ensemble) [9], Monte Carlo dropout (Dropout) [5], Mean-field stochastic variational inference (SVI) [2, 3], Temperature scaling on SVI (SVI-TS), and Radial Bayesian neural network (Radial BNN) [8].
  • In addition to SVI-AvUC (accuracy versus uncertainty calibration) and SVI-AvUTS (AvU temperature scaling), we evaluate the AvUC and AvUTS methods applied to the vanilla baseline, with the entropy of the softmax used as the predictive uncertainty in computing the AvUC loss, which is combined with the cross-entropy loss.
  • We introduced the accuracy versus uncertainty calibration (AvUC) loss and proposed the novel optimization methods AvUC and AvUTS for improving uncertainty calibration in deep neural networks (a sketch of the loss follows this list).
  • Because uncertainty calibration is important for reliable and informed decision making in safety-critical applications, we envision AvUC as a step towards advancing probabilistic deep neural networks in providing well-calibrated uncertainties along with improved accuracy.
  • We demonstrated that our method SVI-AvUC provides better model calibration than existing state-of-the-art methods under distributional shift.
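
The AvUC loss optimizes the accuracy-versus-uncertainty (AvU) measure, which partitions predictions into four outcomes — accurate-and-certain (AC), accurate-and-uncertain (AU), inaccurate-and-certain (IC), and inaccurate-and-uncertain (IU) — with AvU = (nAC + nIU) / (nAC + nAU + nIC + nIU). Below is a minimal PyTorch sketch of an AvUC-style loss, not the authors' reference implementation: soft counts are built from the softmax confidence and a tanh-squashed predictive entropy, and the function name, entropy squashing, and epsilon guard are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def avu_calibration_loss(logits, labels, eps=1e-10):
    """Sketch of an AvUC-style loss: drive accurate predictions towards
    certainty and inaccurate predictions towards uncertainty by minimizing
    log(1 + (n_AU + n_IC) / (n_AC + n_IU)) built from soft counts."""
    probs = F.softmax(logits, dim=1)
    p, preds = probs.max(dim=1)                        # confidence of the prediction
    entropy = -(probs * (probs + eps).log()).sum(1)    # predictive entropy as uncertainty
    u = torch.tanh(entropy)                            # squash uncertainty to [0, 1)
    acc = preds.eq(labels).float()                     # 1 for accurate samples, else 0
    n_ac = (acc * p * (1.0 - u)).sum()                 # accurate and certain
    n_au = (acc * p * u).sum()                         # accurate but uncertain
    n_ic = ((1.0 - acc) * (1.0 - p) * (1.0 - u)).sum() # inaccurate but certain
    n_iu = ((1.0 - acc) * (1.0 - p) * u).sum()         # inaccurate and uncertain
    return torch.log(1.0 + (n_au + n_ic) / (n_ac + n_iu + eps))
```

During training this term would be added to the task loss (cross-entropy for the vanilla model, the ELBO for SVI) with a trade-off weight, e.g. `F.cross_entropy(logits, labels) + beta * avu_calibration_loss(logits, labels)`, where `beta` is a hyperparameter.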
Methods
  • [Table 1 residue: ECE (%)↓ and UCE (%)↓ at various data-shift intensities for Vanilla, Vanilla-AvUTS, and Vanilla-AvUC.] In addition to SVI-AvUC and SVI-AvUTS, the authors evaluate the AvUC and AvUTS methods applied to the vanilla baseline, with the entropy of the softmax used as the predictive uncertainty in computing the AvUC loss, which is combined with the cross-entropy loss.
  • Table 1 shows that AvUTS and AvUC improve the model calibration errors (ECE and UCE) on the vanilla baseline as well (the binning definitions of ECE and UCE are sketched after this list).
  • Figures 2(d), (e), and (f) show that SVI-AvUC is more uncertain when making inaccurate predictions under distributional shift, compared to other methods.
  • Figures 2(g) and (h) show that SVI-AvUC has fewer examples with high confidence when model accuracy is low under distributional shift.
  • SVI-AvUC outperforms other methods in providing calibrated confidence and uncertainty measures under distributional shift.
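
ECE [40] and UCE [41] are binning estimators: ECE bins predictions by confidence and averages the |accuracy − confidence| gap per bin, while UCE bins by uncertainty and averages the |error rate − uncertainty| gap. A minimal NumPy sketch, assuming 15 bins and uncertainties already normalized to [0, 1]:

```python
import numpy as np

def ece(confidences, accuracies, n_bins=15):
    """Expected Calibration Error [40]: size-weighted average of the
    |accuracy - confidence| gap over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return total

def uce(uncertainties, errors, n_bins=15):
    """Expected Uncertainty Calibration Error [41]: size-weighted average of
    the |error rate - uncertainty| gap over equal-width uncertainty bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (uncertainties > lo) & (uncertainties <= hi)
        if mask.any():
            total += mask.mean() * abs(errors[mask].mean() - uncertainties[mask].mean())
    return total
```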
Results
  • The authors perform a thorough empirical evaluation of the proposed methods SVI-AvUC and SVI-AvUTS on a large-scale image classification task under distributional shift.
  • The results for the methods Vanilla, Temp scaling, Ensemble, Dropout, LL Dropout, and LL SVI are obtained from the model predictions provided in the UQ benchmark [26]. The authors follow the same methodology for model evaluation under distributional shift, utilizing 16 different types of image corruptions at 5 different intensity levels for each data-shift type proposed in [20], resulting in 80 variations of test data for data-shift evaluation (see the evaluation-loop sketch after this list).
  • The authors provide details of the model implementations and hyperparameters for SVI, SVI-TS, SVI-AvUC, SVI-AvUTS, and Radial BNN in Appendix B.
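
A sketch of the data-shift evaluation loop described above (16 corruption types × 5 severity levels = 80 shifted test sets). The corruption names and the `load_shifted_loader` helper are hypothetical stand-ins; the actual corruption set follows [20] and the UQ benchmark [26]:

```python
import torch

# Illustrative 16 shift types; the exact set follows [20, 26].
CORRUPTIONS = ["gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
               "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
               "brightness", "contrast", "elastic_transform", "pixelate",
               "jpeg_compression", "gaussian_blur"]

@torch.no_grad()
def evaluate_under_shift(model, load_shifted_loader, device="cpu"):
    """Hypothetical driver: `load_shifted_loader(corruption, severity)` is an
    assumed helper returning a DataLoader over the corrupted test split.
    Returns accuracy per (corruption, severity) cell -- 80 evaluations."""
    model.eval()
    results = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):  # 5 intensity levels
            correct, total = 0, 0
            for x, y in load_shifted_loader(corruption, severity):
                preds = model(x.to(device)).argmax(dim=1)
                correct += (preds == y.to(device)).sum().item()
                total += y.numel()
            results[(corruption, severity)] = correct / total
    return results
```

The same loop can also collect per-example confidences and uncertainties to feed the ECE/UCE estimators sketched earlier.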
Conclusion
  • The authors introduced the accuracy versus uncertainty calibration (AvUC) loss and proposed the novel optimization methods AvUC and AvUTS for improving uncertainty calibration in deep neural networks (an AvUTS sketch follows this list).
  • Because uncertainty calibration is important for reliable and informed decision making in safety-critical applications, the authors envision AvUC as a step towards advancing probabilistic deep neural networks in providing well-calibrated uncertainties along with improved accuracy.
  • The authors demonstrated that the method SVI-AvUC provides better model calibration than existing state-of-the-art methods under distributional shift.
  • The authors have made the code available to help the probabilistic deep learning community evaluate and improve model calibration for various other baselines.
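
A sketch of AvUTS, assuming that, as in standard temperature scaling [11], a single scalar temperature T is fitted post-hoc on held-out validation logits, but by minimizing the AvUC-style loss sketched earlier instead of the NLL; the optimizer, learning rate, and step count are illustrative choices:

```python
import torch

def fit_avu_temperature(logits_val, labels_val, steps=200, lr=0.01):
    """Fit a scalar temperature on held-out logits by minimizing the
    AvUC-style loss (avu_calibration_loss sketched above) instead of NLL."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = avu_calibration_loss(logits_val / log_t.exp(), labels_val)
        loss.backward()
        opt.step()
    return log_t.exp().item()
```

Since only the scalar T changes, the predicted class and hence test accuracy are unaffected; at test time the calibrated probabilities are softmax(logits / T).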
Tables
  • Table 1: Additional results evaluating the AvUC and AvUTS methods applied to the Vanilla baseline on CIFAR10. Vanilla-AvUTS and Vanilla-AvUC provide lower ECE and UCE (mean across 16 different data-shift types) compared to the baseline.
  • Table 2: Distributional shift detection using predictive uncertainty. For dataset shift detection on ImageNet and CIFAR10, test data shifted with Gaussian blur of intensity 5 is used. SVHN is used as out-of-distribution (OOD) data for OOD detection on the model trained with CIFAR10. All values are in percentages, and the best results are indicated in bold. SVI-AvUC outperforms across all the metrics (a detection-metric sketch follows this list).
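
The detection task in Table 2 can be scored as a binary detection problem with predictive uncertainty as the score: in-distribution test examples get label 0 and shifted/OOD examples label 1. A minimal scikit-learn sketch (the function name is an assumption):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def shift_detection_scores(unc_in, unc_shifted):
    """Score how well predictive uncertainty separates in-distribution data
    (label 0) from shifted/OOD data (label 1); a well-calibrated model should
    assign higher uncertainty to the shifted inputs."""
    scores = np.concatenate([unc_in, unc_shifted])
    labels = np.concatenate([np.zeros(len(unc_in)), np.ones(len(unc_shifted))])
    return {"auroc": roc_auc_score(labels, scores),
            "aupr": average_precision_score(labels, scores)}
```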
References
  • [1] Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015.
  • [2] Alex Graves. Practical variational inference for neural networks. In Advances in Neural Information Processing Systems, pages 2348–2356, 2011.
  • [3] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In International Conference on Machine Learning, pages 1613–1622, 2015.
  • [4] Durk P Kingma, Tim Salimans, and Max Welling. Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems, pages 2575–2583, 2015.
  • [5] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016.
  • [6] Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. A simple baseline for Bayesian uncertainty in deep learning. In Advances in Neural Information Processing Systems, pages 13132–13143, 2019.
  • [7] Raanan Yehezkel Rohekar, Yaniv Gurwicz, Shami Nisimov, and Gal Novik. Modeling uncertainty by learning a hierarchy of deep neural connections. In Advances in Neural Information Processing Systems, pages 4246–4256, 2019.
  • [8] Sebastian Farquhar, Michael Osborne, and Yarin Gal. Radial Bayesian neural networks: Beyond discrete support in large-scale Bayesian deep learning. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020.
  • [9] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
  • [10] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pages 7167–7177, 2018.
  • [11] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330. JMLR.org, 2017.
  • [12] Aviral Kumar, Sunita Sarawagi, and Ujjwal Jain. Trainable calibration measures for neural networks from kernel mean embeddings. In International Conference on Machine Learning, pages 2805–2814, 2018.
  • [13] Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip HS Torr, and Puneet K Dokania. Calibrating deep neural networks using focal loss. arXiv preprint arXiv:2002.09437, 2020.
  • [14] Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate uncertainties for deep learning using calibrated regression. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2796–2804. PMLR, 2018.
  • [15] Andrew YK Foong, Yingzhen Li, José Miguel Hernández-Lobato, and Richard E Turner. 'In-between' uncertainty in Bayesian neural networks. arXiv preprint arXiv:1906.11537, 2019.
  • [16] Jonathan Heek. Well-calibrated Bayesian neural networks. University of Cambridge, 2018.
  • [17] Ananya Kumar, Percy S Liang, and Tengyu Ma. Verified uncertainty calibration. In Advances in Neural Information Processing Systems, pages 3787–3798, 2019.
  • [18] Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, and Peter Flach. Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration. In Advances in Neural Information Processing Systems, pages 12295–12305, 2019.
  • [19] Sunil Thulasidasan, Gopinath Chennupati, Jeff A Bilmes, Tanmoy Bhattacharya, and Sarah Michalak. On mixup training: Improved calibration and predictive uncertainty for deep neural networks. In Advances in Neural Information Processing Systems, pages 13888–13899, 2019.
  • [20] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of the International Conference on Learning Representations, 2019.
  • [21] Dan Hendrycks, Norman Mu, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A simple method to improve robustness and uncertainty under data shift. In International Conference on Learning Representations, 2020.
  • [22] Jose G Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V Chawla, and Francisco Herrera. A unifying view on dataset shift in classification. Pattern Recognition, 45(1):521–530, 2012.
  • [23] Michael A Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen. Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4845–4854, 2019.
  • [24] Hermann Blum, Paul-Edouard Sarlin, Juan Nieto, Roland Siegwart, and Cesar Cadena. The Fishyscapes benchmark: Measuring blind spots in semantic segmentation. arXiv preprint arXiv:1904.03215, 2019.
  • [25] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
  • [26] Jasper Snoek, Yaniv Ovadia, Emily Fertig, Balaji Lakshminarayanan, Sebastian Nowozin, D Sculley, Joshua Dillon, Jie Ren, and Zachary Nado. Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems, pages 13969–13980, 2019.
  • [27] Simon Lacoste-Julien, Ferenc Huszár, and Zoubin Ghahramani. Approximate inference for the loss-calibrated Bayesian. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 416–424, 2011.
  • [28] Adam D Cobb, Stephen J Roberts, and Yarin Gal. Loss-calibrated approximate inference in Bayesian neural networks. arXiv preprint arXiv:1805.03901, 2018.
  • [29] James O Berger. Statistical Decision Theory and Bayesian Analysis. Springer, 1985.
  • [30] Max Welling and Yee W Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
  • [31] Tianqi Chen, Emily Fox, and Carlos Guestrin. Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pages 1683–1691, 2014.
  • [32] Lewis Smith and Yarin Gal. Understanding measures of uncertainty for adversarial example detection. arXiv preprint arXiv:1803.08533, 2018.
  • [33] Armen Der Kiureghian and Ove Ditlevsen. Aleatory or epistemic? Does it matter? Structural Safety, 31(2):105–112, 2009.
  • [34] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems, pages 5574–5584, 2017.
  • [35] Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.
  • [36] Claude E Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
  • [37] Linton C Freeman. Elementary Applied Statistics. John Wiley and Sons, 1965.
  • [38] Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.
  • [39] Jishnu Mukhoti and Yarin Gal. Evaluating Bayesian deep learning methods for semantic segmentation. arXiv preprint arXiv:1811.12709, 2018.
  • [40] Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. Obtaining well calibrated probabilities using Bayesian binning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
  • [41] Max-Heinrich Laves, Sontje Ihler, Karl-Philipp Kortmann, and Tobias Ortmaier. Well-calibrated model uncertainty with temperature scaling for dropout variational inference. arXiv preprint arXiv:1909.13550, 2019.
  • [42] Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
  • [43] Glenn W Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1–3, 1950.
  • [44] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240, 2006.
  • [45] Takaya Saito and Marc Rehmsmeier. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), 2015.
  • [46] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [47] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • [48] Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.
  • [49] Carlos Riquelme, George Tucker, and Jasper Snoek. Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. In International Conference on Learning Representations, 2018.
  • [50] Mahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer, Omesh Tickoo, and Jonathan Huang. Uncertainty-aware audiovisual activity recognition using deep Bayesian variational inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
  • [51] Ranganath Krishnan, Mahesh Subedar, and Omesh Tickoo. Specifying weight priors in Bayesian deep neural networks with empirical Bayes. In Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
  • [52] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
  • [53] Aaditya Ramdas, Nicolás García Trillos, and Marco Cuturi. On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2):47, 2017.
  • [54] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.
  • [55] Stephen Kokoska and Daniel Zwillinger. CRC Standard Probability and Statistics Tables and Formulae. CRC Press, 2000.
Authors
Ranganath Krishnan