PEP: Parameter Ensembling by Perturbation

NeurIPS 2020 (2020)

Cited by 2
Abstract

Ensembling is now recognized as an effective approach for increasing the predictive performance and calibration of deep networks. We introduce a new approach, Parameter Ensembling by Perturbation (PEP), that constructs an ensemble of parameter values as random perturbations of the optimal parameter set from training by a Gaussian with a single variance parameter.

Introduction
  • Deep neural networks have achieved remarkable success on many classification and regression tasks [28].
  • The model, in combination with the optimal parameters, is used for inference.
  • This approach ignores uncertainty in the value of the estimated parameters; as a consequence, overfitting may occur and the results of inference may be overly confident.
  • Probabilistic predictions can be characterized by their level of calibration, an empirical measure of consistency with outcomes, and work by Guo et al. shows that modern neural networks (NNs) are often poorly calibrated, and that a simple one-parameter temperature scaling method can improve their calibration level [12] (a minimal sketch follows this list).
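Temperature scaling is simple enough to sketch directly. The following is a minimal NumPy sketch of the one-parameter method of [12], assuming precomputed validation logits and integer labels; the function names and the search grid are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    # Mean negative log-likelihood of the true labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    # Pick the single temperature T that minimizes validation NLL.
    return min(grid, key=lambda T: nll(softmax(val_logits / T), val_labels))
```

Since dividing logits by a scalar T > 0 preserves their ranking, temperature scaling changes calibration but never accuracy; this is consistent with the observation in Table 1 below that TS has no effect on test errors.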
Highlights
  • Deep neural networks have achieved remarkable success on many classification and regression tasks [28].
  • Evaluation metrics: Model calibration was evaluated with negative log-likelihood (NLL), Brier score [3], and reliability diagrams [34].
  • NLL and Brier score are proper scoring rules that are commonly used for measuring the quality of classification uncertainty [36, 26, 8, 12].
  • We proposed Parameter Ensembling by Perturbation (PEP) for improving calibration and performance in deep learning.
  • We show that PEP effectively improves probabilistic predictions in terms of log-likelihood, Brier score, and expected calibration error.
  • PEP can be used as a tool to investigate the curvature properties of the likelihood landscape.
Methods
  • The authors describe the PEP model and analyze local properties of the resulting PEP effect.
  • The single variance parameter is chosen to maximize the likelihood of ensemble average predictions on validation data, which, empirically, has a well-defined maximum.
  • The authors begin with a standard discriminative model, e.g., a classifier that predicts a distribution on yi given an observation xi, p(yi; xi, θ).
  • Different optimal values of θ are obtained on different data sets; the authors aim to model this variability with a very simple parametric model: an isotropic normal distribution with mean θ̂ (the trained parameters) and scalar variance σ², p(θ; θ̂, σ) = N(θ; θ̂, σ²I). A minimal sketch follows this list.
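To make the procedure concrete, here is a minimal sketch of a PEP-style ensemble under the assumptions above: theta_hat is the flattened vector of trained parameters, and predict_probs(theta, X) is a hypothetical stand-in for a forward pass that returns class probabilities; the ensemble size and the sigma grid are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def pep_predict(theta_hat, sigma, X, predict_probs, n_members=10):
    # Average predictions over ensemble members drawn as Gaussian
    # perturbations of the trained parameters:
    #   theta_e = theta_hat + sigma * eps,  eps ~ N(0, I).
    member_probs = [
        predict_probs(theta_hat + sigma * rng.standard_normal(theta_hat.shape), X)
        for _ in range(n_members)
    ]
    return np.mean(member_probs, axis=0)

def fit_sigma(theta_hat, X_val, y_val, predict_probs,
              sigmas=np.logspace(-4, -1, 16)):
    # Choose sigma to maximize the validation log-likelihood of the
    # ensemble-average prediction; empirically this 1-D objective has
    # a well-defined maximum, so a coarse grid search suffices.
    def log_lik(sigma):
        p = pep_predict(theta_hat, sigma, X_val, predict_probs)
        return np.sum(np.log(p[np.arange(len(y_val)), y_val] + 1e-12))
    return max(sigmas, key=log_lik)
```

Note that sigma = 0 would recover the baseline network exactly, since every member collapses to theta_hat.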
Results
  • Model calibration was evaluated with negative log-likelihood (NLL), Brier score [3] and reliability diagrams [34].
  • NLL and Brier score are proper scoring rules that are commonly used for measuring the quality of classification uncertainty [36, 26, 8, 12].
  • Expected Calibration Error (ECE) is used to summarize the results of the reliability diagram (a sketch of these metrics follows this list).
  • Details of evaluation metrics are given in the Supplementary Material (Appendix B).
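As a rough guide to how these metrics are computed (full details are in Appendix B of the paper), here is a minimal NumPy sketch; the 15-bin ECE and the small epsilon inside the log are common conventions assumed here rather than taken from the paper.

```python
import numpy as np

def nll(probs, labels):
    # Negative log-likelihood: a proper scoring rule.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def brier(probs, labels):
    # Multi-class Brier score [3]: squared error against one-hot labels.
    onehot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

def ece(probs, labels, n_bins=15):
    # Expected Calibration Error: bin predictions by confidence, then
    # average |accuracy - confidence| per bin, weighted by bin mass.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs((pred[in_bin] == labels[in_bin]).mean() - conf[in_bin].mean())
            total += in_bin.mean() * gap
    return total
```

A reliability diagram plots per-bin accuracy against per-bin confidence; ECE summarizes the diagram's deviation from the diagonal.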
Conclusion
  • The authors proposed PEP for improving calibration and performance in deep learning.
  • PEP is computationally inexpensive and can be applied to any pre-trained network.
  • The authors show that PEP effectively improves probabilistic predictions in terms of log-likelihood, Brier score, and expected calibration error.
  • It nearly always provides small improvements in accuracy for pretrained ImageNet networks.
  • PEP can be used as a tool to investigate the curvature properties of the likelihood landscape (see the expansion below).
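The connection to curvature in the last bullet can be made explicit with a second-order Taylor expansion; this is a generic argument consistent with the authors' local analysis, not a transcription of their derivation. For a loss (negative log-likelihood) L and perturbation ε ~ N(0, I),

    E[L(θ̂ + σε)] ≈ L(θ̂) + (σ²/2) · tr(H(θ̂)),  where H(θ̂) = ∇²L(θ̂),

since the first-order term σ ∇L(θ̂)ᵀε has zero mean and E[εεᵀ] = I gives E[εᵀHε] = tr(H). The optimized σ therefore carries information about the trace of the Hessian, i.e., the average curvature of the likelihood landscape around θ̂.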
Tables
  • Table 1: ImageNet results. For all models except VGG19, PEP achieves statistically significant improvements in calibration compared to baseline (BL) and temperature scaling (TS), in terms of NLL and Brier score. PEP also reduces test errors, while TS has no effect on test errors. Although TS and PEP outperform the baseline in terms of ECE% for DenseNet121, DenseNet169, ResNet, and VGG16, the improvements in ECE% are not consistent across methods. T∗ and σ∗ denote the optimized temperature for TS and the optimized sigma for PEP, respectively. Boldface indicates the best result for each metric of a model, where the difference is statistically significant (p-value < 0.05).
  • Table 2: MNIST, Fashion MNIST, CIFAR-10, and CIFAR-100 results. The table summarizes the experiments described in Section 3.2.
Funding
  • Research reported in this publication was supported by NIH Grant No. P41EB015898, the Natural Sciences and Engineering Research Council (NSERC) of Canada, and the Canadian Institutes of Health Research (CIHR). Training large networks can be highly compute-intensive, so ensembling approaches that require additional training to improve performance and calibration, e.g., deep ensembling, can make undesirable contributions to the carbon footprint.
References
  • [1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
  • [2] Omar Bellprat, Sven Kotlarski, Daniel Lüthi, and Christoph Schär. Exploring perturbed physics ensembles in a regional climate model. Journal of Climate, 25(13):4582–4599, 2012.
  • [3] Glenn W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1–3, 1950.
  • [4] François Chollet et al. Keras. https://keras.io, 2015.
  • [5] Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, and Patrick Pérez. Addressing failure prediction by learning model confidence. In Advances in Neural Information Processing Systems, pages 2898–2909, 2019.
  • [6] Thomas G. Dietterich. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pages 1–15. Springer, 2000.
  • [7] Stanislav Fort, Huiyi Hu, and Balaji Lakshminarayanan. Deep ensembles: A loss landscape perspective. arXiv preprint arXiv:1912.02757, 2019.
  • [8] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016.
  • [9] Behrooz Ghorbani, Shankar Krishnan, and Ying Xiao. An investigation into neural net optimization via Hessian eigenvalue density. arXiv preprint arXiv:1901.10159, 2019.
  • [10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
  • [11] Ian J. Goodfellow, Oriol Vinyals, and Andrew M. Saxe. Qualitatively characterizing neural network optimization problems. arXiv preprint arXiv:1412.6544, 2014.
  • [12] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330, 2017.
  • [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
  • [14] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In 5th International Conference on Learning Representations, ICLR 2017, 2017.
  • [15] Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. arXiv preprint arXiv:1901.09960, 2019.
  • [16] Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, and Kilian Q. Weinberger. Snapshot ensembles: Train 1, get M for free. In 5th International Conference on Learning Representations, ICLR 2017, 2017.
  • [17] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
  • [18] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.
  • [19] Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407, 2018.
  • [20] Ahmadreza Jeddi, Mohammad Javad Shafiee, Michelle Karg, Christian Scharfenberger, and Alexander Wong. Learn2Perturb: An end-to-end feature perturbation learning to improve adversarial robustness. arXiv preprint arXiv:2003.01090, 2020.
  • [21] Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, and Akash Srivastava. Fast and scalable Bayesian deep learning by weight-perturbation in Adam. arXiv preprint arXiv:1806.04854, 2018.
  • [22] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [23] Agustinus Kristiadi, Matthias Hein, and Philipp Hennig. Being Bayesian, even just a bit, fixes overconfidence in ReLU networks. arXiv preprint arXiv:2002.10118, 2020.
  • [24] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • [25] Frederik Kunstner, Philipp Hennig, and Lukas Balles. Limitations of the empirical Fisher approximation for natural gradient descent. In Advances in Neural Information Processing Systems 32, pages 4156–4167, 2019.
  • [26] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
  • [27] Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
  • [28] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • [29] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [30] Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, and Dhruv Batra. Why M heads are better than one: Training a diverse ensemble of deep networks. arXiv preprint arXiv:1511.06314, 2015.
  • [31] Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In 6th International Conference on Learning Representations, ICLR 2018, 2018.
  • [32] Wesley J. Maddox, Pavel Izmailov, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. A simple baseline for Bayesian uncertainty in deep learning. In Advances in Neural Information Processing Systems, pages 13153–13164, 2019.
  • [33] J. Murphy, R. Clark, M. Collins, C. Jackson, M. Rodwell, J. C. Rougier, B. Sanderson, D. Sexton, and T. Yokohata. Perturbed parameter ensembles as a tool for sampling model uncertainties and making climate projections. In Proceedings of the ECMWF Workshop on Model Uncertainty, pages 183–208, 2011.
  • [34] Mahdi Pakdaman Naeini, Gregory F. Cooper, and Milos Hauskrecht. Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2901–2907, 2015.
  • [35] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, 2007.
  • [36] Joaquin Quinonero-Candela, Carl Edward Rasmussen, Fabian Sinz, Olivier Bousquet, and Bernhard Schölkopf. Evaluating predictive uncertainty challenge. In Machine Learning Challenges Workshop, pages 1–27. Springer, 2006.
  • [37] Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Bobby Kleinberg, Sendhil Mullainathan, and Jon Kleinberg. Direct uncertainty prediction for medical second opinions. In Proceedings of the 36th International Conference on Machine Learning, pages 5281–5290. PMLR, 2019.
  • [38] Hippolyt Ritter, Aleksandar Botev, and David Barber. A scalable Laplace approximation for neural networks. In 6th International Conference on Learning Representations, ICLR 2018, 2018.
  • [39] Raanan Yehezkel Rohekar, Yaniv Gurwicz, Shami Nisimov, and Gal Novik. Modeling uncertainty by learning a hierarchy of deep neural connections. In Advances in Neural Information Processing Systems, pages 4246–4256, 2019.
  • [40] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • [41] Levent Sagun, Léon Bottou, and Yann LeCun. Eigenvalues of the Hessian in deep learning: Singularity and beyond. arXiv preprint arXiv:1611.07476, 2016.
  • [42] Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, and Léon Bottou. Empirical analysis of the Hessian of over-parametrized neural networks. arXiv preprint arXiv:1706.04454, 2017.
  • [43] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [44] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
  • [45] Mattias Teye, Hossein Azizpour, and Kevin Smith. Bayesian uncertainty estimation for batch normalized deep networks. In International Conference on Machine Learning, pages 4914–4923, 2018.
  • [46] Sunil Thulasidasan, Gopinath Chennupati, Jeff A. Bilmes, Tanmoy Bhattacharya, and Sarah Michalak. On mixup training: Improved calibration and predictive uncertainty for deep neural networks. In Advances in Neural Information Processing Systems, pages 13888–13899, 2019.
  • [47] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

Supplementary Material reference:
  • [1] Arakaparampil M. Mathai and Serge B. Provost. Quadratic Forms in Random Variables: Theory and Applications. Dekker, 1992.
Authors
Purang Abolmaesumi
Demian Wassermann