Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

IJCAI 2020, pp. 2140-2147, 2020.

DOI: https://doi.org/10.24963/ijcai.2020/296

We introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points

Abstract:

By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare...

Introduction
  • “Black swans” are rare, surprising events that cannot be predicted by humans or statistical models.
  • They can have huge repercussions, and models are typically updated only a posteriori to justify the existence of such events [Taleb, 2007].
  • While statistical models were subsequently updated to take into account new data from the crisis, such overconfident models would by definition still be surprised by the next black swan (Fig. 1).
  • A model should instead be highly uncertain for completely novel inputs that do not display the same patterns as the training set
Highlights
  • “Black swans” are rare, surprising events that cannot be predicted by humans or statistical models
  • While statistical models were subsequently updated to take into account new data from the crisis, such overconfident models would by definition still be surprised by the next black swan (Fig. 1)
  • We described a method for training convolutional and regular neural networks to be more diverse out-of-distribution (OOD) by using a modified loss function acting directly in the function space
  • We explored various methods to sample the repulsive locations used in the proposed loss function, and discussed how choosing the repulsive locations judiciously can modify the learnt representation to be more uncertain when confronted with "surprising" data points, offering a solution to handle black swan events in deep learning (a sketch of such a loss follows this list)
  • We studied how DENN can detect outliers more efficiently than a usual ensemble, thus requiring fewer demonstrations
  • Working in the latent space using a variational autoencoder [Kingma and Welling, 2014] could help sample repulsive locations independently of the nature of the input space, as recent works have focused on the repulsive datasets themselves [Abbasi et al., 2019; Sensoy et al., 2020]
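To make the highlighted loss concrete, the following is a minimal PyTorch sketch of a function-space diversity term for a single ensemble member. The Gaussian-kernel repulsion, the `lam` and `bandwidth` hyperparameters, and the freezing of the other members are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def denn_style_loss(member, other_members, x_train, y_train, x_repulsive,
                    lam=1.0, bandwidth=1.0):
    """Fit the training data while disagreeing with the other members at repulsive inputs.

    Illustrative sketch only: the kernel repulsion and hyperparameters are
    assumptions, not the exact DENN objective.
    """
    fit = F.mse_loss(member(x_train), y_train)      # standard data-fit term on D
    pred = member(x_repulsive)                      # this member's predictions at repulsive points
    with torch.no_grad():                           # treat the other members as fixed references
        others = torch.stack([m(x_repulsive) for m in other_members])
    # Kernel similarity is large when members agree OOD; minimizing it pushes predictions apart.
    sq_dist = ((pred.unsqueeze(0) - others) ** 2).sum(dim=-1)
    repulsion = torch.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean()
    return fit + lam * repulsion
```

Each member would be trained with its own copy of such a loss, so that all members fit the training data while being pushed toward different extrapolations at the repulsive locations.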
Methods
  • The authors apply the proposed loss function to different tasks and compare the results with existing approaches, to assess whether DENN leads to the desired high uncertainty OOD.
  • The authors first compare the performance of DENN with other approaches on a simple regression task, to visually study the advantages brought by the diversity constraint to the posterior predictive distribution.
  • The authors illustrate how DENN can seamlessly be applied to classification, enabling the training of an ensemble having diverse predictions for unexpected datasets.
  • The authors generate the repulsive locations by adding Gaussian noise to the training points (Sec. 3.4)
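A minimal NumPy sketch of this noise-based sampling, assuming the training inputs are stored row-wise in an array; the function name and the `noise_std` / `n_per_point` parameters are illustrative, not taken from the paper's code:

```python
import numpy as np

def sample_repulsive_locations(x_train, noise_std=0.5, n_per_point=1, rng=None):
    """Sample repulsive locations by perturbing training inputs with isotropic Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.repeat(x_train, n_per_point, axis=0)            # replicate each training input
    return x + rng.normal(scale=noise_std, size=x.shape)   # add Gaussian noise around it
```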
Results
  • The authors first generate the demonstrations D with a red sphere target, as well as the repulsive frames, stored in X, with a green sphere target.
  • Figure: average std over actions, with cross-validation on a yellow sphere target dataset.
  • The authors evaluate both methods on red sphere targets for generalization and blue sphere targets for outlier detection.
  • While the deep ensemble is more confident than DENN on the generalization dataset (Fig. 6, left plot), it is also more confident on the OOD dataset and therefore cannot detect outliers as well (a sketch of this detection rule follows the list).
  • Interpreting the target color change as a black swan event illustrates the failure of regular ensembles to anticipate unlikely events
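The detection rule suggested by these results can be sketched as follows: score each input by the disagreement of the ensemble's predicted actions (the avg. std over actions reported above) and flag it when the score exceeds a threshold calibrated on a held-out dataset. The function name and the thresholding rule below are illustrative assumptions, not the paper's exact protocol.

```python
import torch

def flag_black_swans(ensemble, x, threshold):
    """Flag inputs whose ensemble disagreement (avg. std over actions) exceeds a threshold."""
    with torch.no_grad():
        preds = torch.stack([member(x) for member in ensemble])  # (n_members, n_inputs, n_actions)
    score = preds.std(dim=0).mean(dim=-1)                        # disagreement per input
    return score > threshold, score                              # True where the input looks OOD
```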
Conclusion
  • The authors described a method for training convolutional and regular NNs to be more diverse out-of-distribution (OOD) by using a modified loss function acting directly in the function space.
  • The method introduces hyperparameters that require tuning, as well as three distinct datasets: one for model training, one for producing the repulsive locations, and one for hyperparameter selection.
  • This can be restrictive when the problem offers limited sources of data.
  • Working in the latent space using a variational autoencoder [Kingma and Welling, 2014] could help sample repulsive locations independently of the nature of the input space, as recent works have focused on the repulsive datasets themselves [Abbasi et al., 2019; Sensoy et al., 2020].
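A hedged sketch of that future direction, assuming a trained VAE whose encoder returns a latent mean and log-variance and whose decoder maps latents back to inputs [Kingma and Welling, 2014]; the isotropic latent perturbation and `noise_std` are illustrative choices:

```python
import torch

def repulsive_from_latent(encoder, decoder, x_train, noise_std=1.0):
    """Perturb training points in a trained VAE's latent space, then decode them back."""
    with torch.no_grad():
        mu, _log_var = encoder(x_train)              # assumed interface: encoder -> (mean, log-variance)
        z = mu + noise_std * torch.randn_like(mu)    # isotropic perturbation in latent space
        return decoder(z)                            # input-space repulsive locations
```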
Related work
  • Ensemble methods are a major family of models used to estimate predictive uncertainty [Lakshminarayanan et al., 2017; Pearce et al., 2018; Lee and Chung, 2020; Tran et al., 2020]. Training the same NN architecture with different initialization conditions, over the same training data, leads to different solutions. The predictions of the ensemble are aggregated to estimate its confidence, with higher uncertainty for unseen data [Lakshminarayanan et al., 2017]. Additionally constraining the weights to stay close to their initial values increases the NNs' diversity, forming an "anchored ensemble" [Pearce et al., 2018]. This maintains the diversity induced by the initial weights, which otherwise tends to disappear during learning. Pearce et al. coin the term quasi-prior to denote the predictor corresponding to an untrained NN. Both methods assume that the diversity of the initial weights is sufficient to obtain diverse predictors. However, it is neither clear how to increase or control this weight diversity, nor how it translates to the function space.
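To make the contrast with the proposed function-space approach concrete, here is a minimal PyTorch sketch of the anchoring idea: each member's loss adds a penalty keeping its weights near their randomly initialized "anchor" values. The plain L2 penalty and the `reg_scale` coefficient are simplifications; the original formulation of Pearce et al. scales the penalty according to the prior variance.

```python
import torch

def anchored_loss(member, anchors, data_loss, reg_scale=1e-2):
    """Data loss plus a penalty pulling weights back toward their initial (anchor) values."""
    reg = sum(((p - a) ** 2).sum() for p, a in zip(member.parameters(), anchors))
    return data_loss + reg_scale * reg

# Anchors are a frozen copy of the weights taken right after random initialization, e.g.:
# anchors = [p.detach().clone() for p in member.parameters()]
```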
Funding
  • This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada CIFAR AI chairs program
References
  • [Abbasi et al., 2019] Mahdieh Abbasi, Changjian Shui, Arezoo Rajabi, Christian Gagne, and Rakesh Bobba. Toward metrics for differentiating out-of-distribution sets. In Advances in Neural Information Processing Systems Workshop on Safety and Robustness in Decision Making, 2019.
  • [Borgwardt et al., 2006] Karsten M Borgwardt, Arthur Gretton, Malte J Rasch, Hans-Peter Kriegel, Bernhard Scholkopf, and Alex J Smola. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14):e49–e57, 2006.
  • [Cai et al., 2006] Hongmin Cai, Xiaoyin Xu, Ju Lu, Jeff W Lichtman, SP Yung, and Stephen TC Wong. Repulsive force based snake model to segment and track neuronal axons in 3D microscopy image stacks. NeuroImage, 32(4):1608–1620, 2006.
  • [Cohen et al., 2017] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik. EMNIST: an extension of MNIST to handwritten letters. CoRR, abs/1702.05373, 2017.
  • [Cybenko, 1989] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • [Efron, 1982] Bradley Efron. The jackknife, the bootstrap, and other resampling plans. SIAM, 1982.
  • [Frankle and Carbin, 2019] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Conference Track Proceedings, 2019.
  • [Gal and Ghahramani, 2016] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016.
  • [Gal, 2016] Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.
  • [Hafner et al., 2019] Danijar Hafner, Dustin Tran, Timothy P. Lillicrap, Alex Irpan, and James Davidson. Noise contrastive priors for functional uncertainty. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019, 2019.
  • [Han et al., 2015] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135–1143, 2015.
  • [Hendrycks et al., 2019] Dan Hendrycks, Mantas Mazeika, and Thomas G Dietterich. Deep anomaly detection with outlier exposure. In International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Conference Track Proceedings, 2019.
  • [Hong et al., 2018] Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, and Chun-Yi Lee. Diversity-driven exploration strategy for deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 10510–10521, 2018.
  • [Jaynes, 1957] Edwin T Jaynes. Information theory and statistical mechanics. Physical Review, 1957.
  • [Kim and Pineau, 2013] Beomjoon Kim and Joelle Pineau. Maximum mean discrepancy imitation learning. In Robotics: Science and Systems, 2013.
  • [Kingma and Welling, 2014] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
  • [Lakshminarayanan et al., 2017] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
  • [Lee and Chung, 2020] Jisoo Lee and Sae-Young Chung. Robust training with ensemble consensus. In International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-May 1, 2020, Conference Track Proceedings, 2020.
  • [Lee et al., 2018] Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. In International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30-May 3, 2018, Conference Track Proceedings, 2018.
  • [Li et al., 2018] Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. In International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30-May 3, 2018, Conference Track Proceedings, 2018.
  • [Malinin and Gales, 2018] Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. In Advances in Neural Information Processing Systems, pages 7047–7058, 2018.
  • [Osband et al., 2016] Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, pages 4026–4034, 2016.
  • [Osband et al., 2018] Ian Osband, John Aslanides, and Albin Cassirer. Randomized prior functions for deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 8626–8638, 2018.
  • [Pearce et al., 2018] Tim Pearce, Nicolas Anastassacos, Mohamed Zaki, and Andy Neely. Bayesian inference with anchored ensembles of neural networks, and application to reinforcement learning. In International Conference on Machine Learning Workshop on Exploration in Reinforcement Learning, 2018.
  • [Ross et al., 2011] Stephane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In International Conference on Artificial Intelligence and Statistics, pages 627–635, 2011.
  • [Schulman et al., 2017] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.
  • [Sensoy et al., 2020] Murat Sensoy, Lance Kaplan, Federico Cerutti, and Maryam Saleki. Uncertainty-aware deep classifiers using generative models. In Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
  • [Taleb, 2007] Nassim Nicholas Taleb. The Black Swan: The Impact of the Highly Improbable, volume 2. Random House, 2007.
  • [Todorov et al., 2012] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
  • [Tran et al., 2020] Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, and Rodolphe Jenatton. Hydra: Preserving ensemble diversity for model distillation. arXiv:2001.04694, 2020.
  • [Vadakkepat et al., 2000] Prahlad Vadakkepat, Kay Chen Tan, and Wang Ming-Liang. Evolutionary artificial potential fields and their application in real time robot path planning. In Congress on Evolutionary Computation, CEC00 (Cat. No. 00TH8512), volume 1, pages 256–263. IEEE, 2000.