Robustness of Bayesian Neural Networks to Gradient-Based Attacks
NeurIPS 2020 (2020)
Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs) […]
- Adversarial attacks are small, potentially imperceptible perturbations of test inputs that can lead to catastrophic misclassifications in high-dimensional classifiers such as deep Neural Networks (NNs).
- Many attack strategies are based on identifying directions of high variability in the loss function by evaluating its gradients w.r.t. the input points.
- Since such variability can be intuitively linked to uncertainty in the prediction, Bayesian Neural Networks (BNNs) have recently been suggested as a more robust deep learning paradigm, a claim that has found some empirical support [15, 16, 3, 22].
- Neither the source of this robustness nor its general applicability is well understood mathematically.
- In Section 5.1, we experimentally verify the validity of the zero-averaging property of gradients implied by Theorem 1, and discuss its implications on the behaviours of Fast Gradient Sign Method (FGSM) and Projected Gradient Descent method (PGD) attacks on Bayesian Neural Networks (BNNs) in Section 5.2
- Details on the experimental settings and BNN training parameters can be found in the Supplementary Material.
- We investigate the vanishing behaviour of input gradients, established by Theorem 1 in the thermodynamic-limit regime, in finite, practical settings, that is, with a finite number of training data and with finite-width BNNs.
- We look at an array of more than 1000 different BNN architectures trained with Hamiltonian Monte Carlo (HMC) and Variational Inference (VI) on MNIST and Fashion-MNIST. We experimentally evaluate their accuracy/robustness trade-off under FGSM attacks, comparing against deterministic NNs trained via Stochastic Gradient Descent (SGD)-based methods.
- We believe that the fact that Bayesian ensembles of NNs can evade a broad class of adversarial attacks will be of great relevance
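The FGSM attack referenced above can be sketched against a Bayesian ensemble to make the zero-averaging idea concrete. The snippet below is a hypothetical toy illustration, not the paper's code: it uses a linear logistic model and synthetic "posterior samples" drawn around a mean weight vector, and shows that a gradient-based attack on the Bayesian predictive perturbs the input along the sign of the gradient *averaged over posterior samples*, which is exactly the quantity Theorem 1 predicts vanishes in the thermodynamic limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad_x(w, x, y):
    # Gradient w.r.t. the *input* x of the logistic loss -log p(y|x, w)
    # for a linear classifier with p(y=1|x, w) = sigmoid(w . x).
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

# Hypothetical "posterior samples": draws around a mean weight vector
# (a stand-in for HMC samples from a real BNN posterior).
w_mean = np.array([1.0, -2.0, 0.5])
posterior = [w_mean + rng.normal(scale=2.0, size=3) for _ in range(500)]

x, y = np.array([0.3, -0.1, 0.8]), 1
eps = 0.1

# FGSM against the Bayesian predictive: average the input gradient
# over posterior samples, then step in its sign direction.
g_bayes = np.mean([loss_grad_x(w, x, y) for w in posterior], axis=0)
x_adv = x + eps * np.sign(g_bayes)
```

If the posterior-averaged gradient `g_bayes` concentrates around zero, the sign direction becomes essentially arbitrary and the attack loses its bite; a deterministic network corresponds to using a single weight sample, whose gradient does not benefit from this cancellation.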
- BNNs that utilise pointwise uncertainty have been introduced in [21, 15, 30]
- Most of these approaches have relied largely on Monte Carlo dropout as a posterior inference method.
- Bayesian inference combines likelihood and prior via Bayes' theorem to obtain a posterior measure on the space of weights, p(w|D) ∝ p(D|w) p(w).
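The posterior formula above can be made concrete with a grid approximation on a toy one-dimensional model (an illustrative example, not the paper's BNN setting): data drawn as y_i ~ Normal(w, 1) with prior w ~ Normal(0, 1), so that p(w|D) ∝ p(D|w) p(w) can be evaluated pointwise and normalized numerically.

```python
import numpy as np

# Toy 1D model: likelihood y_i ~ Normal(w, 1), prior w ~ Normal(0, 1).
data = np.array([0.9, 1.1, 1.3])
w_grid = np.linspace(-3.0, 3.0, 601)

log_prior = -0.5 * w_grid**2
log_lik = np.array([-0.5 * np.sum((data - w)**2) for w in w_grid])
log_post = log_prior + log_lik          # log p(w|D) up to a constant

# Normalize on the grid to obtain a proper density.
post = np.exp(log_post - log_post.max())
dw = w_grid[1] - w_grid[0]
post /= post.sum() * dw

post_mean = np.sum(w_grid * post) * dw  # conjugacy gives sum(y)/(n+1)
```

For this conjugate Normal-Normal case the exact posterior mean is sum(data)/(n+1) = 3.3/4 = 0.825, which the grid estimate recovers; for BNNs no such closed form exists, which is why approximate schemes such as HMC and VI are needed.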
- The authors empirically investigate the theoretical findings on different BNNs. The authors train a variety of BNNs on the MNIST and Fashion-MNIST datasets, approximating their posterior distributions with the HMC and VI inference methods.
- In Section 5.3 the authors analyse the relationship between robustness and accuracy on thousands of different NN architectures, comparing the results obtained by Bayesian and by deterministic training.
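For context on the HMC inference used throughout, the snippet below is a minimal, self-contained sketch of one HMC transition (leapfrog integration plus a Metropolis accept/reject on the Hamiltonian), applied to a toy 1D standard-normal target rather than an actual BNN posterior; hyperparameters here are illustrative, not those from the paper's tables.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(w):        # toy target: log-density of Normal(0, 1)
    return -0.5 * w**2

def grad_log_p(w):
    return -w

def hmc_step(w, step=0.2, n_leapfrog=20):
    p = rng.normal()                          # resample momentum
    w_new, p_new = w, p
    p_new += 0.5 * step * grad_log_p(w_new)   # leapfrog: half kick
    for _ in range(n_leapfrog - 1):
        w_new += step * p_new                 # drift
        p_new += step * grad_log_p(w_new)     # full kick
    w_new += step * p_new
    p_new += 0.5 * step * grad_log_p(w_new)   # final half kick
    # Metropolis correction on the Hamiltonian H = -log p(w) + p^2/2.
    h_old = -log_p(w) + 0.5 * p**2
    h_new = -log_p(w_new) + 0.5 * p_new**2
    return w_new if np.log(rng.uniform()) < h_old - h_new else w

samples, w = [], 0.0
for _ in range(2000):
    w = hmc_step(w)
    samples.append(w)
```

For a real BNN the same scheme runs over the full weight vector, with gradients of the log-posterior supplied by backpropagation; this is what makes HMC accurate but expensive compared with VI.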
- The quest for robust, data-driven models is an essential component of the construction of AI-based technologies.
- In this respect, the authors believe that the fact that Bayesian ensembles of NNs can evade a broad class of adversarial attacks will be of great relevance.
- While in the authors' experiments cheaper approximations such as VI enjoyed a degree of adversarial robustness, albeit reduced, there are no guarantees that this will hold in general.
- To this end, the authors hope that this result will spark renewed interest in the pursuit of efficient Bayesian inference algorithms.
- Evaluating the robustness of BNNs against these attacks would be interesting
- Table2: Hyperparameters for training BNNs using HMC in Figures 2 and 3
- Table3: Hyperparameters for training BNNs using VI in Figures 2 and 3
- Table4: Hyperparameters for training BNNs with HMC in Figure 4. * indicates the parameters used in Table 1 of the main text
- Table5: Hyperparameters for training BNNs with SGD in Figure 4. * indicates the parameters used in Table 1 of the main text
- Table6: Hyperparameters for training BNNs with SGD in Figure 4
- Related Work: The robustness of BNNs to adversarial examples has already been observed by Gal and Smith and by Bekasov and Murray. In particular, Bekasov and Murray define Bayesian adversarial spheres and empirically show that, for BNNs trained with HMC, adversarial examples tend to have high uncertainty, while Gal and Smith derive sufficient conditions for idealised BNNs to avoid adversarial examples. However, it is unclear how such conditions could be checked in practice, as this would require verifying that the BNN architecture is invariant under all the symmetries of the data.
- Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
- David Barber. Bayesian reasoning and machine learning. Cambridge University Press, 2012.
- Artur Bekasov and Iain Murray. Bayesian adversarial spheres: Bayesian inference and adversarial examples in a noiseless setting. arXiv preprint arXiv:1811.12335, 2018.
- Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.
- Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.
- Arno Blaas, Luca Laurenti, Andrea Patane, Luca Cardelli, Marta Kwiatkowska, and Stephen Roberts. Robustness quantification for classification with Gaussian processes. arXiv preprint arXiv:1905.11876, 2019.
- Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424, 2015.
- Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Nicola Paoletti, Andrea Patane, and Matthew Wicker. Statistical guarantees for the robustness of Bayesian neural networks. arXiv preprint arXiv:1903.01980, 2019.
- Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, and Andrea Patane. Robustness guarantees for Bayesian inference with Gaussian processes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7759–7768, 2019.
- Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.
- Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
- George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
- Simon S Du, Jason D Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. arXiv preprint arXiv:1811.03804, 2018.
- Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. In Advances in Neural Information Processing Systems, pages 1178–1187, 2018.
- Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
- Yarin Gal and Lewis Smith. Sufficient conditions for idealised models to have no adversarial examples: a theoretical and empirical study with bayesian neural networks. arXiv preprint arXiv:1806.00667, 2018.
- Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. Modelling the influence of data structure on learning in neural networks. arXiv preprint arXiv:1909.11500, 2019.
- Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018.
- Marc Khoury and Dylan Hadfield-Menell. On the geometry of adversarial examples. CoRR, abs/1811.00525, 2018. URL http://arxiv.org/abs/1811.00525.
- Yingzhen Li and Yarin Gal. Dropout inference in Bayesian neural networks with alpha-divergences. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2052–2061. JMLR.org, 2017.
- Xuanqing Liu, Yao Li, Chongruo Wu, and Cho-Jui Hsieh. Adv-BNN: Improved adversarial defense through robust Bayesian neural network. arXiv preprint arXiv:1810.01279, 2018.
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 115(33): E7665–E7671, 2018.
- Rhiannon Michelmore, Matthew Wicker, Luca Laurenti, Luca Cardelli, Yarin Gal, and Marta Kwiatkowska. Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control. arXiv preprint arXiv:1909.09884, 2019.
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
- Radford M Neal. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
- Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.
- Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pages 506–519, 2017.
- Ambrish Rawat, Martin Wistuba, and Maria-Irina Nicolae. Adversarial phenomenon in the eyes of Bayesian deep learning. arXiv preprint arXiv:1711.08244, 2017.
- Grant M Rotskoff and Eric Vanden-Eijnden. Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error. arXiv preprint arXiv:1805.00915, 2018.
- Alessandro Rozza, Mario Manzo, and Alfredo Petrosino. A novel graph-based fisher kernel method for semi-supervised learning. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, ICPR ’14, page 3786–3791, USA, 2014. IEEE Computer Society. ISBN 9781479952090. doi: 10.1109/ICPR.2014.650. URL https://doi.org/10.1109/ICPR.2014.650.
- Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy?–a comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Matthew Wicker, Xiaowei Huang, and Marta Kwiatkowska. Feature-guided black-box safety testing of deep neural networks. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 408–426. Springer, 2018.
- Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning, volume 2. MIT press Cambridge, MA, 2006.
- Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Danny Yadron and Dan Tynan. Tesla driver dies in first fatal crash while using autopilot mode. the Guardian, 1, 2016.
- Nanyang Ye and Zhanxing Zhu. Bayesian adversarial learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6892–6901. Curran Associates Inc., 2018.