Robustness of Bayesian Neural Networks to Gradient-Based Attacks

NeurIPS 2020 (2020)

Abstract

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs)...

Introduction
  • Adversarial attacks are small, potentially imperceptible perturbations of test inputs that can lead to catastrophic misclassifications in high-dimensional classifiers such as deep Neural Networks (NNs).
  • Many attack strategies are based on identifying directions of high variability in the loss function by evaluating its gradients w.r.t. the input points (a minimal sketch of such an attack follows this list).
  • Since such variability can be intuitively linked to uncertainty in the prediction, Bayesian Neural Networks (BNNs) [27] have recently been suggested as a more robust deep learning paradigm, a claim that has found some empirical support [15, 16, 3, 22].
  • Neither the source of this robustness nor its general applicability is well understood mathematically.
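To make the gradient-based strategy concrete, here is a minimal FGSM-style sketch in PyTorch (not the paper's code); the classifier `model`, inputs `x`, labels `y`, and budget `epsilon` are illustrative placeholders.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor, epsilon: float) -> torch.Tensor:
    """FGSM sketch: take one step of size epsilon along the sign of the input gradient of the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb each input coordinate in the direction that locally increases the loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```

PGD amounts to repeating this step several times, projecting back onto the allowed perturbation ball after each iteration.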
Highlights
  • Adversarial attacks are small, potentially imperceptible perturbations of test inputs that can lead to catastrophic misclassifications in high-dimensional classifiers such as deep Neural Networks (NNs)
  • In Section 5.1, we experimentally verify the validity of the zero-averaging property of gradients implied by Theorem 1, and discuss its implications for the behaviour of Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks on Bayesian Neural Networks (BNNs) in Section 5.2
  • Details on the experimental settings and BNN training parameters can be found in the Supplementary Material
  • We investigate the vanishing behaviour of input gradients, established by Theorem 1 for the thermodynamic-limit regime, in finite, practical settings, that is with a finite number of training data and with finite-width BNNs (a Monte Carlo sketch of this check is given after this list)
  • We look at an array of more than 1000 different BNN architectures trained with Hamiltonian Monte Carlo (HMC) and Variational Inference (VI) on MNIST and Fashion-MNIST. We experimentally evaluate their accuracy/robustness trade-off on FGSM attacks as compared to that obtained with deterministic NNs trained via Stochastic Gradient Descent (SGD) based methods
  • We believe that the fact that Bayesian ensembles of NNs can evade a broad class of adversarial attacks will be of great relevance
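A minimal Monte Carlo sketch of that check, assuming (hypothetically) a list `posterior_samples` of weight draws stored as PyTorch state dicts (e.g. from HMC or VI) and a constructor `make_model` for the architecture; neither name comes from the paper.

```python
import torch
import torch.nn as nn

def expected_input_gradient(posterior_samples, make_model, x, y):
    """Monte Carlo estimate of the posterior-averaged loss gradient with respect to the input x."""
    grads = []
    for weights in posterior_samples:
        model = make_model()
        model.load_state_dict(weights)  # load one posterior draw of the BNN weights
        x_in = x.clone().detach().requires_grad_(True)
        nn.functional.cross_entropy(model(x_in), y).backward()
        grads.append(x_in.grad.detach())
    # Theorem 1 predicts that this average vanishes in the large-data,
    # overparametrized limit; the finite-size experiments probe how close
    # practical BNNs get to that regime.
    return torch.stack(grads).mean(dim=0)
```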
Methods
  • BNNs that utilise pointwise uncertainty have been introduced in [21, 15, 30]
  • Most of these approaches have largely relied on Monte Carlo dropout as the approximate posterior inference method [11].
  • Bayesian inference combines likelihood and prior via Bayes' theorem to obtain a posterior measure on the space of weights, p(w | D) ∝ p(D | w) p(w); predictions are then obtained by averaging over this posterior (a Monte Carlo dropout sketch of this predictive averaging is given below).
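As a concrete illustration of one such scheme, the sketch below implements Monte Carlo dropout prediction: dropout is kept active at test time and the softmax outputs of several stochastic forward passes are averaged as an approximate posterior predictive. The architecture and hyperparameters are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DropoutNet(nn.Module):
    """Tiny illustrative classifier; its dropout layer provides the test-time stochasticity."""
    def __init__(self, in_dim=784, hidden=128, classes=10, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=50):
    """Approximate posterior predictive: average the softmax over stochastic forward passes."""
    model.train()  # keep dropout active, so each pass corresponds to a different weight sample
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)
```

HMC and VI produce explicit weight samples rather than dropout masks, but the predictions are averaged over the samples in the same way.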
Results
  • The authors empirically investigate the theoretical findings on different BNNs: they train a variety of BNNs on the MNIST and Fashion-MNIST [37] datasets and evaluate their posterior distributions using HMC and VI approximate inference methods.
  • In Section 5.3 the authors analyse the relationship between robustness and accuracy on thousands of different NN architectures, comparing the results obtained by Bayesian and by deterministic training (a sketch of this clean-versus-adversarial accuracy comparison is given below).
  • Details on the experimental settings and BNN training parameters can be found in the Supplementary Material.
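A sketch of such a clean-versus-adversarial accuracy comparison, reusing the hypothetical `mc_dropout_predict` and `fgsm_attack` helpers from the earlier sketches; the data `loader` and perturbation budget `epsilon` are placeholders.

```python
def accuracy_and_robustness(model, loader, epsilon):
    """Fraction of test points classified correctly on clean inputs vs. after an FGSM perturbation."""
    clean_correct, robust_correct, total = 0, 0, 0
    for x, y in loader:
        # Clean accuracy under the (approximate) posterior predictive.
        clean_correct += (mc_dropout_predict(model, x).argmax(dim=-1) == y).sum().item()
        # Robust accuracy: re-classify after an FGSM step of budget epsilon.
        x_adv = fgsm_attack(model, x, y, epsilon)
        robust_correct += (mc_dropout_predict(model, x_adv).argmax(dim=-1) == y).sum().item()
        total += y.numel()
    return clean_correct / total, robust_correct / total
```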
Conclusions
  • The quest for robust, data-driven models is an essential component towards the construction of AI-based technologies
  • In this respect, the authors believe that the fact that Bayesian ensembles of NNs can evade a broad class of adversarial attacks will be of great relevance.
  • While in the authors' experiments cheaper approximations such as VI enjoyed a degree of adversarial robustness, albeit reduced, there are no guarantees that this will hold in general
  • To this end, the authors hope that this result will spark renewed interest in the pursuit of efficient Bayesian inference algorithms.
  • Evaluating the robustness of BNNs against attacks that do not rely on gradients would be interesting
Tables
  • Table 1
  • Table 2: Hyperparameters for training BNNs using HMC in Figures 2 and 3
  • Table 3: Hyperparameters for training BNNs using VI in Figures 2 and 3
  • Table 4: Hyperparameters for training BNNs with HMC in Figure 4. * indicates the parameters used in Table 1 of the main text
  • Table 5: Hyperparameters for training BNNs with SGD in Figure 4. * indicates the parameters used in Table 1 of the main text
  • Table 6: Hyperparameters for training BNNs with SGD in Figure 4
Related Work
  • The robustness of BNNs to adversarial examples has already been observed by Gal and Smith [16] and by Bekasov and Murray [3]. In particular, in [3] the authors define Bayesian adversarial spheres and empirically show that, for BNNs trained with HMC, adversarial examples tend to have high uncertainty, while in [16] sufficient conditions for idealised BNNs to avoid adversarial examples are derived. However, it is unclear how such conditions could be checked in practice, as this would require verifying that the BNN architecture is invariant under all the symmetries of the data.
References
  • Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
  • David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
  • Artur Bekasov and Iain Murray. Bayesian adversarial spheres: Bayesian inference and adversarial examples in a noiseless setting. arXiv preprint arXiv:1811.12335, 2018.
  • Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.
  • Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.
  • Arno Blaas, Luca Laurenti, Andrea Patane, Luca Cardelli, Marta Kwiatkowska, and Stephen Roberts. Robustness quantification for classification with Gaussian processes. arXiv preprint arXiv:1905.11876, 2019.
  • Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424, 2015.
  • Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Nicola Paoletti, Andrea Patane, and Matthew Wicker. Statistical guarantees for the robustness of Bayesian neural networks. arXiv preprint arXiv:1903.01980, 2019.
  • Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, and Andrea Patane. Robustness guarantees for Bayesian inference with Gaussian processes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7759–7768, 2019.
  • Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks, 2016.
  • Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
  • George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • Simon S Du, Jason D Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. arXiv preprint arXiv:1811.03804, 2018.
  • Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. In Advances in Neural Information Processing Systems, pages 1178–1187, 2018.
  • Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
  • Yarin Gal and Lewis Smith. Sufficient conditions for idealised models to have no adversarial examples: a theoretical and empirical study with Bayesian neural networks. arXiv preprint arXiv:1806.00667, 2018.
  • Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. Modelling the influence of data structure on learning in neural networks. arXiv preprint arXiv:1909.11500, 2019.
  • Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018.
  • Marc Khoury and Dylan Hadfield-Menell. On the geometry of adversarial examples. CoRR, abs/1811.00525, 2018. URL http://arxiv.org/abs/1811.00525.
  • Yingzhen Li and Yarin Gal. Dropout inference in Bayesian neural networks with alpha-divergences. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2052–2061. JMLR.org, 2017.
  • Xuanqing Liu, Yao Li, Chongruo Wu, and Cho-Jui Hsieh. Adv-BNN: Improved adversarial defense through robust Bayesian neural network. arXiv preprint arXiv:1810.01279, 2018.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks, 2017.
  • Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 115(33):E7665–E7671, 2018.
  • Rhiannon Michelmore, Matthew Wicker, Luca Laurenti, Luca Cardelli, Yarin Gal, and Marta Kwiatkowska. Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control. arXiv preprint arXiv:1909.09884, 2019.
  • Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
  • Radford M Neal. Bayesian Learning for Neural Networks, volume 118. Springer Science & Business Media, 2012.
  • Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.
  • Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519, 2017.
  • Ambrish Rawat, Martin Wistuba, and Maria-Irina Nicolae. Adversarial phenomenon in the eyes of Bayesian deep learning. arXiv preprint arXiv:1711.08244, 2017.
  • Grant M Rotskoff and Eric Vanden-Eijnden. Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error. arXiv preprint arXiv:1805.00915, 2018.
  • Alessandro Rozza, Mario Manzo, and Alfredo Petrosino. A novel graph-based Fisher kernel method for semi-supervised learning. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, ICPR ’14, pages 3786–3791, USA, 2014. IEEE Computer Society. ISBN 9781479952090. doi: 10.1109/ICPR.2014.650. URL https://doi.org/10.1109/ICPR.2014.650.
  • Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Matthew Wicker, Xiaowei Huang, and Marta Kwiatkowska. Feature-guided black-box safety testing of deep neural networks. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 408–426.
  • Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.
  • Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  • Danny Yadron and Dan Tynan. Tesla driver dies in first fatal crash while using autopilot mode. The Guardian, 1, 2016.
  • Nanyang Ye and Zhanxing Zhu. Bayesian adversarial learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6892–6901. Curran Associates Inc., 2018.