MMA Training: Direct Input Space Margin Maximization through Adversarial Training

ICLR, 2020

Abstract

We study adversarial robustness of neural networks from a margin maximization perspective, where margins are defined as the distances from inputs to a classifier's decision boundary. Our study shows that maximizing margins can be achieved by minimizing the adversarial loss on the decision boundary at the "shortest successful perturbation".
Introduction
  • Despite their impressive performance on various learning tasks, neural networks have been shown to be vulnerable to adversarial perturbations (Szegedy et al., 2013; Biggio et al., 2013).
  • Figure 1 shows the natural connection between adversarial robustness and the margins of the data points, where the margin is defined as the distance from a data point to the classifier’s decision boundary.
  • The margin of a data point x is the minimum distance by which x has to be perturbed to change the classifier's prediction.
  • The larger the margin, the farther the input is from the decision boundary and the more robust the classifier is w.r.t. this input (a compact restatement follows this list)
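
The margin described above can be written compactly. A minimal restatement, with notation introduced here for illustration (θ denotes the model parameters and f_θ^k the score for class k):

    d_{\theta}(x, y) \;=\; \min_{\delta} \|\delta\|
        \quad \text{s.t.} \quad \arg\max_{k} f_{\theta}^{k}(x + \delta) \neq y
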
Highlights
  • Despite their impressive performance on various learning tasks, neural networks have been shown to be vulnerable to adversarial perturbations (Szegedy et al., 2013; Biggio et al., 2013)
  • Theorem 2.1 summarizes the theoretical results, where we show separately 1) how to calculate the gradient of the margin under some smoothness assumptions, and 2) that without smoothness, margin maximization can still be achieved by minimizing the loss at the shortest successful perturbation
  • Through our development of Max-Margin Adversarial training in the last section, we have shown that margin maximization is closely related to adversarial training with the optimal perturbation length δ∗
  • Our results confirm our theory and show that Max-Margin Adversarial training is stable to its hyperparameter dmax, and balances better among various attack lengths compared to adversarial training with fixed perturbation magnitude
  • We developed the Max-Margin Adversarial training algorithm that optimizes the margins via adversarial training with perturbation magnitude adapted both throughout training and individually for the distinct datapoints in the training dataset
  • Our experiments on CIFAR10 and MNIST empirically confirmed our theory and demonstrated that Max-Margin Adversarial training outperforms adversarial training in terms of sensitivity to hyperparameter setting and robustness to variable attack lengths, suggesting Max-Margin Adversarial training is a better choice for defense when the adversary is unknown, which is often the case in practice (a minimal training-loop sketch follows this list)
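
As a rough illustration of the adaptive-magnitude idea (and not the authors' released implementation), the sketch below trains with a per-example perturbation length chosen as approximately the shortest ℓ2 length at which a PGD attack succeeds, capped at d_max. The helper names, step sizes, and the grid search are illustrative stand-ins for the paper's AN-PGD procedure.

    # Sketch of MMA-style training (illustrative, not the authors' code).
    import torch
    import torch.nn.functional as F

    def pgd_l2(model, x, y, eps, steps=10):
        """L2-constrained PGD with a per-example budget eps (shape [B])."""
        eps = eps.view(-1, 1, 1, 1)
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            (grad,) = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                g = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
                delta = delta + (2.5 * eps / steps) * grad / g        # gradient step
                d = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
                delta = delta * (eps / d).clamp(max=1.0)              # project to the eps-ball
            delta = delta.requires_grad_(True)
        return delta.detach()

    def shortest_successful_perturbation(model, x, y, d_max, n_grid=8):
        """Crude per-example margin search over a grid of perturbation
        lengths (an illustrative stand-in for the paper's AN-PGD)."""
        batch = x.size(0)
        best_eps = torch.full((batch,), d_max, device=x.device)
        best_delta = pgd_l2(model, x, y, best_eps)
        for frac in torch.linspace(1.0 / n_grid, 1.0, n_grid):
            eps = torch.full((batch,), float(frac) * d_max, device=x.device)
            delta = pgd_l2(model, x, y, eps)
            with torch.no_grad():
                flipped = model(x + delta).argmax(dim=1) != y
            take = flipped & (eps < best_eps)
            best_eps = torch.where(take, eps, best_eps)
            best_delta[take] = delta[take]
        return best_delta

    def mma_step(model, opt, x, y, d_max=2.0):
        """One update: adversarial loss at the (approximate) shortest
        successful perturbation for correctly classified points,
        clean loss for misclassified ones."""
        with torch.no_grad():
            correct = model(x).argmax(dim=1) == y
        x_adv = x.clone()
        if correct.any():
            delta = shortest_successful_perturbation(model, x[correct], y[correct], d_max)
            x_adv[correct] = x[correct] + delta
        loss = F.cross_entropy(model(x_adv), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

The key difference from standard adversarial training is that eps here is a per-example quantity updated from the model's current margins, rather than a single global hyperparameter.
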
Methods
  • The authors empirically examine several hypotheses and compare MMA training with different adversarial training algorithms on the MNIST and CIFAR10 datasets under ℓ∞/ℓ2-norm constrained perturbations.
  • The authors' results confirm the theory and show that MMA training is stable to its hyperparameter dmax, and balances better among various attack lengths compared to adversarial training with a fixed perturbation magnitude.
  • This suggests that MMA training is a better choice for defense when the perturbation length is unknown, which is often the case in practice (a sketch of the fixed-magnitude baseline follows this list)
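
For contrast, here is a minimal sketch of the fixed-magnitude baseline that MMA is compared against: standard PGD adversarial training with an ℓ∞ constraint. The budget, step size, and step count below are common illustrative values, not the paper's exact settings.

    # Sketch of fixed-epsilon PGD adversarial training (illustrative settings).
    import torch
    import torch.nn.functional as F

    def pgd_linf(model, x, y, eps, alpha, steps):
        """L-infinity PGD attack with a single, fixed budget eps."""
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            (grad,) = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta = (delta + alpha * grad.sign()).clamp(-eps, eps)  # signed step + box projection
                delta = (x + delta).clamp(0.0, 1.0) - x                 # keep images in [0, 1]
            delta = delta.requires_grad_(True)
        return delta.detach()

    def fixed_eps_adv_step(model, opt, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        """One training step where every example uses the same eps."""
        delta = pgd_linf(model, x, y, eps, alpha, steps)
        loss = F.cross_entropy(model(x + delta), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()
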
Results
  • The authors present all the empirical results in Tables 4 to 15.
  • Model performances under combined (whitebox+transfer) attacks are shown in Tables 4 to 7; this combined accuracy is the proxy for the true robustness measure (a short computation sketch follows this list).
  • Model performances under only whitebox PGD attacks are shown in Tables 8 to 11.
  • DDN-Rony et al. models are downloaded from https://github.
  • TRADES models are downloaded from https://github.com/yaodongyu/TRADES
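
A short sketch of how the combined (whitebox+transfer) robust accuracy reported in these tables can be computed: an input counts as robust only if the prediction stays correct under every attack in the ensemble (whitebox attacks on the defended model plus examples transferred from other models). The attack callables below are placeholders.

    # Sketch of combined (worst-case over attacks) robust accuracy (illustrative).
    import torch

    def combined_robust_accuracy(model, loader, attacks, device="cpu"):
        """`attacks` is a list of callables (x, y) -> x_adv, e.g. whitebox PGD
        on `model` plus PGD examples transferred from independently trained copies."""
        robust, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            survives = torch.ones_like(y, dtype=torch.bool)
            for attack in attacks:
                x_adv = attack(x, y)                  # attacks may need gradients internally
                with torch.no_grad():
                    survives &= model(x_adv).argmax(dim=1) == y
            robust += survives.sum().item()
            total += y.numel()
        return robust / total

Taking the per-example worst case is what makes the combined number a more pessimistic proxy for true robustness than whitebox accuracy alone.
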
Conclusion
  • The authors proposed to directly maximize the margins to improve adversarial robustness.
  • The authors developed the MMA training algorithm that optimizes the margins via adversarial training with perturbation magnitude adapted both throughout training and individually for the distinct datapoints in the training dataset.
  • The authors rigorously analyzed the relation between adversarial training and margin maximization.
  • The authors' experiments on CIFAR10 and MNIST empirically confirmed the theory and demonstrated that MMA training outperforms adversarial training in terms of sensitivity to hyperparameter setting and robustness to variable attack lengths, suggesting MMA training is a better choice for defense when the adversary is unknown, which is often the case in practice
Tables
  • Table 1: Accuracies of representative models trained on CIFAR10 with ℓ∞-norm constrained attacks. Robust accuracies are calculated under combined (whitebox+transfer) PGD attacks. AvgAcc averages over clean and all robust accuracies; AvgRobAcc averages over all robust accuracies (see the formula sketch after this list)
  • Table 2: CW-ℓ2 attack results on models trained on MNIST with ℓ2-norm constrained attacks
  • Table 3: CW-ℓ2 attack results on models trained on CIFAR10 with ℓ2-norm constrained attacks
  • Table 4: Accuracies of models trained on MNIST with ℓ∞-norm constrained attacks. These robust accuracies are calculated under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 5: Accuracies of models trained on CIFAR10 with ℓ∞-norm constrained attacks. These robust accuracies are calculated under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 6: Accuracies of models trained on MNIST with ℓ2-norm constrained attacks. These robust accuracies are calculated under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 7: Accuracies of models trained on CIFAR10 with ℓ2-norm constrained attacks. These robust accuracies are calculated under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 8: Accuracies of models trained on MNIST with ℓ∞-norm constrained attacks. These robust accuracies are calculated under only whitebox PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 9: Accuracies of models trained on CIFAR10 with ℓ∞-norm constrained attacks. These robust accuracies are calculated under only whitebox PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 10: Accuracies of models trained on MNIST with ℓ2-norm constrained attacks. These robust accuracies are calculated under only whitebox PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 11: Accuracies of models trained on CIFAR10 with ℓ2-norm constrained attacks. These robust accuracies are calculated under only whitebox PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 12: The TransferGap of models trained on MNIST with ℓ∞-norm constrained attacks. TransferGap indicates the gap between robust accuracy under only whitebox PGD attacks and under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 13: The TransferGap of models trained on CIFAR10 with ℓ∞-norm constrained attacks. TransferGap indicates the gap between robust accuracy under only whitebox PGD attacks and under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 14: The TransferGap of models trained on MNIST with ℓ2-norm constrained attacks. TransferGap indicates the gap between robust accuracy under only whitebox PGD attacks and under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
  • Table 15: The TransferGap of models trained on CIFAR10 with ℓ2-norm constrained attacks. TransferGap indicates the gap between robust accuracy under only whitebox PGD attacks and under combined (whitebox+transfer) PGD attacks. sd0 and sd1 indicate 2 different random seeds
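
The summary metrics used across these tables can be written out as follows, assuming robust accuracy is evaluated at attack magnitudes ε_1, ..., ε_K (a restatement of the captions, not an exact excerpt from the paper):

    \mathrm{AvgRobAcc} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{RobAcc}(\epsilon_k),
    \qquad
    \mathrm{AvgAcc} = \frac{1}{K+1}\Big(\mathrm{CleanAcc} + \sum_{k=1}^{K} \mathrm{RobAcc}(\epsilon_k)\Big),

    \mathrm{TransferGap}(\epsilon_k) = \mathrm{RobAcc}_{\mathrm{whitebox}}(\epsilon_k) - \mathrm{RobAcc}_{\mathrm{combined}}(\epsilon_k)
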
Funding
  • Studies adversarial robustness of neural networks from a margin maximization perspective, where margins are defined as the distances from inputs to a classifier’s decision boundary
  • Proposes Max-Margin Adversarial training to directly maximize the margins to achieve adversarial robustness
  • Shows that adversarial training with a fixed perturbation length ε maximizes a lower bound of the margin, if ε is smaller than the margin of that training point (an informal restatement follows this list)
  • Focuses our theoretical efforts on the formulation for directly maximizing the input space margin, and understanding the standard adversarial training method from a margin maximization perspective
  • Focuses our empirical efforts on thoroughly examining our MMA training algorithm, comparing with adversarial training with a fixed perturbation magnitude
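
An informal restatement of that lower-bound claim, using the 0-1 loss for simplicity (a simplification of the paper's analysis, not its exact statement): if the ε-bounded adversary cannot flip the prediction, then ε is certified as a lower bound on the margin,

    \max_{\|\delta\| \le \epsilon} L^{0\text{-}1}_{\theta}(x + \delta,\, y) = 0
    \;\Longrightarrow\;
    d_{\theta}(x, y) \ge \epsilon

so a fixed ε acts as the lower bound being maximized, whereas MMA adapts the per-example ε toward the margin itself.
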
Reference
  • Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pp. 274–283, 2018. 4.3
  • Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2013. 1
  • Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 39–57. IEEE, 2017. D, F.1
  • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019. 1
  • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, pp. 854–863, 2017. 1.1
  • Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019. B
  • Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018. 1.1
  • Gavin Weiguang Ding, Kry Yik-Chau Lui, Xiaomeng Jin, Luyu Wang, and Ruitong Huang. On the sensitivity of adversarial robustness to input data distributions. In International Conference on Learning Representations, 2019a. F
  • Gavin Weiguang Ding, Luyu Wang, and Xiaomeng Jin. AdverTorch v0.1: An adversarial robustness toolbox based on pytorch. arXiv preprint arXiv:1902.07623, 2019b. D
  • Gamaleldin F Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, and Samy Bengio. Large margin deep networks for classification. arXiv preprint arXiv:1803.05598, 2018. 1.1, B
  • Chuan Guo, Jared S Frank, and Kilian Q Weinberger. Low frequency adversarial perturbation. arXiv preprint arXiv:1809.08758, 2018. 1
  • Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. arXiv preprint arXiv:1705.08475, 2017. 1.1
  • Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. arXiv preprint arXiv:1901.09960, 2019. 1
  • Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015. 1, 3
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017. 1, 2, 2.3, 2.3, 2.4, 3, 3, 4, 4.1, C
  • Alexander Matyasko and Lap-Pui Chau. Margin maximization for robust classification using deep learning. In Neural Networks (IJCNN), 2017 International Joint Conference on, pp. 300–307. IEEE, 2017. 1.1, B
  • Jérôme Rony, Luiz G Hafemann, Luis S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. arXiv preprint arXiv:1811.09600, 2018. 2.3, 4.1, B.1, C
  • Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. arXiv preprint arXiv:1711.09404, 2017. 1.1
  • Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! arXiv preprint arXiv:1904.12843, 2019.
  • Yash Sharma, Gavin Weiguang Ding, and Marcus A Brubaker. On the effectiveness of low frequency perturbations. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3389–3396. AAAI Press, 2019. 1
  • Jure Sokolic, Raja Giryes, Guillermo Sapiro, and Miguel RD Rodrigues. Robust large margin deep neural networks. IEEE Transactions on Signal Processing, 2017. 1.1, B
  • Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019. 1
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. 1
  • Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pp. 6542–6551, 2018. 1.1, B
  • Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018. 4.3, D
  • Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 2013. B
  • Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, and Quanquan Gu. On the convergence and robustness of adversarial training. In International Conference on Machine Learning, pp. 6586–6595, 2019. B
  • Ziang Yan, Yiwen Guo, and Changshui Zhang. Adversarial margin maximization networks. IEEE transactions on pattern analysis and machine intelligence, 2019. 1.1, B
  • Nanyang Ye and Zhanxing Zhu. Bayesian adversarial learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 6892–6901. Curran Associates Inc., 2018. B
  • Yao-Liang Yu. The differentiability of the upper envelop. Technical note, 2012. A.2
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016. C
  • Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, and Liwei Wang. Macer: Attack-free and scalable robust training via maximizing certified radius. In International Conference on Learning Representations, 2020. B
  • Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Accelerating adversarial training via maximal principle. arXiv preprint arXiv:1905.00877, 2019a. 1
  • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019b. 1, 4.3
  • Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S Dhillon, and Cho-Jui Hsieh. The limitations of adversarial training and the blind-spot attack. arXiv preprint arXiv:1901.04684, 2019c. F