MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

ICLR, 2020.


Abstract:

Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers by directly maximizing this certified radius.
Introduction
Highlights
  • Modern neural network classifiers are able to achieve very high accuracy on image classification tasks but are sensitive to small, adversarially chosen perturbations to the inputs (Szegedy et al., 2013; Biggio et al., 2013)
  • We propose to learn robust models by directly taking the certified radius into the training objective (the certified-radius expression from randomized smoothing is recalled after this list)
  • We propose an attack-free and scalable robust training algorithm by MAximizing the CErtified Radius (MACER)
  • In this work we propose MACER, an attack-free and scalable robust training method that directly maximizes the certified radius of a smoothed classifier
  • According to our extensive experiments, MACER performs better than previous provable l2-defenses and trains faster
  • Our strong empirical results suggest that adversarial training is not a must for robust training, and that defense based on certification is a promising direction for future research
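    The certified-radius expression referred to above is the randomized-smoothing bound of Cohen et al. (2019): if, under Gaussian noise N(0, σ²I), the smoothed classifier assigns the top class a probability (lower bound) pA and the runner-up class a probability (upper bound) pB, the prediction is certifiably robust within the l2 radius below. This is a standard statement of the bound written in LaTeX for reference, not a new result:

        R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)

    where Φ⁻¹ is the standard Gaussian quantile function; MACER's training objective directly targets (a surrogate of) this radius.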
Methods
  • SVHN (Appendix C.2.2): the results are reported in Table 7.
  • MACER training for 150 epochs: in Table 8 the authors report the performance and training time of MACER on Cifar-10 when it is run for only 150 epochs, and compare with SmoothAdv (Salman et al., 2019) and MACER trained for 440 epochs. [Table 8 columns: certified accuracy at l2 radii 0.00-1.75, ACR, number of epochs, and total training hours.]
  • Effect of hyperparameters: all ablation experiments are run on Cifar-10 with σ = 0.25 or 0.50, and results are reported in Tables 10-13.
  • Settings for the ablation (Table 9; the hyperparameter under study is varied while the others are fixed): k = 16; λ = 12.0 (varied over 0.0/1.0/2.0/4.0/8.0/16.0); γ = 8.0 (varied over 2.0/4.0/6.0/8.0/10.0/12.0/14.0/16.0); β = 16.0 (varied over 1.0/2.0/4.0/8.0/16.0/32.0/64.0). A rough sketch of how these hyperparameters enter the training loss follows this list.
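    The sketch below is a minimal, illustrative PyTorch version of a MACER-style training loss, not the authors' implementation. It assumes the roles of the hyperparameters as in the MACER paper: k is the number of Gaussian noise samples per input, β the inverse temperature of the softmax, γ the hinge margin on the surrogate certified radius, and λ the trade-off weight between the classification and robustness terms; the clamping constants are illustrative.

        # Minimal sketch (not the authors' code) of a MACER-style training loss.
        import torch
        import torch.nn.functional as F
        from torch.distributions.normal import Normal

        def macer_style_loss(model, x, y, sigma=0.25, k=16, beta=16.0, gamma=8.0, lam=12.0):
            b = x.size(0)
            # Monte-Carlo estimate of the soft smoothed classifier with k noisy copies.
            noise = torch.randn(b * k, *x.shape[1:], device=x.device) * sigma
            logits = model(x.repeat_interleave(k, dim=0) + noise)            # (b*k, C)
            probs = F.softmax(beta * logits, dim=1).view(b, k, -1).mean(1)   # (b, C)

            # Classification term: cross-entropy of the smoothed soft classifier.
            ce = F.nll_loss(torch.log(probs + 1e-12), y)

            # Surrogate margin: Phi^{-1}(p_top-class) - Phi^{-1}(p_runner-up), with
            # probabilities clamped away from 0/1 so the Gaussian quantile stays finite.
            p = probs.clamp(1e-4, 1 - 1e-4)
            p_y = p.gather(1, y.unsqueeze(1)).squeeze(1)
            p_other = p.scatter(1, y.unsqueeze(1), 0.0).max(1).values
            icdf = Normal(0.0, 1.0).icdf
            margin = icdf(p_y) - icdf(p_other)

            # Hinge loss on the margin, applied to correctly classified inputs only.
            correct = probs.argmax(1) == y
            hinge = F.relu(gamma - margin)[correct]
            robust = (sigma / 2) * (hinge.mean() if hinge.numel() > 0
                                    else torch.zeros((), device=x.device))
            return ce + lam * robust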
Results
  • The authors report the results on Cifar-10 and ImageNet in the main body of the paper. Results on MNIST and SVHN can be found in Appendix C.2. (A short sketch of the reported metrics follows below.)

    [Table: approximated certified test accuracy at l2 radii from 0.00 to 2.25, and the ACR of each model.]

    [Figure: certified accuracy as a function of l2 radius for MACER, Cohen and Salman models at (a) σ = 0.25, (b) σ = 0.50, and a third panel for σ = 1.00.]
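    The ACR column in the tables is the average certified radius over the test set. A minimal sketch of both reported metrics is given below, assuming each test example comes with a certified l2 radius from a randomized-smoothing certification procedure (as in Cohen et al., 2019) and that misclassified or abstained examples contribute a radius of 0; the array names radii and correct are illustrative.

        # Minimal sketch of the two metrics reported in the tables (illustrative names).
        import numpy as np

        def average_certified_radius(radii, correct):
            # ACR: mean certified radius over the whole test set, counting a radius
            # of 0 for examples the smoothed classifier gets wrong (or abstains on).
            radii = np.asarray(radii, dtype=float)
            correct = np.asarray(correct, dtype=bool)
            return float(np.mean(np.where(correct, radii, 0.0)))

        def certified_accuracy(radii, correct, r):
            # Fraction of the test set certified correct at l2 radius r
            # (the per-radius columns in the tables).
            radii = np.asarray(radii, dtype=float)
            correct = np.asarray(correct, dtype=bool)
            return float(np.mean(correct & (radii >= r)))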
Conclusion
  • Conclusion and future work: in this work the authors propose MACER, an attack-free and scalable robust training method that directly maximizes the certified radius of a smoothed classifier.
  • According to the extensive experiments, MACER performs better than previous provable l2-defenses and trains faster.
  • The authors' strong empirical results suggest that adversarial training is not a must for robust training, and that defense based on certification is a promising direction for future research.
  • Several recent papers (Carmon et al., 2019; Zhai et al., 2019; Stanforth et al., 2019) suggest that using unlabeled data helps improve adversarially robust generalization.
Summary
  • Introduction:

    Modern neural network classifiers are able to achieve very high accuracy on image classification tasks but are sensitive to small, adversarially chosen perturbations to the inputs (Szegedy et al., 2013; Biggio et al., 2013).
  • Most of the existing defenses are based on adversarial training (Szegedy et al., 2013; Madry et al., 2017; Goodfellow et al., 2015; Huang et al., 2015; Athalye et al., 2018; Ding et al., 2020).
  • During training, these methods first generate on-the-fly adversarial examples of the inputs with multiple attack iterations and then update the model parameters using these perturbed samples together with the original labels.
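    As a rough illustration of the procedure described above, the sketch below shows one adversarial training step with an l∞ PGD inner loop in the spirit of Madry et al. (2017): several attack iterations craft a perturbation on the fly, and the model is then updated on the perturbed inputs with the original labels. This is a generic sketch, not any particular paper's implementation; eps, step_size and n_steps are illustrative values.

        # One adversarial-training step with a PGD inner loop (generic sketch).
        import torch
        import torch.nn.functional as F

        def pgd_adversarial_training_step(model, optimizer, x, y,
                                          eps=8/255, step_size=2/255, n_steps=7):
            # Inner maximization: craft adversarial examples with several attack iterations.
            delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
            for _ in range(n_steps):
                loss = F.cross_entropy(model(x + delta), y)
                grad, = torch.autograd.grad(loss, delta)
                delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
                delta = delta.detach().requires_grad_(True)

            # Outer minimization: update the model on perturbed samples + original labels.
            optimizer.zero_grad()
            F.cross_entropy(model((x + delta).detach()), y).backward()
            optimizer.step()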
Tables
  • Table 1: Approximated certified test accuracy and ACR on Cifar-10: each column is an l2 radius
  • Table 2: Approximated certified test accuracy and ACR on ImageNet: each column is an l2 radius
  • Table 3: Training time and performance of σ = 0.25 models
  • Table 4: Models for comparison on Cifar-10
  • Table 5: Models for comparison on ImageNet
  • Table 6: Approximated certified test accuracy and ACR on MNIST: each column is an l2 radius
  • Table 7: Approximated certified test accuracy and ACR on SVHN: each column is an l2 radius
  • Table 8: Performance and training time of MACER trained for 150 epochs on Cifar-10, compared with SmoothAdv and MACER trained for 440 epochs
  • Table 9: Experimental setting for examining the effect of hyperparameters
  • Table 10: Effect of k: approximated certified test accuracy and ACR on Cifar-10
  • Table 11: Effect of λ: approximated certified test accuracy and ACR on Cifar-10
  • Table 12: Effect of γ: approximated certified test accuracy and ACR on Cifar-10
  • Table 13: Effect of β: approximated certified test accuracy and ACR on Cifar-10
Related work
  • Neural networks trained by standard SGD are not robust – a small and human-imperceptible perturbation can easily change the prediction of a network. In the white-box setting, methods have been proposed to construct adversarial examples with small l∞ or l2 perturbations (Goodfellow et al., 2015; Madry et al., 2017; Carlini & Wagner, 2016; Moosavi-Dezfooli et al., 2015). Furthermore, even in the black-box setting where the adversary does not have access to the model structure and parameters, adversarial examples can be found by either transfer attack (Papernot et al., 2016) or optimization-based approaches (Chen et al., 2017; Rauber et al., 2017; Cheng et al., 2019). It is thus important to study how to improve the robustness of neural networks against adversarial examples.

    Adversarial training. So far, adversarial training has been the most successful robust training method according to many recent studies. Adversarial training was first proposed in Szegedy et al. (2013) and Goodfellow et al. (2015), who showed that adding adversarial examples to the training set can improve the robustness against such attacks. More recently, Madry et al. (2017) formulated adversarial training as a min-max optimization problem and demonstrated that adversarial training with a PGD attack leads to empirically robust models. Zhang et al. (2019b) further decomposed the robust error into the sum of natural error and boundary error for better performance. Finally, Gao et al. (2019) proved the convergence of adversarial training. Although models obtained by adversarial training empirically achieve good performance, they do not come with certified error guarantees.
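    For reference, the min-max formulation mentioned above can be written as below, where D is the data distribution, L the classification loss, and S the set of allowed perturbations (e.g. an l∞ or l2 ball of radius ε); the inner maximization is approximated in practice by an attack such as PGD:

        \min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in S} L\big(f_\theta(x+\delta),\, y\big) \Big]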
Funding
  • Chen Dan and Pradeep Ravikumar acknowledge the support of Rakuten Inc., and NSF via IIS1909816
  • Huan Zhang and Cho-Jui Hsieh acknowledge the support of NSF via IIS1719097
  • Liwei Wang acknowledges the support of Beijing Academy of Artificial Intelligence
Reference
  • Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR, abs/1802.00420, 2018. URL http://arxiv.org/abs/1802.00420.
  • Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016. URL http://arxiv.org/abs/1608.04644.
  • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736, 2019.
  • Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, 2017.
  • Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. 2019.
  • Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 1310–1320, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
  • Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. MMA training: Direct input space margin maximization through adversarial training. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HkeryxBtPB.
  • Ruiqi Gao, Tianle Cai, Haochuan Li, Cho-Jui Hsieh, Liwei Wang, and Jason D Lee. Convergence of adversarial training in overparametrized neural networks. In Advances in Neural Information Processing Systems 32. 2019.
  • Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE, 2018.
  • Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. URL http://arxiv.org/abs/1412.6572.
  • Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
  • Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvari. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.
  • Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018. URL http://arxiv.org/abs/1803.06373.
  • Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
  • Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. CoRR, abs/1809.03113, 2018. URL http://arxiv.org/abs/1809.03113.
  • Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 369–385, 2018.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Andreas Maurer and Massimiliano Pontil. Empirical Bernstein Bounds and Sample Variance Penalization. arXiv e-prints, art. arXiv:0907.3740, Jul 2009.
  • Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3575–3583, 2018.
  • Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015. URL http://arxiv.org/abs/1511.04599.
  • Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
  • Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli, et al. Adversarial robustness through local linearization. arXiv preprint arXiv:1907.02610, 2019.
  • Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017. URL http://arxiv.org/abs/1707.04131.
  • Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya P. Razenshteyn, and Sebastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. CoRR, abs/1906.04584, 2019. URL http://arxiv.org/abs/1906.04584.
  • Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John P. Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! CoRR, abs/1904.12843, 2019. URL http://arxiv.org/abs/1904.12843.
  • Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pp. 10802–10813, 2018.
  • Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.
  • Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.
  • Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 5276–5285, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
  • Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 5286–5295, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
  • Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 8400–8409. Curran Associates, Inc., 2018.
  • Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
  • Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John E. Hopcroft, and Liwei Wang. Adversarially robust generalization just requires more unlabeled data. CoRR, abs/1906.00555, 2019. URL http://arxiv.org/abs/1906.00555.
  • Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Accelerating adversarial training via maximal principle. arXiv preprint arXiv:1905.00877, 2019a.
  • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 7472–7482, Long Beach, California, USA, 09–15 Jun 2019b. PMLR. URL http://proceedings.mlr.press/v97/zhang19p.html.
  • Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in neural information processing systems, pp. 4939–4948, 2018.
  • Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019c.
  • Appendix note: the lemma used in the analysis is a generalized version of Lemma 2 in Salman et al. (2019).
  • Appendix note: the empirical Bernstein bound (Theorem 4 in Maurer & Pontil, 2009) provides a tighter concentration bound, stated as Theorem 3 in the paper: with probability at least 1 − α, the deviation of the sample mean from E[X] is bounded by the empirical Bernstein term.