Diversity can be Transferred: Output Diversification for White- and Black-box Attacks

NeurIPS 2020.

Abstract:

Adversarial attacks often involve random perturbations of the inputs drawn from uniform or Gaussian distributions, e.g. to initialize optimization-based white-box attacks or generate update directions in black-box attacks. These simple perturbations, however, could be suboptimal as they are agnostic to the model being attacked. To improve...
Introduction
  • Deep neural networks have achieved great success in image classification. However, it is known that they are vulnerable to adversarial examples [1], small perturbations imperceptible to humans that cause classifiers to output wrong predictions.
  • Some black-box attack methods [17, 18] use random sampling to explore update directions for finding or improving adversarial examples.
  • In these attacks, random perturbations are typically sampled from a naïve uniform or Gaussian distribution in the input pixel space (a sketch contrasting this with the ODS direction follows this list)
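A minimal PyTorch sketch of the contrasting ODS direction, assuming only a classifier `model` that returns logits; the function and the toy model below are illustrative, not the authors' reference implementation. Instead of drawing noise directly in pixel space, ODS draws a random weight vector in the output (logit) space and perturbs the input along the gradient of that weighted output:

    import torch

    def ods_direction(model, x):
        """One output-diversified direction per input, L2-normalized in pixel space."""
        x = x.clone().detach().requires_grad_(True)
        logits = model(x)                                 # shape (B, num_classes)
        w = torch.empty_like(logits).uniform_(-1.0, 1.0)  # random output-space weights
        grad = torch.autograd.grad((logits * w).sum(), x)[0]  # gradient of w^T f(x)
        norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)
        return grad / norm.view(-1, *([1] * (grad.dim() - 1)))

    if __name__ == "__main__":
        # Toy usage with a stand-in linear classifier on CIFAR-10-sized inputs.
        model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
        direction = ods_direction(model, torch.rand(4, 3, 32, 32))
        print(direction.shape)  # torch.Size([4, 3, 32, 32])

Because the weight vector is resampled on every call, repeated calls push different inputs toward different regions of the output space, which is the diversity the attacks below exploit.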
Highlights
  • Deep neural networks have achieved great success in image classification
  • Attack performance with ODS for initialization (ODI) is better than with naïve initialization for all models and attacks (a sketch of the combined ODI-PGD loop follows this list)
  • Our tuned ODI-Projected Gradient Descent (PGD) reduces the accuracy to 88.12% for the MNIST model, and to 44.00% for the CIFAR-10 model
  • SimBA-ODS (Simple Black-box Attack with Output Diversified Sampling) remarkably reduces the average number of queries by a factor of 2 to 3 compared to SimBA-DCT (discrete cosine transform) in both untargeted and targeted settings
  • ODS for black-box attacks is applicable even if the surrogate models are trained on a dataset that is out-of-distribution with respect to the target images, so black-box attacks with ODS are more practical than other black-box attacks that rely on prior knowledge from surrogate models
  • We demonstrate that a simple combination of PGD and ODI achieves new state-of-the-art attack success rates
  • While we only focus on ODS with surrogate models trained on labeled datasets, we believe that ODS would work well even with only unlabeled datasets
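As a concrete illustration of how ODI plugs into PGD, the following is a hedged PyTorch sketch of the combined loop; the step counts, step sizes, and plain cross-entropy loss are illustrative defaults rather than the tuned settings reported in Table 2.

    import torch
    import torch.nn.functional as F

    def odi_pgd(model, x, y, eps=8 / 255, odi_steps=2, pgd_steps=20, step_size=2 / 255):
        """A few ODS steps to diversify the starting point, then ordinary L-infinity PGD."""
        num_classes = model(x).size(1)
        w = torch.empty(x.size(0), num_classes, device=x.device).uniform_(-1.0, 1.0)
        x_adv = x.clone().detach()
        for i in range(odi_steps + pgd_steps):
            x_adv = x_adv.detach().requires_grad_(True)
            if i < odi_steps:
                loss = (model(x_adv) * w).sum()          # ODI phase: ascend w^T f(x)
                eta = eps
            else:
                loss = F.cross_entropy(model(x_adv), y)  # PGD phase: ascend the attack loss
                eta = step_size
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + eta * grad.sign()
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0.0, 1.0)  # project to eps-ball and [0, 1]
        return x_adv.detach()

Given a classifier `model`, inputs `x` scaled to [0, 1], and labels `y`, `odi_pgd(model, x, y)` returns perturbed inputs inside the epsilon-ball; running it with several restarts yields the diverse starting points the highlights refer to.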
Results
  • The authors summarize all quantitative results in Table 1.
  • One state-of-the-art attack the authors compare with is the well-tuned PGD attack [16], which achieved 88.21% accuracy on the robust MNIST model.
  • The authors' tuned ODI-PGD reduces the accuracy to 88.12% for the MNIST model, and to 44.00% for the CIFAR-10 model
  • These results outperform existing state-of-the-art attacks.
  • As shown in Table 2, the computational cost of tuned ODI-PGD is smaller than that of state-of-the-art attacks, in particular about 50 times smaller on the CIFAR-10 model.
  • These results indicate that ODI-PGD might be a better benchmark than naïve-PGD for comparing and evaluating different defense methods.
Conclusion
  • The authors propose ODS, a novel sampling strategy for white- and black-box attacks.
  • By generating perturbations that are more diverse as measured in the output space, ODS provides more effective starting points for white-box attacks.
  • Leveraging surrogate models, ODS improves the exploration of the output space for black-box attacks (a sketch of an ODS-driven black-box step follows this list).
  • While the authors only focus on ODS with surrogate models trained on labeled datasets, they believe that ODS would work well even with only unlabeled datasets.
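Below is a hedged sketch of how a surrogate model could supply ODS directions to a SimBA-style score-based attack. `target_loss` stands in for whatever score the black-box target exposes (e.g. the probability assigned to the true class), the step size is illustrative, and NCHW image batches are assumed; this is a simplified stand-in for the attacks evaluated in the paper.

    import torch

    def ods_direction(surrogate, x):
        """L2-normalized direction from a random weighting of the surrogate's logits."""
        x = x.clone().detach().requires_grad_(True)
        logits = surrogate(x)
        w = torch.empty_like(logits).uniform_(-1.0, 1.0)
        grad = torch.autograd.grad((logits * w).sum(), x)[0]
        return grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)

    def simba_ods_step(target_loss, surrogate, x_adv, step=0.2):
        """Try +/- step along one ODS direction and keep the first candidate that lowers
        the target's score; every call to target_loss costs one query to the target."""
        q = ods_direction(surrogate, x_adv)
        current_loss = target_loss(x_adv)
        for sign in (1.0, -1.0):
            candidate = (x_adv + sign * step * q).clamp(0.0, 1.0)
            loss = target_loss(candidate)
            if loss < current_loss:
                return candidate, loss
        return x_adv, current_loss

In the decision-based setting, the same surrogate-derived directions can replace the random proposals of the Boundary Attack, which is essentially how the Boundary-ODS results in Tables 5 and 6 are obtained.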
Tables
  • Table 1: Model performance under attacks with ODI. The values are model accuracy (lower is better) for PGD and the average of the minimum ℓ2 perturbations (lower is better) for C&W. All results are the average of three trials
  • Table 2: Comparison of ODI-PGD with state-of-the-art attacks against pre-trained defense models. The complexity rows display the product of the number of steps and the number of restarts. Results for ODI-PGD are the average of three trials. For ODI-PGD, the number of steps is the sum of ODS and PGD steps
  • Table 3: Number of queries and ℓ2 perturbations for score-based attacks
  • Table 4: Number of queries for SimBA-ODS and score-based state-of-the-art attacks with an ℓ2 norm bound
  • Table 5: Median ℓ2 perturbations for Boundary-ODS and decision-based state-of-the-art attacks
  • Table 6: Median ℓ2 perturbations for Boundary-ODS with surrogate models trained on OOD images
  • Table 7: Hyperparameter settings for tuned ODI-PGD in Section 4.2
  • Table 8: Sensitivity to the number of ODI steps N_ODI and the step size η_ODI. We repeat each experiment 5 times to calculate statistics
  • Table 9: Accuracy of models after performing ODI-PGD and naïve-PGD attacks against recently proposed defense models
  • Table 10: Query counts and ℓ2 perturbations for score-based Simple Black-box Attacks (SimBA) against a pre-trained VGG19 model on ImageNet
  • Table 11: Median ℓ2 perturbations for decision-based Boundary Attacks against a pre-trained VGG19 model on ImageNet
  • Table 12: Query counts and ℓ2 perturbations for SimBA-ODS attacks with various sets of surrogate models. In the surrogate-model column, R: ResNet34, D: DenseNet121, V: VGG19, M: MobileNetV2
  • Table 13: Median ℓ2 perturbations for Boundary-ODS attacks with various sets of surrogate models. In the surrogate-model column, R: ResNet34, D: DenseNet121, V: VGG19, M: MobileNetV2
  • Table 14: Median ℓ2 perturbations for Boundary-ODS attacks with different numbers of surrogate models against out-of-distribution images on ImageNet
  • Table 15: Query counts and ℓ2 perturbations for SimBA-ODS attacks with surrogate models trained on OOD images on ImageNet
  • Table 16: Comparison of model performance under attacks with MultiTargeted (MT) and ODI. The values are model accuracy (lower is better) for PGD and the average of the minimum ℓ2 perturbations (lower is better) for C&W. All results are the average of three trials
  • Table 17: Median ℓ2 perturbations for Boundary Attack with ODS and MultiTargeted (MT)
Related work
  • ODS exploits output diversity on the target model. A related work in this context is the white-box MultiTargeted attack [16], which changes the target class of the attack on each restart and can therefore be regarded as a method that aims to obtain diversified attack results. However, there are several differences between MultiTargeted and ODS. First, while MultiTargeted focuses only on ℓp-bounded white-box attacks, ODS is developed for general white- and black-box attacks. In addition, since ODS does not require the original class of the target images, it has broader applicability than MultiTargeted. Furthermore, because the diversity provided by MultiTargeted is restricted to directions away from the original class, ODS can achieve better results for initialization and sampling than MultiTargeted (the two update rules are contrasted below). We give further discussion in Section E of the Appendix.
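For reference, a rough side-by-side of the two update rules (notation adapted: f(x) denotes the logits, C the number of classes, y the original class, and t the target class fixed for a restart):

    % ODS: a fresh random weighting of the logits at every sampling step
    \[
      v_{\mathrm{ODS}}(x, f, w_d)
        = \frac{\nabla_x \big(w_d^{\top} f(x)\big)}{\big\lVert \nabla_x \big(w_d^{\top} f(x)\big) \big\rVert_2},
      \qquad w_d \sim \mathcal{U}(-1, 1)^{C}
    \]

    % MultiTargeted: ascend a margin toward a fixed target class t on each restart
    \[
      L_{\mathrm{MT}}(x; t, y) = f_t(x) - f_y(x)
    \]

Because w_d can weight any class positively or negatively, the ODS direction is independent of the original label y, whereas the MultiTargeted objective always moves away from y toward a specific t.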
Funding
  • This research was supported in part by AFOSR (FA9550-19-1-0024), NSF (#1651565, #1522054, #1733686), ONR, and FLI
References
  • [1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • [2] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • [3] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • [4] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018.
  • [5] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
  • [6] Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI Conference on Artificial Intelligence, 2018.
  • [7] Haichao Zhang and Jianyu Wang. Adversarially robust training through structured gradient regularization. In Advances in Neural Information Processing Systems, 2018.
  • [8] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. In International Conference on Learning Representations, 2019.
  • [9] Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. In Advances in Neural Information Processing Systems, 2019.
  • [10] Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2017.
  • [11] Aditi Raghunathan, Jacob Steinhardt, and Percy S. Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.
  • [12] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, 2019.
  • [13] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
  • [14] Tianhang Zheng, Changyou Chen, and Kui Ren. Distributionally adversarial attack. In AAAI Conference on Artificial Intelligence, 2019.
  • [15] Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. arXiv preprint arXiv:1907.02044, 2019.
  • [16] Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, and Pushmeet Kohli. An alternative surrogate loss for PGD-based adversarial testing. arXiv preprint arXiv:1910.09338, 2019.
  • [17] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018.
  • [18] Chuan Guo, Jacob R. Gardner, Yurong You, Andrew G. Wilson, and Kilian Q. Weinberger. Simple black-box adversarial attacks. In International Conference on Machine Learning, 2019.
  • [19] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square Attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049, 2019.
  • [20] Jianbo Chen, Michael I. Jordan, and Martin J. Wainwright. HopSkipJumpAttack: a query-efficient decision-based adversarial attack. In IEEE Symposium on Security and Privacy (SP), 2020.
  • [21] Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. In International Conference on Learning Representations, 2020.
  • [22] Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, and Jiashi Feng. Query-efficient meta attack to deep neural networks. In International Conference on Learning Representations, 2020.
  • [23] Zhichao Huang and Tong Zhang. Black-box adversarial attack with transferable model-based embedding. In International Conference on Learning Representations, 2020.
  • [24] Yiwen Guo, Ziang Yan, and Changshui Zhang. Subspace attack: Exploiting promising subspaces for query-efficient black-box attacks. In Advances in Neural Information Processing Systems, 2019.
  • [25] Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In Advances in Neural Information Processing Systems, 2019.
  • [26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [27] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
  • [28] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy, 2017.
  • [29] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [30] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [31] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • [32] Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning, 2017.
  • [33] Maria-Irina Nicolae, Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Martin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, Ian Molloy, and Ben Edwards. Adversarial Robustness Toolbox v1.2.0. arXiv preprint arXiv:1807.01069, 2018.
  • [34] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. In Asia Conference on Computer and Communications Security, 2017.
  • [35] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations, 2017.
  • [36] Thomas Brunner, Frederik Diehl, Michael Truong Le, and Alois Knoll. Guessing smart: Biased sampling for efficient black-box adversarial attacks. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • [37] Jinghui Cai, Boyang Wang, Xiangfeng Wang, and Bo Jin. Accelerate black-box attack with white-box prior knowledge. In International Conference on Intelligent Science and Big Data Engineering, 2020.
  • [38] Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian. Hybrid batch attacks: Finding black-box adversarial examples with limited queries. In 29th USENIX Security Symposium (USENIX Security 20), 2020.
  • [39] Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning, 2018.
  • [40] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
  • [41] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9, Nov 2008.
  • [42] Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, and Pushmeet Kohli. Are labels required for improving adversarial robustness? In Advances in Neural Information Processing Systems, 2019.
  • [43] Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C. Duchi. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems, 2019.
  • [44] Haichao Zhang and Jianyu Wang. Defense against adversarial attacks using feature scattering-based adversarial training. In Advances in Neural Information Processing Systems, 2019.
  • [45] Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, and Baishakhi Ray. Metric learning for adversarial robustness. In Advances in Neural Information Processing Systems, 2019.
  • [46] Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, 2019.
  • [47] Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Accelerating adversarial training via maximal principle. In Advances in Neural Information Processing Systems, 2019.