Diversity Can Be Transferred: Output Diversification for White- and Black-box Attacks
NeurIPS 2020.
Abstract:
Adversarial attacks often involve random perturbations of the inputs drawn from uniform or Gaussian distributions, e.g. to initialize optimization-based white-box attacks or generate update directions in black-box attacks. These simple perturbations, however, could be suboptimal as they are agnostic to the model being attacked. To improve…
Introduction
- Deep neural networks have achieved great success in image classification. However, it is known that they are vulnerable to adversarial examples [1]: small perturbations imperceptible to humans that cause classifiers to output wrong predictions.
- Some black-box attack methods [17, 18] use random sampling to explore update directions for finding or improving adversarial examples.
- In these attacks, random perturbations are typically sampled from a naïve uniform or Gaussian distribution in the input pixel space; ODS instead diversifies perturbations in the model's output space (a minimal sketch follows).
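The core ODS update can be summarized as backpropagating a randomly weighted sum of the model's logits and normalizing the resulting input gradient, so that perturbations are diverse in the output space rather than the pixel space. The following PyTorch snippet is a minimal sketch of that idea, assuming gradient access to the (target or surrogate) model; the function name `ods_direction` and the per-call resampling of the weight vector are illustrative choices, not the authors' reference implementation.

```python
import torch

def ods_direction(model, x):
    """Unit-norm perturbation directions that diversify the model's outputs.

    model: classifier returning logits of shape (batch, num_classes)
    x:     input batch of shape (batch, C, H, W), values in [0, 1]
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Random per-example weights over the classes, drawn uniformly from [-1, 1].
    w = torch.empty_like(logits).uniform_(-1.0, 1.0)
    # Input gradient of the randomly weighted logits: the direction along
    # which this random combination of outputs changes fastest.
    g, = torch.autograd.grad((logits * w).sum(), x)
    # Normalize each example's gradient to unit L2 norm.
    g_flat = g.flatten(1)
    g_flat = g_flat / (g_flat.norm(dim=1, keepdim=True) + 1e-12)
    return g_flat.view_as(x)
```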
Highlights
- Deep neural networks have achieved great success in image classification
- Attack performance with ODS used for initialization (ODI) is better than with naïve initialization for all models and attacks (see the ODI-PGD sketch after this list)
- Our tuned ODI-Projected Gradient Descent (PGD) reduces the accuracy to 88.12% for the MNIST model, and to 44.00% for the CIFAR-10 model
- SimBA-ODS (Simple Black-box Attack with Output Diversified Sampling) remarkably reduces the average number of queries, by a factor of 2 to 3, compared to SimBA-DCT (discrete cosine transform) in both untargeted and targeted settings
- ODS for black-box attacks is applicable even when surrogate models are trained on a dataset that is out-of-distribution with respect to the target images, which makes black-box attacks with ODS more practical than other black-box attacks that exploit prior knowledge from surrogate models
- We demonstrate that a simple combination of PGD and ODI achieves new state-of-the-art attack success rates
- While we only focus on ODS with surrogate models trained on labeled datasets, we believe that ODS would also work well with only unlabeled data
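As a concrete illustration of ODI from the highlights above: run a few steps along ODS directions to spread starting points over the output space, then run a standard ℓ∞ PGD attack from the resulting point. The sketch below assumes the `ods_direction` helper from the introduction; the step counts and step sizes are placeholders rather than the paper's tuned hyperparameters (see Table 7), and the random logit weights are resampled at every ODS step for simplicity.

```python
import torch
import torch.nn.functional as F

def odi_pgd(model, x, y, eps=8 / 255, odi_steps=2, odi_eta=8 / 255,
            pgd_steps=20, pgd_eta=2 / 255):
    """L-infinity PGD attack initialized with a few output-diversifying steps."""
    x_adv = x.clone().detach()
    # ODI phase: move along ODS directions (ods_direction is the earlier sketch).
    for _ in range(odi_steps):
        d = ods_direction(model, x_adv)
        x_adv = x_adv + odi_eta * d.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    # PGD phase: maximize cross-entropy within the eps-ball around x.
    for _ in range(pgd_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + pgd_eta * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```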
Results
- The authors summarize all quantitative results in Table 1.
- Setup: One state-of-the-art attack the authors compare against is the well-tuned PGD attack [16], which achieves 88.21% accuracy on the robust MNIST model.
- The authors' tuned ODI-PGD reduces the accuracy to 88.12% for the MNIST model, and to 44.00% for the CIFAR-10 model
- These results outperform existing state-of-the-art attacks.
- In Table 2, the computational cost of tuned ODI-PGD is smaller than that of state-of-the-art attacks; in particular, it is about 50 times smaller on the CIFAR-10 model.
- These results indicate that ODI-PGD might be a better benchmark than naïve-PGD for comparing and evaluating different defense methods.
Conclusion
- The authors propose ODS, a novel sampling strategy for white- and black-box attacks.
- By generating perturbations that are more diverse as measured in the output space, ODS provides more effective starting points for white-box attacks.
- Leveraging surrogate models, ODS improves the exploration of the output space for black-box attacks (see the sketch after this list).
- While the authors only focus on ODS with surrogate models trained on labeled datasets, they believe that ODS would also work well with only unlabeled data.
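To make the black-box use of ODS concrete, the sketch below shows one way surrogate ODS directions could drive a SimBA-style score-based attack: each query perturbs the image along a direction computed on the surrogate and keeps the step only if the target model's confidence in the true class drops. The function `simba_ods`, the step size, and the stopping criterion are illustrative assumptions, not the authors' exact procedure.

```python
import torch

@torch.no_grad()
def true_class_prob(target_model, x, y):
    """Probability the target model assigns to the true class (one query)."""
    return torch.softmax(target_model(x), dim=1)[0, y].item()

def simba_ods(target_model, surrogate_model, x, y, step=0.2, max_iters=1000):
    """SimBA-style score-based attack guided by surrogate ODS directions.

    target_model:    black-box model, queried for scores only
    surrogate_model: white-box surrogate used to compute ODS directions
    x: single image of shape (1, C, H, W) in [0, 1]; y: true label (int)
    """
    best = true_class_prob(target_model, x, y)
    for _ in range(max_iters):
        # ods_direction is the sketch from the introduction, applied to the surrogate.
        d = ods_direction(surrogate_model, x)
        for sign in (1.0, -1.0):                      # try both directions
            cand = (x + sign * step * d).clamp(0.0, 1.0)
            p = true_class_prob(target_model, cand, y)
            if p < best:                              # keep queries that help
                x, best = cand, p
                break
        with torch.no_grad():
            if target_model(x).argmax(dim=1).item() != y:
                return x                              # misclassified: success
    return x
```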
Tables
- Table 1: Model performance under attacks with ODI. The values are model accuracy (lower is better) for PGD and the average of the minimum ℓ2 perturbations (lower is better) for C&W. All results are the average of three trials
- Table 2: Comparison of ODI-PGD with state-of-the-art attacks against pre-trained defense models. The complexity rows display products of the number of steps and restarts. Results for ODI-PGD are the average of three trials. For ODI-PGD, the number of steps is the sum of ODS and PGD steps
- Table 3: Number of queries and ℓ2 perturbations for score-based attacks
- Table 4: Number of queries for SimBA-ODS and score-based state-of-the-art attacks with an ℓ2 norm bound
- Table 5: Median ℓ2 perturbations for Boundary-ODS and decision-based state-of-the-art attacks
- Table 6: Median ℓ2 perturbations for Boundary-ODS with surrogate models trained with OOD images
- Table 7: Hyperparameter setting for tuned ODI-PGD in Section 4.2
- Table 8: The sensitivity to the number of ODI steps N_ODI and the step size η_ODI. We repeat each experiment 5 times to calculate statistics
- Table 9: Accuracy of models after performing ODI-PGD and naïve-PGD attacks against recently proposed defense models
- Table 10: Query counts and ℓ2 perturbations for score-based Simple Black-box Attacks (SimBA) against a pre-trained VGG19 model on ImageNet
- Table 11: Median ℓ2 perturbations for decision-based Boundary Attacks against a pre-trained VGG19 model on ImageNet
- Table 12: Query counts and ℓ2 perturbations for SimBA-ODS attacks with various sets of surrogate models. In the surrogate models column, R: ResNet34, D: DenseNet121, V: VGG19, M: MobileNetV2
- Table 13: Median ℓ2 perturbations for Boundary-ODS attacks with various sets of surrogate models. In the surrogate models column, R: ResNet34, D: DenseNet121, V: VGG19, M: MobileNetV2
- Table 14: Median ℓ2 perturbations for Boundary-ODS attacks with different numbers of surrogate models against out-of-distribution images on ImageNet
- Table 15: Query counts and ℓ2 perturbations for SimBA-ODS attacks with surrogate models trained with OOD images on ImageNet
- Table 16: Comparison of model performance under attacks with MultiTargeted (MT) and ODI. The values are model accuracy (lower is better) for PGD and the average of the minimum ℓ2 perturbations (lower is better) for C&W. All results are the average of three trials
- Table 17: Median ℓ2 perturbations for Boundary Attack with ODS and MultiTargeted (MT)
Related work
- ODS exploits output diversity on target models. A related work in this context is the white-box MultiTargeted attack [16], which changes the target class of the attack at each restart and can be regarded as a method that aims to obtain diversified attack results. However, there are several differences between MultiTargeted and ODS. First, while MultiTargeted focuses only on ℓp-bounded white-box attacks, ODS is developed for general white- and black-box attacks. In addition, since ODS does not require the original class of the target images, it has broader applicability than MultiTargeted. Furthermore, because the diversity provided by MultiTargeted is restricted to directions away from the original class, ODS can achieve better results for initialization and sampling than MultiTargeted. We give further discussion in Section E of the Appendix.
Funding
- This research was supported in part by AFOSR (FA9550-19-1-0024), NSF (#1651565, #1522054, #1733686), ONR, and FLI
References
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
- Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018.
- Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
- Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI Conference on Artificial Intelligence, 2018.
- Haichao Zhang and Jianyu Wang. Adversarially robust training through structured gradient regularization. In Advances in Neural Information Processing Systems, 2018.
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. In International Conference on Learning Representations, 2019.
- Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. In Advances in Neural Information Processing Systems, 2019.
- Eric Wong and J Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2017.
- Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.
- Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, 2019.
- Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
- Tianhang Zheng, Changyou Chen, and Kui Ren. Distributionally adversarial attack. In AAAI Conference on Artificial Intelligence, 2019.
- Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. arXiv preprint arXiv:1907.02044, 2019.
- Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, and Pushmeet Kohli. An alternative surrogate loss for pgd-based adversarial testing. arXiv preprint arXiv:1910.09338, 2019.
- Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018.
- Chuan Guo, Jacob R. Gardner, Yurong You, Andrew G. Wilson, and Kilian Q. Weinberger. Simple black-box adversarial attacks. In International Conference on Machine Learning, 2019.
- Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049, 2019.
- Jianbo Chen, Michael I Jordan, and Martin J Wainwright. HopSkipJumpAttack: a query-efficient decision-based adversarial attack. In IEEE Symposium on Security and Privacy (SP), 2020.
- Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-opt: A query-efficient hard-label adversarial attack. In International Conference on Learning Representations, 2020.
- Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, and Jiashi Feng. Query-efficient meta attack to deep neural networks. In International Conference on Learning Representations, 2020.
- Zhichao Huang and Tong Zhang. Black-box adversarial attack with transferable model-based embedding. In International Conference on Learning Representations, 2020.
- Yiwen Guo, Ziang Yan, and Changshui Zhang. Subspace attack: Exploiting promising subspaces for query-efficient black-box attacks. In Advances in Neural Information Processing Systems, 2019.
- Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In Advances in Neural Information Processing Systems, 2019.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
- Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy, 2017.
- Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
- Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning, 2017.
- Maria-Irina Nicolae, Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Martin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, Ian Molloy, and Ben Edwards. Adversarial robustness toolbox v1.2.0. arXiv preprint arXiv:1807.01069, 2018.
- Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. In Asia Conference on Computer and Communications Security, 2017.
- Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations, 2017.
- Thomas Brunner, Frederik Diehl, Michael Truong Le, and Alois Knoll. Guessing smart: Biased sampling for efficient black-box adversarial attacks. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
- Jinghui Cai, Boyang Wang, Xiangfeng Wang, and Bo Jin. Accelerate black-box attack with white-box prior knowledge. In International Conference on Intelligent Science and Big Data Engineering, 2020.
- Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian. Hybrid batch attacks: Finding black-box adversarial examples with limited queries. In 29th USENIX Security Symposium (USENIX Security 20), 2020.
- Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning, 2018.
- Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
- Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9, Nov 2008.
- Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, and Pushmeet Kohli. Are labels required for improving adversarial robustness? In Advances in Neural Information Processing Systems, 2019.
- Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C. Duchi. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems, 2019.
- Haichao Zhang and Jianyu Wang. Defense against adversarial attacks using feature scattering-based adversarial training. In Advances in Neural Information Processing Systems, 2019.
- Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, and Baishakhi Ray. Metric learning for adversarial robustness. In Advances in Neural Information Processing Systems, 2019.
- Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, 2019.
- Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, and Bin Dong. You only propagate once: Accelerating adversarial training via maximal principle. In Advances in Neural Information Processing Systems, 2019.