Patch-wise Attack for Fooling Deep Neural Network
European Conference on Computer Vision (ECCV), pp. 307-322, 2020.
Abstract:
By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models. Features of a pixel extracted by deep neural networks (DNNs) are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. Motivated by this, we propose a patch-wise iterative algorithm, a black-box attack towards mainstream normally trained and defense models, which differs from existing attack methods that manipulate pixel-wise noise.
Introduction
- Deep neural networks (DNNs) [9,10,30,31,16,15] have made great achievements.
- One of the most popular branches of adversarial attack methods is gradient-based algorithms
- For this branch, existing methods can be generally classified as single-step attacks and iterative attacks.
- In the real world, attackers usually cannot get any information about the target model, which is called the black-box setting
- In this case, single-step attacks generally transfer better than iterative attacks, at the cost of lower success rates on the substitute models.
- Single-step attack methods update the image only once, which makes them prone to underfitting the substitute model but improves the generalizability (transferability) of the resulting perturbation
Highlights
- In recent years, deep neural networks (DNNs) [9,10,30,31,16,15] have made great achievements
- Our major contributions can be summarized as: 1) We propose a novel patch-wise attack idea named Patch-wise Iterative Fast Gradient Sign Method (PI-FGSM); 2) Our method can be generally integrated into any iteration-based attack method; and 3) Extensive experiments on ImageNet show that our method significantly outperforms the state-of-the-art methods, improving the success rate by 9.2% for defense models and 3.7% for normally trained models on average in the black-box setting
- To sum up, compared with other attacks, our PI-FGSM can improve the success rate by 3.7% on average, and when attacking Dense-161, transferability can be increased by up to 17.2%
- We propose a novel patch-wise iterative algorithm – a black-box attack towards mainstream normally trained and defense models, which differs from the existing attack methods manipulating pixel-wise noise
- Our approach can serve as a baseline to help generate more transferable adversarial examples and evaluate the robustness of various deep neural networks
Methods
- The authors observe that adding noise in a patch-wise style yields better transferability than adding it in a pixel-wise style.
- To the best of the authors' knowledge, many recent iterative attack methods [2,3,37] set the step size α = ε/T, where ε is the maximum perturbation and T is the total number of iterations
- In such a setting, the element-wise clipping operation is not needed, and the adversarial examples can exactly reach the ε-bound around x_clean.
- To study the transferability with respect to the step size setting, the authors make a tradeoff between a single large step and iterative small steps by setting the step size to ε/T × β, where β is an amplification factor (see the sketch below)
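To make the roles of the amplification factor β and the project factor γ (swept in the tables below) concrete, here is a minimal PyTorch-style sketch of this amplified-step, patch-wise update under an L∞ budget ε, assuming pixel values in [0, 1]; the function name, `model`, `loss_fn`, the uniform kernel size, and the default hyperparameter values are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def patchwise_attack(model, loss_fn, x_clean, y, eps=16/255, T=10,
                     beta=10.0, gamma=0.06, kernel_size=3):
    """Amplified-step, patch-wise L_inf attack sketch (non-targeted).

    alpha = eps / T is the nominal per-iteration step size; scaling it by the
    amplification factor beta lets the accumulated noise exceed the eps budget,
    and the overflow is redistributed to the surrounding patch through a uniform
    kernel weighted by the project factor gamma (values illustrative, to be tuned).
    """
    alpha = eps / T
    x_adv = x_clean.clone().detach()
    a = torch.zeros_like(x_clean)  # accumulated amplified noise
    c = x_clean.size(1)
    # uniform "project" kernel, applied depthwise (one kernel per channel)
    w_p = torch.ones(c, 1, kernel_size, kernel_size,
                     device=x_clean.device) / kernel_size ** 2

    for _ in range(T):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        a = a + alpha * beta * grad.sign()
        # noise that overflows the eps budget, projected onto the local patch
        overflow = torch.clamp(a.abs() - eps, min=0) * a.sign()
        proj = F.conv2d(overflow, w_p, padding=kernel_size // 2, groups=c)
        a = a + gamma * proj.sign()

        x_adv = x_adv.detach() + alpha * beta * grad.sign() + gamma * proj.sign()
        # keep the example inside the eps-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x_clean - eps), x_clean + eps)
        x_adv = x_adv.clamp(0.0, 1.0)

    return x_adv.detach()
```

The key design idea is that the per-step move α·β may overshoot the ε-ball; instead of simply clipping the excess away, it is redistributed to neighboring pixels, so the noise grows patch-wise in discriminative regions rather than pixel-wise.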
Results
- Compared with the current state-of-the-art attacks, the authors significantly improve the success rate by 9.2% for defense models and 3.7% for normally trained models on average in the black-box setting.
- The method can be generally integrated into any iteration-based attack method.
- Compared with MI-FGSM, the proposed PI-FGSM improves the performance by about 9.6% on average
Conclusion
- The authors propose a novel patch-wise iterative algorithm – a black-box attack towards mainstream normally trained and defense models, which differs from the existing attack methods manipulating pixel-wise noise.
- With this approach, the adversarial perturbation patches in discriminative regions will be larger, generating more transferable adversarial examples against both normally trained and defense models.
- The authors' approach can serve as a baseline to help generate more transferable adversarial examples and evaluate the robustness of various deep neural networks
Summary
Objectives:
- The authors' goal is to craft efficient patch-wise noise that improves the transferability of adversarial examples in the black-box setting.
- Formally, the goal is to solve the following constrained optimization problem:
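A minimal reconstruction of that objective, assuming the usual non-targeted, L∞-bounded formulation (J is the classification loss of the substitute model, y the true label, and ε the maximum perturbation):

$$\underset{x^{adv}}{\arg\max}\; J(x^{adv}, y) \quad \text{s.t.} \quad \left\| x^{adv} - x^{clean} \right\|_{\infty} \le \epsilon$$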
Tables
- Table1: The success rate (%) of non-targeted attacks against NT. The leftmost column models are substitute models (“*” indicates white-box attack); the adversarial examples are crafted for them by FGSM, I-FGSM, MI-FGSM, DI-FGSM, PI-FGSM, and their combined versions respectively
- Table2: The success rate (%) of non-targeted attacks against EAT. The top row models are substitute models; we use them to generate adversarial examples by FGSM, I-FGSM, DI-FGSM, MI-FGSM, TI-FGSM, PI-FGSM, and their combined versions respectively
- Table3: The success rate (%) of non-targeted attacks. We use an ensemble of Inc-v3, Inc-v4, Res-152, and IncRes-v2 to generate our adversarial examples by FGSM, I-FGSM, MI-FGSM, DI-FGSM, TI-FGSM, PI-FGSM, and their combined versions respectively
- Table4: The average success rate (%) of non-targeted attacks. The top row models are substitute models (“*” indicates white-box attack). We use ResNeXt_DA, Res152_B, Res152_D, and an ensemble of them to generate adversarial examples by FGSM, I-FGSM, DI-FGSM, TI-FGSM, MI-FGSM, PI-FGSM, and their combined versions respectively
- Table5: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
- Table6: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by DPI-FGSM
- Table7: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by MPI-FGSM
- Table8: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by TPI-FGSM
- Table9: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by DMPI-FGSM
- Table10: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Res152_B (“*” indicates white-box attacks) to generate adversarial examples by PI-FGSM
- Table11: The average success rate (%) of non-targeted attacks for different numbers of iterations T against NT. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
- Table12: The average success rate (%) of non-targeted attacks for different numbers of iterations T against EAT. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
- Table13: The average success rate (%) of non-targeted attacks for different numbers of iterations T against FD. Here we use Res152_B (“*” indicates white-box attacks) to generate adversarial examples by PI-FGSM
Related work
- In this section, we briefly analyze the existing adversarial attack methods from the perspectives of classification of adversarial examples, attack settings, and ensemble strategies.
2.1 Adversarial Examples
Adversarial examples were first discovered by Szegedy et al. [32]; they add only subtle perturbations to the original image yet can mislead DNNs into making unreasonable predictions with unbelievably high confidence. To make matters worse, adversarial examples also exist in the physical world [6,12,13], which raises security concerns about DNNs. Due to the vulnerability of DNNs, a large number of attack methods have been proposed and applied to various fields of deep learning in recent years, e.g., object detection and semantic segmentation [35], embodied agents [19], and speech recognition [1]. To keep our paper focused, we only analyze adversarial examples in the image classification task.
2.2 Attack Settings
In this section, we describe three common attack settings. The first is the white-box setting, where the adversary has full knowledge of the target model and can therefore obtain accurate gradient information to update adversarial examples. The second is the semi-black-box setting, where the output of the target model is available but its parameters are unknown. For example, Papernot et al. [24] train a local model with many queries to substitute for the target model, and Ilyas et al. [11] propose a variant of NES [27] to generate adversarial examples with limited queries. The last is the black-box setting, where the adversary generally cannot access the target model at all, so adversarial examples are crafted on substitute models instead. That is why transferability plays a key role in this setting. Recently, black-box attacks have become a hot topic and many excellent works have been proposed. Xie et al. [37] apply random transformations to the input images at each iteration to improve transferability. Dong et al. [2] propose a momentum-based iterative algorithm to boost adversarial attacks. Besides, the adversarial examples crafted by their translation-invariant attack method [3] can effectively evade defenses. However, the above works cannot generate powerful patch-wise noise because they generally take only valid gradient information into account. In this paper, our goal is to craft efficient patch-wise noise that improves the transferability of adversarial examples in the black-box setting.
Funding
- This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2019J073), the National Natural Science Foundation of China (Grant Nos. 61772116, 61872064, 61632007, 61602049), and the Open Project of Zhejiang Lab (Grant No. 2019KD0AB05)
References
- Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: Fooling deep structured prediction models. CoRR abs/1707.05373 (2017)
- Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: CVPR (2018)
- Dong, Y., Pang, T., Su, H., Zhu, J.: Evading defenses to transferable adversarial examples by translation-invariant attacks. In: CVPR (2019)
- Dziugaite, G.K., Ghahramani, Z., Roy, D.M.: A study of the effect of JPG compression on adversarial images. CoRR abs/1608.00853 (2016)
- Efros, A.A., Freeman, W.T.: Image quilting for texture synthesis and transfer. In: SIGGRAPH (2001)
- Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., Song, D.: Robust physical-world attacks on deep learning visual classification. In: CVPR (2018)
- Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)
- Guo, C., Rana, M., Cisse, M., van der Maaten, L.: Countering adversarial images using input transformations. In: ICLR (2018)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
- Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
- Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML (2018)
- Komkov, S., Petiushko, A.: AdvHat: Real-world adversarial attack on ArcFace face ID system. CoRR abs/1908.08705 (2019)
- Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
- Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale. In: ICLR (2017)
- Li, X., Gao, L., Wang, X., Liu, W., Xu, X., Shen, H.T., Song, J.: Learnable aggregating net with diversity learning for video question answering. In: Proceedings of the 27th ACM International Conference on Multimedia. pp. 1166–1174 (2019)
- Li, X., Song, J., Gao, L., Liu, X., Huang, W., He, X., Gan, C.: Beyond RNNs: Positional self-attention with co-attention for video question answering. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33, pp. 8658–8665 (2019)
- Li, Y., Bai, S., Xie, C., Liao, Z., Shen, X., Yuille, A.L.: Regional homogeneity: Towards learning transferable universal adversarial perturbations against defenses. CoRR abs/1904.00979 (2019)
- Lin, J., Gan, C., Han, S.: Defensive quantization: When efficiency meets robustness. In: ICLR (2019)
- Liu, A., Huang, T., Liu, X., Xu, Y., Ma, Y., Chen, X., Maybank, S., Tao, D.: Spatiotemporal attacks for embodied agents. In: ECCV (2020)
- Liu, A., Wang, J., Liu, X., Cao, B., Zhang, C., Yu, H.: Bias-based universal adversarial patch attack for automatic check-out. In: ECCV (2020)
- Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. In: ICLR (2017)
- Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)
- Moosavi-Dezfooli, S., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: CVPR (2017)
- Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Karri, R., Sinanoglu, O., Sadeghi, A., Yi, X. (eds.) AsiaCCS (2017)
- Rosen, J.: The gradient projection method for nonlinear programming. Part I. Linear constraints. Journal of The Society for Industrial and Applied Mathematics 8, 181–217 (1960)
- Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1-4), 259–268 (1992)
- Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)
- Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV (2017)
- Sharif, M., Bhagavatula, S., Bauer, L., Reiter, M.K.: Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In: SIGSAC (2016)
- Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI (2017)
- Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)
- Thys, S., Ranst, W.V., Goedeme, T.: Fooling automated surveillance cameras: adversarial patches to attack person detection. In: CVPR Workshops (2019)
- Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I.J., Boneh, D., McDaniel, P.D.: Ensemble adversarial training: attacks and defenses. In: ICLR (2018)
- Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.L.: Adversarial examples for semantic segmentation and object detection. In: ICCV (2017)
- Xie, C., Wu, Y., van der Maaten, L., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: CVPR (2019)
- Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., Yuille, A.L.: Improving transferability of adversarial examples with input diversity. In: CVPR (2019)
- Xu, K., Liu, S., Zhang, G., Sun, M., Zhao, P., Fan, Q., Gan, C., Lin, X.: Interpreting adversarial examples by activation promotion and suppression. CoRR abs/1904.02057 (2019)
- Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR (2016)