Patch-wise Attack for Fooling Deep Neural Network

European Conference on Computer Vision (ECCV), pp. 307-322, 2020.


Abstract:

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models. Features of a pixel extracted by deep neural networks (DNNs) are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. Motivated by this, we propose a novel patch-wise iterative algorithm, a black-box attack against mainstream normally trained and defense models, which differs from existing attack methods that manipulate pixel-wise noise. Extensive experiments on ImageNet show that, compared with the current state-of-the-art attacks, our method improves the success rate by 9.2% for defense models and 3.7% for normally trained models on average in the black-box setting.

Introduction
  • Deep neural networks (DNNs) [9,10,30,31,16,15] have made great achievements.
  • One of the most popular branches of attack methods is gradient-based algorithms
  • For this branch, existing methods can be generally classified as single-step attacks and iterative attacks.
  • In the real world, attackers usually cannot get any information about the target model, which is called the black-box setting
  • In this case, single-step attacks usually transfer better than iterative attacks, at the cost of weaker performance on the substitute (white-box) models.
  • Single-step attack methods update only once, which tends to underfit the substitute model but improves generalizability (the standard formulations of both branches are shown below)
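For reference, the two branches can be written in their standard forms (the well-known FGSM [7] and I-FGSM [13] updates from the literature, not equations reproduced from this page):

```latex
% Single-step attack (FGSM): one update of size \epsilon.
x^{adv} = x^{clean} + \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(x^{clean}, y)\right)

% Iterative attack (I-FGSM): T small steps of size \alpha, clipped into the
% \epsilon-ball around the clean image.
x^{adv}_{0} = x^{clean}, \qquad
x^{adv}_{t+1} = \mathrm{Clip}_{x^{clean},\,\epsilon}\left( x^{adv}_{t}
    + \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(x^{adv}_{t}, y)\right) \right)
```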
Highlights
  • In recent years, deep neural networks (DNNs) [9,10,30,31,16,15] have made great achievements
  • Our major contributions can be summarized as: 1) we propose a novel patch-wise attack named Patch-wise Iterative Fast Gradient Sign Method (PI-FGSM); 2) our method can be generally integrated into any iteration-based attack method; and 3) extensive experiments on ImageNet show that our method significantly outperforms the state-of-the-art methods, improving the success rate by 9.2% for defense models and 3.7% for normally trained models on average in the black-box setting
  • To sum up, compared with other attacks, our PI-FGSM improves the success rate by 3.7% on average, and when we attack Dense-161, transferability can be increased by up to 17.2%
  • We propose a novel patch-wise iterative algorithm, a black-box attack against mainstream normally trained and defense models, which differs from existing attack methods that manipulate pixel-wise noise
  • Our approach can serve as a baseline to help generate more transferable adversarial examples and to evaluate the robustness of various deep neural networks
Methods
  • The authors observe that adding noise in a patch-wise style yields better transferability than adding it in a pixel-wise style.
  • To the best of the authors' knowledge, many recent iterative attack methods [2,3,37] set the step size α = ε/T, where ε is the maximum perturbation and T is the total number of iterations
  • In such a setting, the authors do not need the element-wise clipping operation, and the adversarial examples can just reach the ε-bound around x_clean.
  • To study the transferability with respect to the step-size setting, the authors make a trade-off between a single large step and iterative small steps by setting the step size to ε/T × β, where β is an amplification factor (see the sketch after this list)
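A minimal sketch of this patch-wise idea in PyTorch, assuming a uniform project kernel and image tensors in [0, 1]; the helper names (project_kernel, patch_wise_step) and the exact bookkeeping are illustrative, not the authors' reference implementation:

```python
# Hypothetical sketch: take an amplified step beta * eps / T, and redistribute
# the part of the accumulated noise that overflows the eps-ball to the
# surrounding region through a uniform "project kernel".
import torch
import torch.nn.functional as F


def project_kernel(size: int, channels: int) -> torch.Tensor:
    """Depthwise uniform kernel spreading a pixel's overflow to its size x size neighborhood."""
    w = torch.ones(channels, 1, size, size) / (size * size - 1)
    w[:, :, size // 2, size // 2] = 0.0  # the pixel itself keeps no extra share
    return w


def patch_wise_step(x_adv, x_clean, grad, a, eps, amp_step, gamma, kernel):
    """One iteration: amplified sign step plus patch-wise redistribution of overflowing noise."""
    a = a + amp_step * grad.sign()                      # accumulated amplified noise
    overflow = torch.clamp(a.abs() - eps, min=0.0) * a.sign()
    projected = F.conv2d(overflow, kernel,
                         padding=kernel.shape[-1] // 2,
                         groups=x_adv.shape[1])         # spread overflow to neighboring pixels
    x_adv = x_adv + amp_step * grad.sign() + gamma * projected.sign()
    # stay inside the eps-ball around x_clean and the valid image range
    x_adv = torch.max(torch.min(x_adv, x_clean + eps), x_clean - eps).clamp(0.0, 1.0)
    return x_adv, a
```

Here amp_step plays the role of β · ε/T, and gamma corresponds to the project factor γ studied in Tables 5-10.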
Results
  • Compared with the current state-of-the-art attacks, the authors significantly improve the success rate by 9.2% for defense models and 3.7% for normally trained models on average.
  • The authors' method can be generally integrated into any iteration-based attack method; extensive experiments on ImageNet show that it significantly outperforms the state-of-the-art methods in the black-box setting.
  • Compared with MI-FGSM, the proposed PI-FGSM improves the performance by about 9.6% on average
Conclusion
  • The authors propose a novel patch-wise iterative algorithm, a black-box attack against mainstream normally trained and defense models, which differs from existing attack methods that manipulate pixel-wise noise.
  • With this approach, the adversarial perturbation patches in discriminative regions become larger, generating more transferable adversarial examples against both normally trained and defense models.
  • The authors' approach can serve as a baseline to help generate more transferable adversarial examples and to evaluate the robustness of various deep neural networks
Objectives
  • The authors' goal is to craft efficient patch-wise noise that improves the transferability of adversarial examples in the black-box setting.
  • The authors aim to solve the following constrained optimization problem (reconstructed below)
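A reconstruction of the standard non-targeted form of this objective, assuming a classification loss J, true label y, and an L∞ budget ε (the paper's exact notation may differ slightly):

```latex
\arg\max_{x^{adv}} \; J\left(x^{adv}, y\right)
\quad \text{s.t.} \quad \left\lVert x^{adv} - x^{clean} \right\rVert_{\infty} \le \epsilon
```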
Tables
  • Table 1: The success rate (%) of non-targeted attacks against NT (normally trained models). The leftmost column lists the substitute models (“*” indicates white-box attack); the adversarial examples are crafted for them by FGSM, I-FGSM, MI-FGSM, DI-FGSM, PI-FGSM, and their combined versions, respectively
  • Table 2: The success rate (%) of non-targeted attacks against EAT (ensemble adversarially trained models). The top-row models are substitute models; we use them to generate adversarial examples by FGSM, I-FGSM, DI-FGSM, MI-FGSM, TI-FGSM, PI-FGSM, and their combined versions, respectively
  • Table 3: The success rate (%) of non-targeted attacks. We use an ensemble of Inc-v3, Inc-v4, Res-152, and IncRes-v2 to generate our adversarial examples by FGSM, I-FGSM, MI-FGSM, DI-FGSM, TI-FGSM, PI-FGSM, and their combined versions, respectively
  • Table 4: The average success rate (%) of non-targeted attacks. The top-row models are substitute models (“*” indicates white-box attack). We use ResNeXtDA, Res152B, Res152D, and an ensemble of them to generate adversarial examples by FGSM, I-FGSM, DI-FGSM, TI-FGSM, MI-FGSM, PI-FGSM, and their combined versions, respectively
  • Table 5: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
  • Table 6: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by DPI-FGSM
  • Table 7: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by MPI-FGSM
  • Table 8: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by TPI-FGSM
  • Table 9: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Inc-v3 to generate adversarial examples by DMPI-FGSM
  • Table 10: The average success rate (%) of non-targeted attacks under different project factor γ settings. Here we use Res152B (“*” indicates white-box attacks) to generate adversarial examples by PI-FGSM
  • Table 11: The average success rate (%) of non-targeted attacks for different numbers of iterations T against NT. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
  • Table 12: The average success rate (%) of non-targeted attacks for different numbers of iterations T against EAT. Here we use Inc-v3 to generate adversarial examples by PI-FGSM
  • Table 13: The average success rate (%) of non-targeted attacks for different numbers of iterations T against FD (feature-denoising models). Here we use Res152B (“*” indicates white-box attacks) to generate adversarial examples by PI-FGSM
Related work
  • In this section, we briefly analyze the existing adversarial attack methods from the perspectives of adversarial examples, attack settings, and ensemble strategies.

    2.1 Adversarial Examples

    Adversarial examples were first discovered by Szegedy et al. [32]: adding only a subtle perturbation to the original image can mislead DNNs into making unreasonable predictions with unbelievably high confidence. To make matters worse, adversarial examples also exist in the physical world [6,12,13], which raises security concerns about DNNs. Due to the vulnerability of DNNs, a large number of attack methods have been proposed and applied to various fields of deep learning in recent years, e.g., object detection and semantic segmentation [35], embodied agents [19], and speech recognition [1]. To keep our paper focused, we only analyze adversarial examples in the image classification task.

    2.2 Attack Settings

    In this section, we describe three common attack settings. The first is the white-box setting, where the adversary has full knowledge of the target model and can therefore obtain accurate gradient information to update adversarial examples. The second is the semi-black-box setting, where the output of the target model is available but its parameters remain unknown. For example, Papernot et al. [24] train a local model with many queries to substitute for the target model, and Ilyas et al. [11] propose a variant of NES [27] to generate adversarial examples with limited queries. The third is the black-box setting, where the adversary generally cannot access the target model at all and adversarial examples are crafted on substitute models instead. That is why transferability plays a key role in this setting.

    Recently, black-box attacks have become a hot topic and many excellent works have been proposed. Xie et al. [37] apply random transformations to the input images at each iteration to improve transferability. Dong et al. [2] propose a momentum-based iterative algorithm to boost adversarial attacks (its standard update is sketched below), and the adversarial examples crafted by their translation-invariant attack method [3] can effectively evade defenses. However, the above works cannot generate powerful patch-wise noise because they generally take valid gradient information into account. In this paper, our goal is to craft efficient patch-wise noise to improve the transferability of adversarial examples in the black-box setting.
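As a concrete example of these iterative variants, the momentum-based update of MI-FGSM [2] takes the following standard form (taken from the literature, with decay factor μ and step size α; not this paper's own notation):

```latex
g_{t+1} = \mu \cdot g_{t}
        + \frac{\nabla_{x} J(x^{adv}_{t}, y)}{\lVert \nabla_{x} J(x^{adv}_{t}, y) \rVert_{1}},
\qquad
x^{adv}_{t+1} = \mathrm{Clip}_{x^{clean},\,\epsilon}\left( x^{adv}_{t} + \alpha \cdot \mathrm{sign}(g_{t+1}) \right)
```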
Funding
  • This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2019J073), the National Natural Science Foundation of China (Grant No. 61772116, No. 61872064, No. 61632007, No. 61602049), and the Open Project of Zhejiang Lab (Grant No. 2019KD0AB05)
References
  • [1] Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: Fooling deep structured prediction models. CoRR abs/1707.05373 (2017)
  • [2] Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: CVPR (2018)
  • [3] Dong, Y., Pang, T., Su, H., Zhu, J.: Evading defenses to transferable adversarial examples by translation-invariant attacks. In: CVPR (2019)
  • [4] Dziugaite, G.K., Ghahramani, Z., Roy, D.M.: A study of the effect of JPG compression on adversarial images. CoRR abs/1608.00853 (2016)
  • [5] Efros, A.A., Freeman, W.T.: Image quilting for texture synthesis and transfer. In: SIGGRAPH (2001)
  • [6] Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., Song, D.: Robust physical-world attacks on deep learning visual classification. In: CVPR (2018)
  • [7] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)
  • [8] Guo, C., Rana, M., Cisse, M., van der Maaten, L.: Countering adversarial images using input transformations. In: ICLR (2018)
  • [9] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • [10] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
  • [11] Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: ICML (2018)
  • [12] Komkov, S., Petiushko, A.: AdvHat: Real-world adversarial attack on ArcFace face ID system. CoRR abs/1908.08705 (2019)
  • [13] Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
  • [14] Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale. In: ICLR (2017)
  • [15] Li, X., Gao, L., Wang, X., Liu, W., Xu, X., Shen, H.T., Song, J.: Learnable aggregating net with diversity learning for video question answering. In: ACM MM, pp. 1166-1174 (2019)
  • [16] Li, X., Song, J., Gao, L., Liu, X., Huang, W., He, X., Gan, C.: Beyond RNNs: Positional self-attention with co-attention for video question answering. In: AAAI, pp. 8658-8665 (2019)
  • [17] Li, Y., Bai, S., Xie, C., Liao, Z., Shen, X., Yuille, A.L.: Regional homogeneity: Towards learning transferable universal adversarial perturbations against defenses. CoRR abs/1904.00979 (2019)
  • [18] Lin, J., Gan, C., Han, S.: Defensive quantization: When efficiency meets robustness. In: ICLR (2019)
  • [19] Liu, A., Huang, T., Liu, X., Xu, Y., Ma, Y., Chen, X., Maybank, S., Tao, D.: Spatiotemporal attacks for embodied agents. In: ECCV (2020)
  • [20] Liu, A., Wang, J., Liu, X., Cao, B., Zhang, C., Yu, H.: Bias-based universal adversarial patch attack for automatic check-out. In: ECCV (2020)
  • [21] Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. In: ICLR (2017)
  • [22] Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)
  • [23] Moosavi-Dezfooli, S., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: CVPR (2017)
  • [24] Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: AsiaCCS (2017)
  • [25] Rosen, J.: The gradient projection method for nonlinear programming. Part I: Linear constraints. Journal of the Society for Industrial and Applied Mathematics 8, 181-217 (1960)
  • [26] Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1-4), 259-268 (1992)
  • [27] Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)
  • [28] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV (2017)
  • [29] Sharif, M., Bhagavatula, S., Bauer, L., Reiter, M.K.: Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In: SIGSAC (2016)
  • [30] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI (2017)
  • [31] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception architecture for computer vision. In: CVPR (2016)
  • [32] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)
  • [33] Thys, S., Ranst, W.V., Goedeme, T.: Fooling automated surveillance cameras: Adversarial patches to attack person detection. In: CVPR Workshops (2019)
  • [34] Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I.J., Boneh, D., McDaniel, P.D.: Ensemble adversarial training: Attacks and defenses. In: ICLR (2018)
  • [35] Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.L.: Adversarial examples for semantic segmentation and object detection. In: ICCV (2017)
  • [36] Xie, C., Wu, Y., van der Maaten, L., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: CVPR (2019)
  • [37] Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., Yuille, A.L.: Improving transferability of adversarial examples with input diversity. In: CVPR (2019)
  • [38] Xu, K., Liu, S., Zhang, G., Sun, M., Zhao, P., Fan, Q., Gan, C., Lin, X.: Interpreting adversarial examples by activation promotion and suppression. CoRR abs/1904.02057 (2019)
  • [39] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR (2016)