Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

arXiv: Computer Vision and Pattern Recognition, 2019.


Abstract:

This paper focuses on learning transferable adversarial examples specifically against defense models (models designed to defend against adversarial attacks). In particular, we show that a simple universal perturbation can fool a series of state-of-the-art defenses. Adversarial examples generated by existing attacks are generally hard to transfer to defen...

Introduction
  • Deep neural networks have been shown to be vulnerable to adversarial examples [55], which are crafted by adding visually imperceptible perturbations to clean images and pose a security threat when deploying commercial machine learning systems.
  • The focus of this work is to attack defense models, especially in the black-box setting where models’ architectures and parameters remain unknown to attackers.
  • In this case, attackers rely on the “transferability” of adversarial examples: examples generated for one model may also be misclassified by other models.
  • This observation of regional homogeneity holds when attacking different defense models (e.g., adversarial training with feature denoising [62], see Figure 1(b)), when generating different types of adversarial examples (image-dependent perturbations or universal ones [50], see Figure 1(c)), and when testing on different data domains (CT scans from the NIH pancreas segmentation dataset [46], see Figure 1(d))
Highlights
  • Deep neural networks have been shown to be vulnerable to adversarial examples [55], which are crafted by adding visually imperceptible perturbations to clean images and pose a security threat when deploying commercial machine learning systems
  • To acquire regionally homogeneous adversarial examples, we propose a gradient transformer module that generates regionally homogeneous perturbations from existing regionally non-homogeneous ones (see the illustrative sketch at the end of this list)
  • When we evaluate the above four settings on IncResens, the error rates increase by 14.0%, 19.4%, 19.3%, and 24.6% for RP, OP, TU, and the regionally homogeneous perturbation (RHP), respectively
  • We propose a transforming paradigm and a gradient transformer module to generate the regionally homogeneous perturbation (RHP) for attacking defenses
  • RHP possesses three merits: 1) transferability: we demonstrate that RHP transfers well across different models and different tasks; 2) universality: taking advantage of the under-fitting of the gradient transformer module, RHP yields universal adversarial examples without explicitly enforcing universality during learning; 3) strength: RHP successfully attacks 9 representative defenses and outperforms state-of-the-art attacking methods by a large margin
  • Since RHP looks less like noise than other perturbations, it would be interesting to study RHP from a denoising perspective in future work
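  • The following is a minimal, illustrative Python sketch of the regional-homogeneity idea only; it is not the paper's learned gradient transformer module. It assumes a precomputed input gradient and hand-chosen horizontal stripes as regions (both are assumptions of this sketch), and simply averages the gradient within each region before taking the sign, so every pixel in a region receives the same perturbation value.

    # Illustrative only: approximate regional homogeneity by averaging an input
    # gradient inside fixed horizontal stripes and taking the sign, so the
    # resulting perturbation is piecewise constant (homogeneous) per region.
    import numpy as np

    def regionally_homogeneous_perturbation(grad, num_regions=4, epsilon=16.0):
        """grad: (H, W, C) gradient of the loss w.r.t. the input image."""
        height = grad.shape[0]
        perturbation = np.empty_like(grad)
        bounds = np.linspace(0, height, num_regions + 1, dtype=int)
        for top, bottom in zip(bounds[:-1], bounds[1:]):
            # Collapse each region to one value per channel, then broadcast it back.
            region_mean = grad[top:bottom].mean(axis=(0, 1), keepdims=True)
            perturbation[top:bottom] = np.sign(region_mean)
        return epsilon * perturbation  # L_inf-bounded, regionally homogeneous

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        fake_grad = rng.normal(size=(224, 224, 3))  # stand-in for a real gradient
        delta = regionally_homogeneous_perturbation(fake_grad)
        print(delta.shape, np.unique(delta))  # values are only +/- epsilon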
Methods
  • Increase of error rates (%) for the image-dependent attacks FGSM [18], MIM [14], and DIM [14, 63]; the adversarial examples are generated with IncV3, and each cell reports the result for maximum perturbation 16/32 (a short illustrative sketch of this metric follows the table). Rows (1)–(9) correspond to the nine evaluated defenses.

            FGSM [18]    MIM [14]     DIM [14, 63]
    (1)     21.9/45.3    18.2/37.1    21.9/41.0
    (2)     2.84/20.7    7.30/18.7    11.9/32.1
    (3)     6.80/13.9    7.52/13.7    12.0/21.9
    (4)     10.0/17.9    11.4/17.3    16.7/26.1
    (5)     9.34/15.9    10.9/16.5    16.2/25.0
    (6)     6.86/13.3    7.76/13.6    10.8/19.6
    (7)     1.90/12.8    1.36/6.86    1.84/7.70
    (8)     17.0/32.3    15.3/24.4    15.5/24.7
    (9)     1.62/13.3    1.00/7.48    1.34/8.22

    The universal baselines, including UAP [38], GAP, and RHP, are reported in the corresponding rows of Table 2.
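  • A hedged sketch of the metric reported above, inferred from the table captions rather than taken from the authors' code: the "increase of error rate" is the defense's error rate on adversarial examples minus its error rate on the corresponding clean images.

    # Toy illustration of the "increase of error rate" metric (assumed definition).
    def error_rate(predictions, labels):
        wrong = sum(p != y for p, y in zip(predictions, labels))
        return 100.0 * wrong / len(labels)

    def error_rate_increase(clean_preds, adv_preds, labels):
        return error_rate(adv_preds, labels) - error_rate(clean_preds, labels)

    # 5 images: 1 clean mistake (20%) vs. 4 adversarial mistakes (80%) -> +60.0
    labels      = [0, 1, 2, 3, 4]
    clean_preds = [0, 1, 2, 3, 9]
    adv_preds   = [7, 1, 8, 9, 9]
    print(error_rate_increase(clean_preds, adv_preds, labels))  # 60.0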
Results
  • Thorough experiments demonstrate that the proposed method significantly outperforms prior attack algorithms (either image-dependent or universal) by an average improvement of 14.0% when attacking 9 defenses in the black-box setting.
Conclusion
  • By attacking naturally trained models and defense models in the white-box setting, the authors observe the regional homogeneity of adversarial perturbations.
  • Motivated by this observation, the authors propose a transforming paradigm and a gradient transformer module to generate the regionally homogeneous perturbation (RHP) for attacking defenses.
  • Although RHP is evaluated here with non-targeted attacks, it is expected to be a strong targeted attack as well, which requires further exploration and validation
Summary
  • Objectives:

    While such normalization techniques aim to help the model converge faster and speed up the learning procedure for different tasks, the goal of this work is to explicitly enforce the region structure and build homogeneity within regions.
Tables
  • Table 1: The error rates (%) of defense methods on our dataset, which contains 5000 randomly selected ILSVRC 2012 validation images
  • Table 2: The increase of error rates (%) after attacking. The adversarial examples are generated with IncV3. In each cell, we show the results when the maximum perturbation is 16/32, respectively. The top 3 rows (FGSM, MIM, and DIM) are image-dependent methods, while the bottom 3 rows (UAP, GAP, and RHP) are universal methods
  • Table 3: The increase of error rates (%) after attacking. The adversarial examples are generated with IncV4. In each cell, we show the results when the maximum perturbation is 16/32, respectively. The top 3 rows (FGSM, MIM, and DIM) are image-dependent methods, while the bottom 2 rows (UAP and RHP) are universal methods
  • Table 4: The increase of error rates (%) after attacking. The adversarial examples are generated with IncRes. In each cell, we show the results when the maximum perturbation is 16/32, respectively. The top 3 rows (FGSM, MIM, and DIM) are image-dependent methods, while the bottom 2 rows (UAP and RHP) are universal methods
  • Table 5: Comparison of cross-task transferability. We attack a segmentation model, test on the detection model Faster R-CNN, and report mAP (lower is better for attacking methods). “-” denotes the baseline performance without attacks
  • Table6: The increase of error rates (%) after attacking. The adversarial examples are generated with IncV3. In each row, we show the performance when splitting the images into a different number of regions
Related work
  • Black-box attack. In the black-box setting, attackers cannot access the target model. A typical solution is to generate adversarial examples with strong transferability. Szegedy et al. [55] first discuss the transferability of adversarial examples, i.e., that the same adversarial input can successfully attack different models. Taking advantage of transferability, Papernot et al. [40, 41] examine constructing a substitute model to attack a black-box target model. Liu et al. [35] extend the black-box attack to a large scale and successfully attack clarifai.com, an online image classification system. Building on one of the most well-known attack methods, the Fast Gradient Sign Method (FGSM) [18], and its iterative version (I-FGSM) [29], Dong et al. [14], Zhou et al. [64], Xie et al. [63], and Li et al. [31] improve transferability by adopting a momentum term, perturbation smoothing, input transformation, and model augmentation, respectively (a hedged sketch of FGSM and its momentum iterative variant follows this paragraph). [4, 42, 60] also suggest training generative models for creating adversarial examples.
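  • For concreteness, here is a hedged PyTorch sketch of FGSM [18] and the momentum iterative method of Dong et al. [14]; it assumes a generic classifier `model`, a batch of images `x` with shape (N, C, H, W) scaled to [0, 1], labels `y`, and an L_inf budget `eps` (all assumptions of this sketch), and is a generic re-implementation rather than the authors' code.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        # Single-step attack: move each pixel by eps in the gradient-sign direction.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

    def mim(model, x, y, eps, steps=10, mu=1.0):
        # Iterative attack with a momentum term over L1-normalized gradients.
        alpha = eps / steps
        g = torch.zeros_like(x)
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
            x_adv = x_adv.detach() + alpha * g.sign()
            # Project back into the L_inf ball around x and the valid image range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv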
References
  • [1] W. Brendel, J. Rauber, and M. Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
  • [2] N. Akhtar, J. Liu, and A. Mian. Defense against universal adversarial perturbations. In CVPR, 2018.
  • [3] J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • [4] S. Baluja and I. Fischer. Learning to attack: Adversarial transformation networks. In AAAI, 2018.
  • [5] A. N. Bhagoji, W. He, B. Li, and D. Song. Practical black-box attacks on deep neural networks using efficient query mechanisms. In ECCV, 2018.
  • [6] C. M. Bishop. The bias-variance decomposition. In Pattern Recognition and Machine Learning, chapter 3.2, pages 147–152. Springer, 2005.
  • [7] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In ICLR, 2018.
  • [8] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
  • [9] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017.
  • [10] F. Chollet. Xception: Deep learning with depthwise separable convolutions. In CVPR, 2017.
  • [11] N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau. Shield: Fast, practical defense and vaccination for deep learning using JPEG compression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 196–204, 2018.
  • [12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • [13] G. S. Dhillon, K. Azizzadenesheli, J. D. Bernstein, J. Kossaifi, A. Khanna, Z. C. Lipton, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. In ICLR, 2018.
  • [14] Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, and J. Zhu. Boosting adversarial attacks with momentum. In CVPR, 2018.
  • [15] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy. A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853, 2016.
  • [16] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes challenge: A retrospective. IJCV, 111(1):98–136, 2015.
  • [17] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
  • [18] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
  • [19] P. Goyal, P. Dollar, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • [20] C. Guo, J. S. Frank, and K. Q. Weinberger. Low frequency adversarial perturbation. arXiv preprint arXiv:1809.08758, 2018.
  • [21] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In ICLR, 2018.
  • [22] S. S. Haykin. Finite sample-size considerations. In Neural Networks and Learning Machines, volume 3, chapter 2.7, pages 82–86. Pearson, 2009.
  • [23] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [24] J. Hendrik Metzen, M. Chaithanya Kumar, T. Brox, and V. Fischer. Universal adversarial perturbations against semantic image segmentation. In ICCV, 2017.
  • [25] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • [26] H. Kannan, A. Kurakin, and I. Goodfellow. Adversarial logit pairing. In NIPS, 2018.
  • [27] V. Khrulkov and I. Oseledets. Art of singular vectors and universal adversarial perturbations. In CVPR, 2018.
  • [28] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [29] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017.
  • [30] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, C. Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.
  • [31] Y. Li, S. Bai, Y. Zhou, C. Xie, Z. Zhang, and A. Yuille. Learning transferable adversarial examples via ghost networks. arXiv preprint arXiv:1812.03413, 2018.
  • [32] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.
  • [33] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740–755, 2014.
  • [34] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. In ECCV, 2018.
  • [35] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
  • [36] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, G. Schoenebeck, M. E. Houle, D. Song, and J. Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. In ICLR, 2018.
  • [37] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
  • [38] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In CVPR, 2017.
  • [39] K. R. Mopuri, U. Garg, and R. V. Babu. Fast Feature Fool: A data independent approach to universal adversarial perturbations. In BMVC, 2017.
  • [40] N. Papernot, P. McDaniel, and I. Goodfellow. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
  • [41] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In AsiaCCS, 2017.
  • [42] O. Poursaeed, I. Katsman, B. Gao, and S. Belongie. Generative adversarial perturbations. In CVPR, 2017.
  • [43] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In CVPR, 2018.
  • [44] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In ICLR, 2018.
  • [45] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
  • [46] H. R. Roth, L. Lu, A. Farag, H.-C. Shin, J. Liu, E. B. Turkbey, and R. M. Summers. DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation. In MICCAI, 2015.
  • [47] S. Ruder. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.
  • [48] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
  • [49] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In ICLR, 2018.
  • [50] A. Shafahi, M. Najibi, Z. Xu, J. Dickerson, L. S. Davis, and T. Goldstein. Universal adversarial training. arXiv preprint arXiv:1811.11304, 2018.
  • [51] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • [52] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. In ICLR, 2018.
  • [53] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
  • [54] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.
  • [55] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
  • [56] F. Tramer, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
  • [57] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In ICLR, 2019.
  • [58] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
  • [59] Y. Wu and K. He. Group normalization. In ECCV, pages 3–19, 2018.
  • [60] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song. Generating adversarial examples with adversarial networks. In IJCAI, 2018.
  • [61] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.
  • [62] C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He. Feature denoising for improving adversarial robustness. arXiv preprint arXiv:1812.03411, 2018.
  • [63] C. Xie, Z. Zhang, J. Wang, Y. Zhou, Z. Ren, and A. Yuille. Improving transferability of adversarial examples with input diversity. arXiv preprint arXiv:1803.06978, 2018.
  • [64] W. Zhou, X. Hou, Y. Chen, M. Tang, X. Huang, X. Gan, and Y. Yang. Transferable adversarial perturbations. In ECCV, 2018.
  • [65] Z. Zhu, Y. Xia, W. Shen, E. Fishman, and A. Yuille. A 3D coarse-to-fine framework for volumetric medical image segmentation. In 3DV, 2018.