GreedyFool: Distortion-Aware Sparse Adversarial Attack

NIPS 2020


Abstract

Modern deep neural networks (DNNs) are vulnerable to adversarial samples. Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels. The existence of the sparse adversarial attack points out that DNNs are much more vulnerable than people believed, which is also a...
Introduction
  • Despite the success of current deep neural networks (DNNs), they have been shown to be vulnerable to adversarial samples [41, 17, 6, 36, 24, 31, 4, 18, 42, 13, 14, 47].
  • Most attack methods perturb all the pixels of the image under an l∞ or l2 constraint [10, 6, 41].
  • Different from the l∞ or l2 constraints, generating adversarial noise under an l0 constraint is an NP-hard problem.
  • To address this problem, many previous works have been proposed under both white-box and black-box settings.
  • For white-box attacks, JSMA [34] proposed to select the most effective pixels that influence the model decision based on a Jacobian saliency map (a hedged sketch of this idea follows this list).
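Below is a minimal, hedged sketch of the Jacobian-saliency idea behind JSMA [34]: pixels are ranked by how strongly increasing them would raise the target-class score while lowering the other classes. The function name, the per-position (channel-summed) simplification, and the PyTorch framing are illustrative assumptions, not the attack's reference implementation.

```python
# Hedged sketch of a JSMA-style saliency map [34]; simplified, not the original code.
import torch

def jsma_saliency(model, x, target):
    """x: (1, C, H, W) input image; target: (1,) target class index. Returns an (H, W) saliency map."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    target_score = logits[0, target.item()]
    other_score = logits[0].sum() - target_score
    grad_t, = torch.autograd.grad(target_score, x, retain_graph=True)
    grad_o, = torch.autograd.grad(other_score, x)
    # Collapse the channel dimension so the saliency is per spatial position.
    gt = grad_t.sum(dim=1)[0]
    go = grad_o.sum(dim=1)[0]
    saliency = gt * go.abs()
    # Zero out positions that either hurt the target class or help the other classes.
    saliency[(gt < 0) | (go > 0)] = 0
    return saliency
```

The pixels with the highest saliency are the candidates JSMA perturbs first; GreedyFool's first stage plays a similar candidate-selection role but, as described below, also weighs visibility.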
Highlights
  • Despite the success of current deep neural networks (DNNs), they have been shown to be vulnerable to adversarial samples [41, 17, 6, 36, 24, 31, 4, 18, 42, 13, 14, 47]
  • To achieve invisibility of the adversarial samples, we introduce a distortion map of an image, where the distortion value of a pixel represents how visible a modification of that pixel would be: a higher distortion means the modification is more noticeable
  • We propose a reasonable explanation based on the accumulation idea of [17], which argues that adversarial samples exist because DNNs are not non-linear enough in high dimensions, so the infinitesimal changes caused by small perturbations accumulate linearly and change the prediction
  • We propose a novel two-stage distortion-aware greedy-based sparse adversarial attack method “GreedyFool”
  • It can achieve both much better sparsity and invisibility than existing state-of-the-art sparse adversarial attack methods. It first selects the most effective candidate positions to modify according to the gradient, then uses a reduce stage to drop the less important points (a hedged sketch of this two-stage procedure follows this list)
  • We propose using a GAN-based distortion map as guidance in the first stage
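The following is a minimal sketch of the two-stage, distortion-aware greedy procedure described above, assuming a PyTorch classifier, an image in [0, 1], and a precomputed distortion map. The candidate score (gradient magnitude discounted by the distortion map), the fixed perturbation step, and the function names are illustrative assumptions, not the authors' official implementation.

```python
# Minimal two-stage greedy sparse attack sketch in the spirit of GreedyFool (assumptions noted above).
import torch
import torch.nn.functional as F

def greedy_sparse_attack(model, x, label, distortion_map, max_pixels=200, step=1.0):
    """x: (1, C, H, W) image in [0, 1]; label: (1,) true class;
    distortion_map: (1, 1, H, W), higher = modification more visible."""
    width = x.shape[-1]
    x_adv = x.clone().detach()
    selected = []  # flat spatial indices of perturbed pixels

    # Stage 1 (increase): greedily add the pixel with the best gradient/visibility trade-off.
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach()

        score = grad.abs().sum(dim=1, keepdim=True) / (distortion_map + 1e-6)
        flat = score.view(-1)
        if selected:
            flat[selected] = float('-inf')  # never pick the same pixel twice
        idx = int(flat.argmax())
        selected.append(idx)

        h, w = idx // width, idx % width
        x_adv[0, :, h, w] = (x_adv[0, :, h, w] + step * grad[0, :, h, w].sign()).clamp(0, 1)
        if model(x_adv).argmax(dim=1).item() != label.item():
            break

    # Stage 2 (reduce): drop perturbed pixels whose removal keeps the sample adversarial.
    for idx in list(selected):
        h, w = idx // width, idx % width
        backup = x_adv[0, :, h, w].clone()
        x_adv[0, :, h, w] = x[0, :, h, w]
        if model(x_adv).argmax(dim=1).item() == label.item():
            x_adv[0, :, h, w] = backup  # this pixel was necessary, restore it
        else:
            selected.remove(idx)
    return x_adv, selected
```

In the full method the distortion map itself is produced by a GAN-based generator; here it is simply treated as a given input.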
Methods
  • For C&W, since the perturbations on some pixels are smaller than one intensity level and are removed by the rounding operation when the image is saved, the resulting noise is relatively sparse and has a slightly lower detection rate (the quantization effect is illustrated in the sketch after this list)
  • When it comes to l0-based methods, since SRM is specially designed for detecting small but dense perturbations, it is not sensitive to large but sparse noise.
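A minimal numerical illustration of that rounding effect, assuming 8-bit images and NumPy (this toy example is not from the paper): perturbations below half an intensity level vanish entirely once the image is rounded back to integers for saving.

```python
# Toy illustration (not the paper's code) of 8-bit quantization erasing small perturbations.
import numpy as np

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(8, 8), dtype=np.uint8).astype(np.float32)

noise = rng.uniform(-0.49, 0.49, size=clean.shape).astype(np.float32)  # sub-quantization-step noise
adv = clean + noise

saved = np.clip(np.round(adv), 0, 255).astype(np.uint8)  # what actually gets written to disk
changed = np.count_nonzero(saved.astype(np.float32) != clean)

print(f"pixels still perturbed after rounding: {changed} / {clean.size}")  # prints 0 here
```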
Results
  • When κ = 6, the proposed method's transferability is nearly 2× better than SparseFool's, while the number of perturbed pixels is still smaller; κ is the confidence factor of the attack loss (a hedged sketch of such a confidence-margin loss follows this list).
  • Two detectors are considered for machine invisibility. The first is the state-of-the-art steganalysis-based adversarial detection method SRM [26].
  • The second trains a powerful binary CNN classifier to separate the generated adversarial samples from clean images.
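The page refers to the attack loss only as "Eq. 2" with confidence factor κ and does not reproduce it, so the margin form below is an assumption in the style of C&W [6], shown purely to illustrate how κ controls attack confidence (and, typically, transferability).

```python
# Hedged sketch of a confidence-margin attack loss with confidence factor kappa (C&W-style [6]).
# The paper's actual Eq. 2 is not reproduced on this page, so treat this form as an assumption.
import torch

def confidence_margin_loss(logits, label, kappa=6.0):
    """Non-target attack loss: push the true-class logit below the best other logit
    by at least a margin of kappa; larger kappa yields higher-confidence adversarial samples."""
    true_logit = logits.gather(1, label.view(-1, 1)).squeeze(1)
    # Best logit among all classes except the true one.
    masked = logits.clone()
    masked.scatter_(1, label.view(-1, 1), float('-inf'))
    best_other = masked.max(dim=1).values
    return torch.clamp(true_logit - best_other + kappa, min=0).mean()
```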
Conclusion
  • The authors propose a novel two-stage distortion-aware greedy-based sparse adversarial attack method “GreedyFool”.
  • It can achieve both much better sparsity and invisibility than existing state-of-the-art sparse adversarial attack methods.
  • It first selects the most effective candidate positions to modify according to the gradient, then uses a reduce stage to drop the less important points.
  • The authors will further investigate how to incorporate the proposed idea into l2- and l∞-based adversarial attacks to help achieve better invisibility.
Tables
  • Table 1: Non-target attack sparsity comparison on the ImageNet dataset. "m Pixels Fooling Rate" is the fooling rate when at most m pixels are allowed to be perturbed
  • Table 2: Non-target attack sparsity comparison on the CIFAR10 dataset. "m Pixels Fooling Rate" is the fooling rate when at most m pixels are allowed to be perturbed
  • Table 3: Speed comparison on both the CIFAR10 and ImageNet datasets
  • Table 4: Target attack success rate on both the CIFAR10 and ImageNet datasets. "m Pixels Fooling Rate" is the fooling rate when at most m pixels are allowed to be perturbed
  • Table 5: Black-box transferability on the ImageNet dataset. κ is the confidence factor used in our loss function Eq. 2. Here FR denotes the fooling rate and * marks white-box attack results
  • Table 6: Machine invisibility comparison using the SRM detection rate and the binary CNN classifier accuracy metrics. All adversarial samples are generated by non-target attacks on a pretrained Inception-v3 model with image size 299 × 299
  • Table 7: The contribution of each part of GreedyFool
Funding
  • This work was supported in part by the Natural Science Foundation of China under Grants U1636201, 62002334 and 62072421, by the Exploration Fund Project of the University of Science and Technology of China under Grant YD3480002001, and by the Fundamental Research Funds for the Central Universities under Grant WK2100000011.
References
  • [1] https://github.com/LTS4/SparseFool
  • [2] https://github.com/fra31/sparse-imperceivable-attacks
  • [3] https://github.com/bethgelab/foolbox/tree/v2
  • [4] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.
  • [5] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
  • [6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • [7] Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069, 2018.
  • [8] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26, 2017.
  • [9] Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In Advances in Neural Information Processing Systems, pages 10932–10942, 2019.
  • [10] Francesco Croce and Matthias Hein. A randomized gradient-free attack on ReLU networks. In German Conference on Pattern Recognition, pages 215–227.
  • [11] Francesco Croce and Matthias Hein. Sparse and imperceivable adversarial attacks, 2019.
  • [12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255.
  • [13] Xiaoyi Dong, Dongdong Chen, Hang Zhou, Gang Hua, Weiming Zhang, and Nenghai Yu. Self-robust 3D point recognition via gather-vector guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • [14] Xiaoyi Dong, Jiangfan Han, Dongdong Chen, Jiayang Liu, Huanyu Bian, Zehua Ma, Hongsheng Li, Xiaogang Wang, Weiming Zhang, and Nenghai Yu. Robust superpixel-guided attentional adversarial attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • [15] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Xiaolin Hu, Jianguo Li, and Jun Zhu. Boosting adversarial attacks with momentum. arXiv, 2017.
  • [16] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • [17] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
  • [18] Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dongdong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, and Xiaogang Wang. Once a man: Towards multi-target attack via learning multi-target adversarial network once. arXiv preprint arXiv:1908.05185, 2019.
  • [19] Jamie Hayes and George Danezis. Learning universal adversarial perturbations with generative models. arXiv, 2017.
  • [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
  • [21] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks, 2018.
  • [22] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • [23] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv, 2016.
  • [24] Cassidy Laidlaw and Soheil Feizi. Functional adversarial attacks. In Advances in Neural Information Processing Systems, pages 10408–10418, 2019.
  • [25] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
  • [26] Jiayang Liu, Weiming Zhang, Yiwei Zhang, Dongdong Hou, Yujia Liu, Hongyue Zha, and Nenghai Yu. Detection based defense against adversarial examples from the steganalysis point of view. In CVPR, pages 4825–4834, 2019.
  • [27] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • [28] Dongyu Meng and Hao Chen. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147, 2017.
  • [29] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267, 2017.
  • [30] Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. SparseFool: A few pixels make a big difference, 2018.
  • [31] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.
  • [32] Nina Narodytska and Shiva Prasad Kasiviswanathan. Simple black-box adversarial perturbations for deep networks, 2016.
  • [33] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In ASIACCS, pages 506–519. ACM, 2017.
  • [34] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.
  • [35] Omid Poursaeed, Isay Katsman, Bicheng Gao, and Serge Belongie. Generative adversarial perturbations. arXiv, 2017.
  • [36] Jérôme Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. CoRR, abs/1811.09600, 2018.
  • [37] Lukas Schott, Jonas Rauber, Matthias Bethge, and Wieland Brendel. Towards the first adversarially robust neural network model on MNIST. arXiv preprint arXiv:1805.09190, 2018.
  • [38] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2015.
  • [39] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. CoRR, abs/1710.08864, 2017.
  • [40] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, pages 2818–2826, 2016.
  • [41] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv, 2013.
  • [42] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv, 2017.
  • [43] Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv, 2017.
  • [44] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. arXiv, 2018.
  • [45] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In ICCV, pages 1369–1378, 2017.
  • [46] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv, 2017.
  • [47] Hang Zhou, Dongdong Chen, Jing Liao, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Weiming Zhang, Gang Hua, and Nenghai Yu. LG-GAN: Label guided adversarial network for flexible targeted attack of point cloud based deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.