Mitigating adversarial effects through randomization.

International Conference on Learning Representations (ICLR), 2018

Cited by 673 | Viewed 240 times

Abstract

Convolutional neural networks have demonstrated their powerful ability on various tasks in recent years. However, they are extremely vulnerable to adversarial examples: clean images, with imperceptible perturbations added, can easily cause convolutional neural networks to fail. In this paper, we propose to utilize randomization to mitigate adversarial effects.

Introduction
Highlights
  • Convolutional Neural Networks (CNNs) have been successfully applied to a wide range of vision tasks, including image classification (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; He et al., 2016a), object detection (Girshick, 2015; Ren et al., 2015; Zhang et al., 2017), semantic segmentation (Long et al., 2015; Chen et al., 2017), visual concept discovery (Wang et al., 2017), etc.
  • By evaluating our model against 156 different attacks, it reaches a normalized score of 0.924, which is far better than using ensemble adversarial training (Tramer et al., 2017) alone with a normalized score of 0.773
  • We propose a randomization-based mechanism to mitigate adversarial effects (a minimal sketch of the randomization layers is given after this list)
  • The experimental results show that adversarial examples rarely transfer between different randomization patterns, especially for iterative attacks
  • The proposed randomization layers are compatible with different network structures and adversarial defense methods, and can serve as a basic module for defense against adversarial examples
  • By adding the proposed randomization layers to an adversarially trained model (Tramer et al., 2017), it achieves a normalized score of 0.924 in the NIPS 2017 adversarial examples defense challenge, which is far better than using adversarial training alone with a normalized score of 0.773
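
The randomization defense summarized above consists of two inference-time operations placed in front of the classifier: random resizing followed by random zero-padding. Below is a minimal PyTorch sketch of these layers (the authors' released code uses TensorFlow); the size parameters, resizing a 299 × 299 input to a random size in [299, 331) and then padding to 331 × 331, follow the settings referenced in the table captions, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def randomization_layer(x, min_size=299, final_size=331):
    """Random resizing + random padding applied at inference time.

    x: batch of images with shape (N, C, 299, 299), values in [0, 1].
    The image is resized to a random size rnd in [min_size, final_size)
    and then zero-padded at a random offset up to final_size x final_size.
    """
    # random resizing
    rnd = int(torch.randint(min_size, final_size, (1,)).item())
    x = F.interpolate(x, size=(rnd, rnd), mode="nearest")

    # random padding: distribute the remaining pixels among the four sides
    pad_total = final_size - rnd
    pad_left = int(torch.randint(0, pad_total + 1, (1,)).item())
    pad_top = int(torch.randint(0, pad_total + 1, (1,)).item())
    # F.pad on a 4-D tensor takes (left, right, top, bottom)
    x = F.pad(x, (pad_left, pad_total - pad_left, pad_top, pad_total - pad_top), value=0.0)
    return x
```

Because rnd and the padding offsets change on every forward pass, the gradients an attacker computes go through a different random pattern than the one used at test time, which is why adversarial examples transfer poorly across patterns.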
Methods
  • 4.1 EXPERIMENT SETUP

    Dataset: it is less meaningful to attack images that are already classified incorrectly.
  • The authors randomly choose 5000 images from the ImageNet validation set that are classified correctly by all the considered networks to form the test dataset.
  • All these images are of the size 299 × 299 × 3.
  • Since there is a small variance in model performance across different random patterns, the authors run the defense model three times independently and report the average accuracy (see the evaluation sketch after this list)
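
As a hypothetical illustration of this protocol (the model and data-loader names are placeholders, not the authors' code), the averaged evaluation could look like:

```python
import torch

@torch.no_grad()
def averaged_top1_accuracy(model, loader, runs=3):
    """Run the randomized defense several times and average top-1 accuracy,
    since different random resize/pad patterns give slightly different results.
    `model` is assumed to apply the randomization layer in its forward pass."""
    accuracies = []
    for _ in range(runs):
        correct, total = 0, 0
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)
```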
Results
  • The best defense model in the experiments, i.e., randomization layers + ens-adv-Inception-Resnetv2, was submitted to the challenge.
  • By evaluating the model against 156 different attacks, it reaches a normalized score of 0.924, which is far better than using ensemble adversarial training (Tramer et al, 2017) alone with a normalized score of 0.773.
  • This result further demonstrates that the proposed randomization method effectively makes deep networks much more robust to adversarial attacks
Conclusions
  • The authors propose a randomization-based mechanism to mitigate adversarial effects.
  • The proposed randomization layers are compatible with different network structures and adversarial defense methods, and can serve as a basic module for defense against adversarial examples.
  • By adding the proposed randomization layers to an adversarially trained model (Tramer et al, 2017), it achieves a normalized score of 0.924 in the NIPS 2017 adversarial examples defense challenge, which is far better than using adversarial training alone with a normalized score of 0.773.
  • The code is publicly available at https://github.com/cihangxie/NIPS2017_adv_challenge_defense
Tables
  • Table 1: Top-1 classification accuracy on the clean images. We see that adding random resizing and random padding causes very little accuracy drop on clean (non-adversarial) images
  • Table 2: Top-1 classification accuracy under the vanilla attack scenario. We see that randomization layers effectively mitigate adversarial effects for all attacks and all networks. Particularly, combining randomization layers with ensemble adversarial training (ens-adv-Inception-ResNet-v2) performs very well on all attacks
  • Table 3: Top-1 classification accuracy under the single-pattern attack scenario. We see that randomization layers effectively mitigate adversarial effects for all attacks and all networks. Particularly, combining randomization layers with ensemble adversarial training (ens-adv-Inception-ResNet-v2) performs very well on all attacks
  • Table 4: Top-1 classification accuracy under the ensemble-pattern attack scenario. Similar to the vanilla attack and single-pattern attack scenarios, we see that randomization layers increase the accuracy under all attacks and networks. This clearly demonstrates the effectiveness of the proposed randomization method in defending against adversarial examples, even under this very strong attack scenario
  • Table 5: Top-1 classification accuracy under the one-pixel padding scenario. This table shows that creating different padding patterns (even 1-pixel padding) can effectively mitigate adversarial effects
  • Table 6: Top-1 classification accuracy under the one-pixel resizing scenario. This table shows that resizing the image to a different scale (even by 1 pixel) can effectively mitigate adversarial effects
  • Table 7: Top-1 classification accuracy on clean images. We see that these four randomization methods hardly hurt the performance on clean images. We use "++" to denote the addition of the proposed randomization layers, i.e., random resizing and random padding, and the results indicate that the combined models still perform well on clean images
  • Table 8: Top-1 classification accuracy using random brightness under the vanilla attack scenario
  • Table 9: Top-1 classification accuracy using random saturation under the vanilla attack scenario
  • Table 10: Top-1 classification accuracy using random hue under the vanilla attack scenario
  • Table 11: Top-1 classification accuracy using random contrast under the vanilla attack scenario
  • Table 12: Top-1 classification accuracy on the clean images and the adversarial examples generated under the vanilla attack scenario. Compared to the results in Tables 1 and 2, the randomization parameters applied here (i.e., resize between [267, 299) and pad to 299 × 299 × 3) are slightly worse than the randomization parameters applied in the paper (i.e., resize between [299, 331) and pad to 331 × 331 × 3)
Related Work
  • 2.1 GENERATING ADVERSARIAL EXAMPLES

    Generating adversarial examples has been studied extensively in recent years. Szegedy et al. (2014) first showed that adversarial examples, computed by adding visually imperceptible perturbations to the original images, make CNNs predict wrong labels with high confidence. Goodfellow et al. (2015) proposed the fast gradient sign method to generate adversarial examples based on the linear nature of CNNs, and also proposed adversarial training for defense. Moosavi-Dezfooli et al. (2016) generated adversarial examples by assuming that the loss function can be linearized around the current data point at each iteration. Carlini & Wagner (2017) developed a stronger attack that finds adversarial perturbations by introducing auxiliary variables which incorporate the pixel-value constraint (e.g., that pixel intensity must be within the range [0, 255]) naturally into the loss function, making the optimization process easier. Liu et al. (2017) proposed an ensemble-based approach to generate adversarial examples with stronger transferability. Unlike the works above, Biggio & Laskov (2012) and Koh & Liang (2017) showed that manipulating only a small fraction of the training data can significantly increase the number of misclassified samples at test time; such attacks are called poisoning attacks.

    2.2 DEFENDING AGAINST ADVERSARIAL EXAMPLES

    Opposite to generating adversarial examples, there is also progress on reducing their effects. Papernot et al. (2016b) showed that networks trained using defensive distillation can effectively defend against adversarial examples. Kurakin et al. (2017) proposed to replace the original clean images with a mixture of clean images and corresponding adversarial images in each training batch to improve network robustness. Tramer et al. (2017) improved robustness further by training the network on an ensemble of adversarial images generated from the trained model itself and from a number of other pre-trained models. Cao & Gong (2017) proposed a region-based classification to make models robust to adversarial examples. Metzen et al. (2017) trained a detector on the inner layers of the classifier to detect adversarial examples. Feinman et al. (2017) detected adversarial examples by looking at the Bayesian uncertainty estimates of the input images in dropout neural networks and by performing density estimation in the subspace of deep features learned by the model. MagNet (Meng & Chen, 2017) detected adversarial examples with large perturbations using detector networks, and pushed adversarial examples with small perturbations towards the manifold of clean images.

    3.1 AN OVERVIEW OF GENERATING ADVERSARIAL EXAMPLES

    Before introducing the proposed adversarial defense method, we give an overview of generating adversarial examples. Let X_n denote the n-th image in a dataset containing N images, and let y_n^true denote the corresponding ground-truth label. We use θ to denote the network parameters, and L(X_n, y_n^true; θ) to denote the loss. For adversarial example generation, the goal is to maximize the loss L(X_n + r_n, y_n^true; θ) for each image X_n, under the constraints that the generated adversarial example X_n^adv = X_n + r_n looks visually similar to the original image X_n, i.e., ||r_n|| ≤ ε, and that the corresponding predicted label y_n^adv ≠ y_n^true. In our experiments, we consider three different attack methods: one single-step attack method and two iterative attack methods. We use the cleverhans library (Papernot et al., 2016a) to generate adversarial examples; all of these attacks are implemented in TensorFlow.
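
For concreteness, the single-step attack referred to here is the fast gradient sign method (Goodfellow et al., 2015), which takes one ε-sized step in the direction of the sign of the loss gradient so that ||r_n||_∞ ≤ ε holds by construction. The sketch below is a generic PyTorch version, not the cleverhans/TensorFlow implementation actually used in the paper, and the eps value is illustrative rather than the paper's setting.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y_true, eps=16 / 255):
    """Single-step FGSM: approximately maximize L(x + r, y_true; θ)
    subject to ||r||_inf <= eps, for images x scaled to [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    loss.backward()
    # one signed gradient step of size eps per pixel
    x_adv = x_adv + eps * x_adv.grad.sign()
    # keep the result a valid image
    return x_adv.clamp(0.0, 1.0).detach()
```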
Funding
  • This work is supported by a gift grant from SNAP Research, ONR–N00014-15-1-2356 and NSF Visual Cortex on Silicon CCF-1317560
References
  • Battista Biggio and Pavel Laskov. Poisoning attacks against support vector machines. In International Conference on Machine Learning, 2012.
  • Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference. ACM, 2017.
  • Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy. IEEE, 2017.
  • Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
  • Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured prediction models. arXiv preprint arXiv:1707.05373, 2017.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition. IEEE, 2009.
  • Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
  • Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101, 2017.
  • Ross Girshick. Fast R-CNN. In International Conference on Computer Vision. IEEE, 2015.
  • Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition. IEEE, 2016a.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision. Springer, 2016b.
  • Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730, 2017.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
  • Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
  • Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations, 2017.
  • Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Computer Vision and Pattern Recognition. IEEE, 2015.
  • Dongyu Meng and Hao Chen. MagNet: A two-pronged defense against adversarial examples. arXiv preprint arXiv:1705.09064, 2017.
  • Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.
  • Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Computer Vision and Pattern Recognition. IEEE, 2016.
  • Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. cleverhans v1.0.0: An adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2016a.
  • Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy. IEEE, 2016b.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 2015.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Computer Vision and Pattern Recognition. IEEE, 2016.
  • Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
  • Florian Tramer, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
  • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, and Alan Yuille. Visual concepts and compositional voting. arXiv preprint arXiv:1711.04451, 2017.
  • Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision. IEEE, 2017.
  • Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo Wang, and Alan L Yuille. Single-shot object detection with enriched semantics. arXiv preprint arXiv:1712.00433, 2017.