EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples.

National Conference on Artificial Intelligence (AAAI), 2018

Abstract

Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples - a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on $L_2$ and $L_\infty$ distortion metrics. However, despite ...

Introduction
  • Deep neural networks (DNNs) achieve state-of-the-art performance in various tasks in machine learning and artificial intelligence, such as image classification, speech recognition, machine translation and game-playing.
  • Despite their effectiveness, recent studies have illustrated the vulnerability of DNNs to adversarial examples (Szegedy et al 2013; Goodfellow, Shlens, and Szegedy 2015).
  • Adversarial examples have also been used in interpreting DNNs (Koh and Liang 2017; Dong et al 2017)
Highlights
  • Deep neural networks (DNNs) achieve state-of-the-art performance in various tasks in machine learning and artificial intelligence, such as image classification, speech recognition, machine translation and game-playing
  • Targeted attacks aim to craft adversarial examples that are misclassified as specific target classes, and untargeted attacks aim to craft adversarial examples that are not classified as the original class
  • We propose an attack algorithm based on elastic-net regularization, which we call elastic-net attacks to DNNs (EAD); a sketch of the elastic-net objective is given after this list
  • We proposed an elastic-net regularized attack framework for crafting adversarial examples to attack deep neural networks
  • Experimental results on MNIST, CIFAR10 and ImageNet show that the L1-based adversarial examples crafted by elastic-net attacks to DNNs can be as successful as the state-of-the-art L2 and L∞ attacks in breaking undefended and defensively distilled networks
  • Our results corroborate the effectiveness of elastic-net attacks to DNNs and shed new light on the use of L1-based adversarial examples toward adversarial learning and security implications of deep neural networks
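
As referenced above, a minimal sketch of an elastic-net regularized attack objective is:

    $\min_{x}\; c \cdot f(x, t) \;+\; \beta\,\|x - x_0\|_1 \;+\; \|x - x_0\|_2^2 \quad \text{subject to } x \in [0, 1]^p$

where $x_0$ is the original image, $t$ the target class, $f(x, t)$ a loss that encourages misclassification into $t$, and $c$, $\beta$ trade off the attack loss against the $L_2$ and $L_1$ distortion, respectively. This is an illustration of the elastic-net idea (Zou and Hastie 2005) rather than a verbatim copy of the paper's formulation (7); the exact loss and constants may differ.
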
Methods
  • The authors compare EAD with the following targeted attacks, which are the most effective methods for crafting adversarial examples in different distortion metrics.
  • FGM: The fast gradient method proposed in (Goodfellow, Shlens, and Szegedy 2015).
  • I-FGM: The iterative fast gradient method proposed in (Kurakin, Goodfellow, and Bengio 2016b).
  • The I-FGM attacks using different distortion metrics are denoted by I-FGM-L1, I-FGM-L2 and I-FGM-L∞
Results
  • Experimental results on MNIST, CIFAR10 and ImageNet show that the L1-based adversarial examples crafted by EAD can be as successful as the state-of-the-art L2 and L∞ attacks in breaking undefended and defensively distilled networks.
  • EAD can improve attack transferability and complement adversarial training.
  • The authors' results corroborate the effectiveness of EAD and shed new light on the use of L1-based adversarial examples toward adversarial learning and security implications of deep neural networks
Conclusion
  • The authors proposed an elastic-net regularized attack framework for crafting adversarial examples to attack deep neural networks.
Tables
  • Table1: Comparison of the change-of-variable (COV) approach and EAD (Algorithm 1) for solving the elastic-net formulation in (7) on MNIST. ASR means attack success rate (%). Although these two methods attain similar attack success rates, COV is not effective in crafting L1-based adversarial examples. Increasing β leads to less L1-distorted adversarial examples for EAD, whereas the distortion of COV is insensitive to changes in β (a generic soft-thresholding sketch illustrating this β effect follows this list)
  • Table2: Comparison of different attacks on MNIST, CIFAR10 and ImageNet (average case). ASR means attack success rate (%). The distortion metrics are averaged over successful examples. EAD, the C&W attack, and I-FGM-L∞ attain the least L1-, L2-, and L∞-distorted adversarial examples, respectively. The complete attack results are given in the supplementary material
  • Table3: Adversarial training using the C&W attack and EAD
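
One way to read the β trend in Table 1: solvers for $L_1$-regularized objectives typically include an element-wise soft-thresholding (shrinkage) step that sets small perturbation components exactly to zero, so a larger β prunes more of the perturbation. The snippet below is a generic, hypothetical illustration of such a shrinkage step with box clipping to keep pixels valid; it is not a reproduction of the paper's Algorithm 1.

    import numpy as np

    def shrink(z, x0, beta):
        """Soft-threshold the perturbation z - x0 by beta (ISTA-style), then
        clip the result back to the valid pixel box [0, 1]."""
        diff = z - x0
        # Components with |diff| <= beta become exactly zero -> sparser perturbation.
        shrunk = np.sign(diff) * np.maximum(np.abs(diff) - beta, 0.0)
        return np.clip(x0 + shrunk, 0.0, 1.0)

    # Toy usage: a larger beta zeroes out more of the perturbation, mirroring the
    # smaller L1 distortion reported for EAD as beta grows.
    x0 = np.random.rand(28 * 28)              # original (flattened) image
    z = x0 + 0.05 * np.random.randn(28 * 28)  # a candidate adversarial iterate
    adv_sparse = shrink(z, x0, beta=1e-2)
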
Related work
  • Here we summarize related work on attacking and defending DNNs against adversarial examples.

    Attacks to DNNs
    FGM and I-FGM: Let $x_0$ and $x$ denote the original and adversarial examples, respectively, and let $t$ denote the target class to attack. Fast gradient methods (FGM) use the gradient $\nabla J$ of the training loss $J$ with respect to $x_0$ for crafting adversarial examples (Goodfellow, Shlens, and Szegedy 2015). For L∞ attacks, $x$ is crafted by

    $x = x_0 - \epsilon \cdot \text{sign}(\nabla J(x_0, t))$,  (1)

    where $\epsilon$ specifies the L∞ distortion between $x$ and $x_0$, and $\text{sign}(\nabla J)$ takes the sign of the gradient. For L1 and L2 attacks, $x$ is crafted by

    $x = x_0 - \epsilon \cdot \dfrac{\nabla J(x_0, t)}{\|\nabla J(x_0, t)\|_q}$  (2)

    for $q = 1, 2$, where $\epsilon$ specifies the corresponding distortion. Iterative fast gradient methods (I-FGM) were proposed in (Kurakin, Goodfellow, and Bengio 2016b); they iteratively apply FGM with a finer distortion, followed by an $\epsilon$-ball clipping. Untargeted attacks using FGM and I-FGM can be implemented in a similar fashion.

    C&W attack: Instead of leveraging the training loss, Carlini and Wagner designed an L2-regularized loss function based on the logit-layer representation in DNNs for crafting adversarial examples (Carlini and Wagner 2017b). Its formulation turns out to be a special case of our EAD formulation, which will be discussed in the following section. The C&W attack is considered to be one of the strongest attacks to DNNs, as it can successfully break undefended and defensively distilled DNNs and can attain remarkable attack transferability.

    JSMA: Papernot et al proposed a Jacobian-based saliency map algorithm (JSMA) for characterizing the input-output relation of DNNs (Papernot et al 2016a). It can be viewed as a greedy attack algorithm that iteratively modifies the most influential pixel for crafting adversarial examples.

    DeepFool: DeepFool is an untargeted L2 attack algorithm (Moosavi-Dezfooli, Fawzi, and Frossard 2016) based on the theory of projection to the closest separating hyperplane in classification. It is also used to craft a universal perturbation to mislead DNNs trained on natural images (Moosavi-Dezfooli et al 2016).

    Black-box attacks: Crafting adversarial examples in the black-box case is plausible if one allows querying of the target DNN. In (Papernot et al 2017), JSMA is used to train a substitute model for transfer attacks. In (Chen et al 2017), an effective black-box C&W attack is made possible using zeroth order optimization (ZOO). In the more stringent attack scenario where querying is prohibited, ensemble methods can be used for transfer attacks (Liu et al 2016).
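
    To make Eqs. (1) and (2) concrete, the following is a minimal sketch of the targeted FGM update, written here in PyTorch under the assumption of a hypothetical classifier `model` and cross-entropy as the training loss J; the I-FGM baselines apply this update repeatedly with a smaller step and clip the result back into the ε-ball around x0.

        import torch
        import torch.nn.functional as F

        def fgm_targeted(model, x0, target, eps, norm="inf"):
            """One targeted FGM step: x = x0 - eps * direction, per Eq. (1) or (2)."""
            x0 = x0.clone().detach().requires_grad_(True)
            loss = F.cross_entropy(model(x0), target)   # J(x0, t)
            grad = torch.autograd.grad(loss, x0)[0]     # gradient of J w.r.t. x0
            if norm == "inf":
                direction = grad.sign()                 # Eq. (1): sign of the gradient
            else:
                q = float(norm)                         # Eq. (2): Lq-normalized gradient, q in {1, 2}
                flat = grad.flatten(1)
                denom = flat.norm(p=q, dim=1).clamp_min(1e-12)
                direction = grad / denom.view(-1, *([1] * (grad.dim() - 1)))
            # Step toward the target class and keep pixels in a valid range.
            return (x0 - eps * direction).clamp(0, 1).detach()
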
Funding
  • Cho-Jui Hsieh and Huan Zhang acknowledge the support of NSF via IIS-1719097
References
  • Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2(1):183–202.
  • Candes, E. J., and Wakin, M. B. 2008. An introduction to compressive sampling. IEEE Signal Processing Magazine 25(2):21–30.
  • Carlini, N., and Wagner, D. 2017a. Adversarial examples are not easily detected: Bypassing ten detection methods. arXiv preprint arXiv:1705.07263.
  • Carlini, N., and Wagner, D. 2017b. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), 39–57.
  • Chen, P.-Y.; Zhang, H.; Sharma, Y.; Yi, J.; and Hsieh, C.-J. 2017. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM Workshop on Artificial Intelligence and Security, 15–26.
  • Dong, Y.; Su, H.; Zhu, J.; and Bao, F. 2017. Towards interpretable deep neural networks by leveraging adversarial examples. arXiv preprint arXiv:1708.05493.
  • Duchi, J., and Singer, Y. 2009. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research 10(Dec):2899–2934.
  • Evtimov, I.; Eykholt, K.; Fernandes, E.; Kohno, T.; Li, B.; Prakash, A.; Rahmati, A.; and Song, D. 2017. Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945.
  • Feinman, R.; Curtin, R. R.; Shintre, S.; and Gardner, A. B. 2017. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410.
  • Fu, H.; Ng, M. K.; Nikolova, M.; and Barlow, J. L. 2006. Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientific Computing 27(6):1881–1902.
  • Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. ICLR'15; arXiv preprint arXiv:1412.6572.
  • Grosse, K.; Manoharan, P.; Papernot, N.; Backes, M.; and McDaniel, P. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280.
  • Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
  • Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Koh, P. W., and Liang, P. 2017. Understanding black-box predictions via influence functions. ICML; arXiv preprint arXiv:1703.04730.
  • Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016a. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016b. Adversarial machine learning at scale. ICLR'17; arXiv preprint arXiv:1611.01236.
  • Liu, Y.; Chen, X.; Liu, C.; and Song, D. 2016. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770.
  • Lu, J.; Issaranon, T.; and Forsyth, D. 2017. SafetyNet: Detecting and rejecting adversarial examples robustly. arXiv preprint arXiv:1704.00103.
  • Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
  • Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2016. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401.
  • Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2574–2582.
  • Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z. B.; and Swami, A. 2016a. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P), 372–387.
  • Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; and Swami, A. 2016b. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP), 582–597.
  • Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z. B.; and Swami, A. 2017. Practical black-box attacks against machine learning. In ACM Asia Conference on Computer and Communications Security, 506–519.
  • Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; and Wojna, Z. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
  • Tramer, F.; Kurakin, A.; Papernot, N.; Boneh, D.; and McDaniel, P. 2017. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204.
  • Xu, W.; Evans, D.; and Qi, Y. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155.
  • Zantedeschi, V.; Nicolae, M.-I.; and Rawat, A. 2017. Efficient defenses against adversarial attacks. arXiv preprint arXiv:1707.06728.
  • Zheng, S.; Song, Y.; Leung, T.; and Goodfellow, I. 2016. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4480–4488.
  • Zou, H., and Hastie, T. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2):301–320.