Input-Aware Dynamic Backdoor Attack

NeurIPS 2020

Abstract

In recent years, neural backdoor attack has been considered to be a potential security threat to deep learning systems. Such systems, while achieving the state-of-the-art performance on clean data, perform abnormally on inputs with predefined triggers. Current backdoor techniques, however, rely on uniform trigger patterns, which are easily detected and mitigated by current defense methods.
Introduction
  • Due to their superior performance, deep neural networks have become essential in modern artificial intelligence systems.
  • Instead of training these networks from scratch, many companies use pre-trained models provided by third-parties.
  • This has caused an emerging security threat of neural backdoor attacks, in which the provided networks look genuine but intentionally misbehave on a specific condition of the inputs.
  • The trained networks classify clean testing images accurately but quickly switch to returning attack labels when trigger patterns appear.
  • The customer can run protection methods before or after deploying the model.
Highlights
  • Due to their superior performance, deep neural networks have become essential in modern artificial intelligence systems
  • We argue that the fixed trigger premise is hindering the capability of backdoor attack methods
  • Without constraints, the generated triggers may be repetitive or universal across different input images, making the system collapse to a regular backdoor attack. To construct such a dynamic backdoor, we propose to use a trigger generator conditioned on the input image (an illustrative generator sketch follows this list)
  • We have presented a novel backdoor attack that is conditioned on the input image
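The highlighted trigger generator is the component that makes the backdoor input-aware: each image receives its own trigger. Below is a minimal, hedged sketch of such a generator in PyTorch; the encoder-decoder layout, layer sizes, and the name TriggerGenerator are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Illustrative input-conditioned trigger generator g(x).

    Maps a clean image to a trigger pattern of the same size, so that
    different inputs receive different triggers (the "dynamic" property).
    The architecture here is a placeholder, not the paper's exact design.
    """
    def __init__(self, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output is a trigger pattern in [0, 1] with the same shape as the input image.
        return self.decoder(self.encoder(x))

# Two different inputs generally yield two different trigger patterns.
g = TriggerGenerator()
patterns = g(torch.rand(2, 3, 32, 32))   # shape: (2, 3, 32, 32)
```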
Methods
  • The authors argue that a universal backdoor trigger for all images is a bad practice and an Achilles heel of the current attack methods.
  • The defender can estimate that global trigger by optimizing and verifying on a set of clean inputs.
  • Recall that a classifier is a function f : X → C, in which X is the input image domain and C = {c1, c2, ..., cM} is the set of target classes.
  • Let B be the injecting function, applying a trigger t = (m, p) to a clean image (a hedged code sketch of one common form follows this list).
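A minimal sketch of the injecting function above, assuming the usual mask-and-pattern blending rule B(x, t) = x ⊙ (1 − m) + p ⊙ m; the function name apply_trigger and the tensor shapes are illustrative, not taken from the authors' code.

```python
import torch

def apply_trigger(x: torch.Tensor, m: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """Blend a trigger t = (m, p) into a clean image batch x.

    x: clean images, shape (B, C, H, W), values in [0, 1]
    m: mask in [0, 1] selecting where the trigger is pasted (broadcastable to x)
    p: trigger pattern with the same shape as x

    Assumed blending rule: B(x, t) = x * (1 - m) + p * m
    """
    return x * (1 - m) + p * m

# Usage: paste a random pattern into the top-left 8x8 corner of CIFAR-10-sized images.
x = torch.rand(4, 3, 32, 32)           # clean images
p = torch.rand(4, 3, 32, 32)           # one pattern per image (the dynamic setting)
m = torch.zeros(4, 1, 32, 32)
m[..., :8, :8] = 1.0                   # mask covering an 8x8 patch
x_poisoned = apply_trigger(x, m, p)
```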
Results
  • When increasing ρa, the attack success rate goes up to near 100%, and the cross-trigger accuracy also surprisingly increases (a sketch of the training-mode sampling follows).
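One way to read this result is through the training-time sampling of modes: each example is poisoned with its own trigger with some probability (possibly what ρa denotes above), used in a cross-trigger mode with a trigger from another image, or left clean. The helper below is only a hypothetical sketch of such sampling; the probabilities, names, and blending helper are assumptions, not the authors' code.

```python
import random
import torch

def blend(x, p, m):
    # Same mask-and-pattern blending rule as in the injecting-function sketch above.
    return x * (1 - m) + p * m

def make_training_example(x, y, x_other, generator, mask, attack_target,
                          rho_backdoor=0.1, rho_cross=0.1):
    """Build one training example in one of three modes (illustrative only).

    - backdoor mode: the image carries its own trigger and is relabelled to attack_target
    - cross-trigger mode: the image carries a trigger generated from ANOTHER image
      (x_other) and keeps its clean label, so triggers cannot be reused across inputs
    - clean mode: the image is left untouched
    """
    r = random.random()
    if r < rho_backdoor:
        p = generator(x.unsqueeze(0)).squeeze(0)
        return blend(x, p, mask), attack_target
    if r < rho_backdoor + rho_cross:
        p = generator(x_other.unsqueeze(0)).squeeze(0)
        return blend(x, p, mask), y
    return x, y
```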
Conclusion
  • The authors have presented a novel backdoor attack that is conditioned on the input image.
  • To implement such a system, the authors use a trigger generator that produces triggers from the clean input images.
  • The authors enforce the generated triggers to be diverse and non-reusable across different inputs (an assumed diversity-loss sketch follows this list).
  • These strict criteria make the poisoned models stealthy enough to pass through all tested defense practices.
  • The current trigger patterns are unnatural, so the authors aim to make them more realistic and imperceptible to humans.
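The diversity requirement above can be enforced with a regularizer that penalizes the generator when two different inputs receive similar triggers. The ratio form below is a hedged sketch of such a loss; the exact formulation, normalization, and the name diversity_loss are assumptions rather than a quotation of the paper's objective.

```python
import torch

def diversity_loss(x1: torch.Tensor, x2: torch.Tensor, generator, eps: float = 1e-8):
    """Assumed diversity regularizer for an input-conditioned trigger generator.

    Minimizing the ratio ||x1 - x2|| / ||g(x1) - g(x2)|| pushes the trigger
    difference to grow whenever the inputs differ, so the generator cannot
    collapse to a single universal trigger pattern.
    """
    input_dist = torch.norm(x1 - x2, p=2)
    trigger_dist = torch.norm(generator(x1) - generator(x2), p=2) + eps
    return input_dist / trigger_dist
```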
Tables
  • Table 1: Detailed information of the datasets and the classifiers used in our experiments. Each convolution (conv) and fully-connected (fc) layer is followed by a ReLU, except the last fc layer
  • Table 2: Detailed architecture of the MNIST classifier. * means the layer is followed by a Dropout
  • Table 3: Inference time of our modules
  • Table 4: All-to-all attack result
  • Table 5: Effect of image regularization on the CIFAR-10 backdoor model, all-to-all attack scenario
Study subjects and analysis
Datasets: 3
Some sample backdoor images are presented in Fig. 4, and the results on the testing sets are reported in Fig. 3b. For all three datasets, the backdoor attack success rate (ASR) is almost 100%, while the models still achieve the same performance on clean data as the benign models do. Moreover, the cross-trigger accuracies range from 88.16% (CIFAR-10) to 96.80% (GTSRB), proving the backdoor triggers inapplicable to unpaired clean images (a metric-computation sketch follows this paragraph).
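The three numbers in this paragraph correspond to three evaluation modes: clean accuracy, attack success rate on own-trigger images, and cross-trigger accuracy on images carrying a trigger generated from a different image. The sketch below shows one way these could be computed; the helper names, the fixed mask, and the blending rule are assumptions for illustration, not the authors' evaluation code.

```python
import torch

@torch.no_grad()
def evaluate(classifier, generator, loader, mask, attack_target):
    """Return (clean accuracy, attack success rate, cross-trigger accuracy).

    clean : accuracy of the classifier on unmodified images
    ASR   : fraction of own-trigger images classified as attack_target
    cross : accuracy on images carrying a trigger generated from a DIFFERENT
            image; a high value means triggers are not reusable across inputs.
    """
    n = clean_ok = attack_ok = cross_ok = 0
    for x, y in loader:
        p_own = generator(x)
        p_other = generator(torch.roll(x, shifts=1, dims=0))  # trigger from another image
        blend = lambda p: x * (1 - mask) + p * mask            # assumed blending rule
        clean_ok += (classifier(x).argmax(1) == y).sum().item()
        attack_ok += (classifier(blend(p_own)).argmax(1) == attack_target).sum().item()
        cross_ok += (classifier(blend(p_other)).argmax(1) == y).sum().item()
        n += y.numel()
    return clean_ok / n, attack_ok / n, cross_ok / n
```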

Training samples: 2500
As can be seen, the backdoor error rate is always close to the clean one; the largest gap between them is only 20%, obtained with 2500 training samples and t = 0.5. Hence, this defense method cannot mitigate the backdoor.

Samples: 2500
[Plot residue: error rate (%) versus threshold t for clean and backdoor inputs, with 1000 and 2500 training samples.]

References
  • Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain. In Proceedings of Machine Learning and Computer Security Workshop, 2017.
  • Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. In Proceedings of Network and Distributed System Security Symposium, 2018.
  • Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
  • Yuanshun Yao, Huiying Li, Haitao Zheng, and Ben Y Zhao. Latent backdoor attacks on deep neural networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 2041–2055, 2019.
  • Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In Proceedings of 40th IEEE Symposium on Security and Privacy, 2019.
  • Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. Abs: Scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 1265–1282, 2019.
  • Yansong Gao, Change Xu, Derui Wang, Shiping Chen, Damith C Ranasinghe, and Surya Nepal. Strip: A defence against trojan attacks on deep neural networks. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 113–125, 2019.
  • Sakshi Udeshi, Shanshan Peng, Gerald Woo, Lionell Loh, Louth Rawshan, and Sudipta Chattopadhyay. Model agnostic defence against backdoor attacks in machine learning. arXiv preprint arXiv:1908.02203, 2019.
  • Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
  • Yu Ji, Zixin Liu, Xing Hu, Peiqi Wang, and Youhui Zhang. Programmable neural network trojan for pre-trained feature extractor, 2019.
  • Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, and Yang Zhang. Dynamic backdoor attacks against machine learning models. arXiv preprint arXiv:2003.03675, 2020.
  • Brandon Tran, Jerry Li, and Aleksander Madry. Spectral signatures in backdoor attacks. In Proceedings of Advances in Neural Information Processing Systems, 2018.
  • Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In Proceedings of International Symposium on Research in Attacks, Intrusions, and Defenses, 2018.
  • Hao Cheng, Kaidi Xu, Sijia Liu, Pin-Yu Chen, Pu Zhao, and Xue Lin. Defending against Backdoor Attack on Deep Neural Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Workshop, 2019.
  • Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, and Xue Lin. Bridging mode connectivity in loss landscapes and adversarial robustness, 2020.
  • Bao Gia Doan, Ehsan Abbasnejad, and Damith C. Ranasinghe. Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems. arXiv, Aug 2019.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
  • Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural networks, 32:323–332, 2012.
  • kuangliu. pytorch-cifar, May 2020. [Online; accessed 4. Jun. 2020].
  • Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.

Dataset details
  • MNIST (18) is a subset of a larger set available from the National Institute of Standards and Technology (NIST). It consists of 70,000 grayscale images of handwritten digits at resolution 28 × 28, divided into a training set of 60,000 images and a test set of 10,000 images. It can be found at http://yann.lecun.com/exdb/mnist/. During training, we apply random cropping to the input images; no augmentation is applied in the evaluation stage.
  • CIFAR-10 (19) is a labeled subset of the 80 Million Tiny Images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The dataset consists of 60,000 color images at resolution 32 × 32 from 10 different classes, with 6,000 images per class. It is split into a training set of 50,000 images and a test set of 10,000 images. The dataset is publicly available at https://www.cs.toronto.edu/~kriz/cifar.html. We apply random crop, random rotation, and random horizontal flip during training; no augmentation is applied in the evaluation stage.
  • The German Traffic Sign Recognition Benchmark (GTSRB) (20) dataset originates from a challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. It contains more than 50,000 images of 43 classes, with image sizes varying from 15 × 15 to 250 × 250 pixels, and comprises a training set of 39,209 images and a test set of 12,630 images. The dataset is publicly available at http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset. In both training and evaluation, images are resized to 32 × 32 pixels; random crop and random rotation are applied during training only.
Authors
Tuan Anh Nguyen
Anh Tran