How To Backdoor Federated Learning

Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov

arXiv preprint arXiv:1807.00459 [cs.CR], 2018.


Abstract:

Federated learning enables multiple participants to jointly construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a predictive keyboard model without revealing what individual users type into their phones. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model. […]

Introduction
  • Even a single-shot attack, where the attacker is selected in a single round of training, causes the global model to achieve 100% accuracy on the backdoor task.
  • An attacker who controls fewer than 1% of the participants can prevent the global model from unlearning the backdoor without reducing its accuracy on the main task.
  • The authors' attacker wants federated learning to produce a global model that converges and exhibits good accuracy on its main task while behaving a certain way on specific, attacker-chosen backdoor inputs.
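The attacker's objective above can be made concrete with a short sketch: an attacker-controlled participant trains its local model on batches that mix correctly labeled data with trigger-stamped inputs relabeled to an attacker-chosen class, so the model keeps its main-task accuracy while learning the backdoor. The tiny model, random tensors, trigger pattern, and ATTACK_LABEL below are illustrative stand-ins, not the paper's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative placeholders (not the paper's models or data): a tiny classifier,
# random "clean" batches, and a fixed attacker-chosen target class.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
ATTACK_LABEL = 2  # attacker-chosen label for backdoored inputs

def add_trigger(images):
    """Stamp a simple pixel-pattern trigger into a corner of each image."""
    poisoned = images.clone()
    poisoned[:, :, :3, :3] = 1.0  # white 3x3 patch as a stand-in trigger
    return poisoned

for _ in range(10):  # local epochs on the attacker's device
    clean_x = torch.rand(32, 3, 32, 32)
    clean_y = torch.randint(0, 10, (32,))
    # Poison part of each batch: triggered inputs get ATTACK_LABEL, while the
    # clean majority keeps its true labels so main-task accuracy is preserved.
    backdoor_x = add_trigger(clean_x[:8])
    backdoor_y = torch.full((8,), ATTACK_LABEL)
    x = torch.cat([clean_x, backdoor_x])
    y = torch.cat([clean_y, backdoor_y])
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```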
Highlights
  • We demonstrate that federated learning enables malicious participants to introduce stealthy backdoor functionality into the global model
  • Even a single-shot attack, where the attacker is selected in a single round of training, causes the global model to achieve 100% accuracy on the backdoor task
  • We argue that federated learning is fundamentally vulnerable to backdoor attacks
  • We show that data-poisoning attacks do not work against federated learning, where the attacker’s model is aggregated with hundreds or thousands of benign models
  • Federated learning is uniquely vulnerable to attacks that introduce hidden backdoor functionality into the global, jointly learned model
Results
  • Following [27], the authors use CIFAR-10 [23] as the image classification task and train a global model with 100 total participants, 10 of whom are selected randomly in each round.
  • Because the participants’ training data are very diverse and the backdoor images represent only a tiny fraction of them, introducing the backdoor has little to no effect on the main-task accuracy of the global model.
  • Every attacker-controlled participant trains on 1,000 sentences modified as needed for the backdoor task, with E = 10 local epochs and the initial learning rate lr = 2.
  • (Figure: accuracy on the main task vs. rounds since the attack, for the baseline and model-replacement attacks.) The word embeddings make up 94% of the model’s weights, and participants update only the embeddings of the words that occur in their private data.
  • A participant in federated learning cannot control when it is selected to contribute a model to a round of global training.
  • The authors measure the backdoor accuracy of the global model after a single round of training in which the attacker controls a fixed fraction of the participants (vs. the mean accuracy across multiple rounds in Fig. 4(d)).
  • The authors are not aware of any method that the aggregator can use to determine which features are associated with backdoors and which are important for the benign models, especially when the latter are trained on participants’ local, non-i.i.d. data.
  • A more effective flavor of this technique is to compute the cosine similarity between each update L_i^{t+1} and the previous global model G^t. Given that the updates are orthogonal, the attacker’s scaling makes cos(L_m^{t+1}, G^t) greater than for the benign participants’ updates, and this can be detected (a toy version of such a check is sketched below).
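A minimal aggregator-side sketch of such a cosine check, under the assumption that the defense simply scores each submitted model L_i^{t+1} by its cosine with G^t and flags submissions whose score deviates strongly from the rest; the z-score threshold and toy vectors are illustrative, not taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two flattened weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def flag_outliers(global_model, submitted_models, z_thresh=2.0):
    """Score each submission L_i^{t+1} by cos(L_i^{t+1}, G^t) and flag
    those whose score is far from the typical value."""
    scores = np.array([cosine(m, global_model) for m in submitted_models])
    z = (scores - scores.mean()) / (scores.std() + 1e-12)
    return [i for i, zi in enumerate(z) if abs(zi) > z_thresh]

# Toy usage: ten benign updates close to G^t plus one heavily scaled submission.
rng = np.random.default_rng(0)
G = rng.normal(size=1000)
benign = [G + 0.01 * rng.normal(size=1000) for _ in range(10)]
attacker = G + 50.0 * rng.normal(size=1000)   # stand-in for a scaled update
print(flag_outliers(G, benign + [attacker]))  # likely flags index 10
```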
Conclusion
  • Fig. 11 shows that participants’ updates are very noisy, and if the attacker controls a tiny fraction of the participants, the probability that Krum selects the attacker’s model is very high.
  • The authors developed a new model-poisoning methodology that exploits these vulnerabilities and demonstrated its efficacy on several standard federated-learning tasks, such as image classification and word prediction.
  • This produces a wide distribution of participants’ models and renders anomaly detection ineffective. “Secure” aggregation makes the problem worse, because the aggregator can no longer inspect individual participants’ models at all.
Tables
  • Table 1: Word popularity vs. norm of the update. Columns: x, y, count(x); the listed (x, y) pairs are (is, delicious), (is, palatable), (is, amazing), (looks, delicious), (tastes, delicious).
Related work
  • Attacks on training data. “Traditional” poisoning attacks compromise the training data to change the model’s behavior at inference time [2], [17], [26], [39], [44]. Backdoor attacks change the model’s behavior only on specific attacker-chosen inputs [7], [13], [25], without impacting its performance on the main task, by poisoning the training data with backdoored examples. In [19], a backdoored component is inserted directly into the model. We show that data-poisoning attacks do not work against federated learning, where the attacker’s model is aggregated with hundreds or thousands of benign models.

    Defenses against poisoning focus on removing outliers from the training data [38], [44] or, in the distributed setting, from the participants’ models [10], [40]. In Section VI, we explain why these defenses are ineffective against our attack.
Funding
  • This research was supported in part by a gift from Schmidt Sciences and NSF grant 1700832.
Study subjects and analysis
total participants: 80000
An attacker who controls fewer than 1% of the participants can prevent the global model from unlearning the backdoor without reducing its accuracy on the main task. Our attack greatly outperforms “traditional” data poisoning [13]: in a word-prediction task with 80,000 total participants, compromising just 8 of them is enough to achieve 50% backdoor accuracy, as compared to 400 malicious participants needed by the data-poisoning attack. We argue that federated learning is fundamentally vulnerable to backdoor attacks

users: 10^8
If η = n/m, the model is fully replaced by the average of the local models. Some tasks (e.g., CIFAR-10) require lower η to converge, while training with n = 10^8 users requires larger η for the local models to have any impact on the global model. In comparison to synchronous distributed SGD [6], federated learning reduces the number of participants per round and converges faster.
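A minimal sketch of the averaging rule described above, assuming the aggregation G^{t+1} = G^t + (η/n) Σ_i (L_i^{t+1} − G^t) over the m models submitted in a round; the toy check confirms that η = n/m reduces this step to the plain average of the local models. All values are illustrative.

```python
import numpy as np

def aggregate(global_model, local_models, eta, n):
    """Federated averaging step: G^{t+1} = G^t + (eta / n) * sum_i (L_i^{t+1} - G^t),
    where n is the total number of users and only m of them report this round."""
    total_update = sum(local - global_model for local in local_models)
    return global_model + (eta / n) * total_update

# Toy check that eta = n / m fully replaces the model with the mean of the
# m local models.
rng = np.random.default_rng(1)
n, m = 100, 10
G = rng.normal(size=5)
locals_ = [G + rng.normal(size=5) for _ in range(m)]
new_G = aggregate(G, locals_, eta=n / m, n=n)
print(np.allclose(new_G, np.mean(locals_, axis=0)))  # True
```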

total participants: 100
Image classification. Following [27], we use CIFAR-10 [23] as our image classification task and train a global model with 100 total participants, 10 of whom are selected randomly in each round. As the convolutional neural network, we use the lightweight ResNet18 model [15] with 2.7 million parameters

posts: 500
The model is a 2-layer LSTM with 10 million parameters trained on a randomly chosen month (November 2017) from the public Reddit dataset as in [27]. Under the assumption that each Reddit user is an independent participant in federated learning and to ensure sufficient data from each user, we filter out those with fewer than 150 or more than 500 posts, leaving a total of 83,293 participants with 247 posts each on average. We consider each post as one sentence in the training data.
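A minimal sketch of the participant-selection step just described, assuming the Reddit posts have already been grouped per author into a dictionary; the function and variable names are illustrative, not the paper's code.

```python
def select_participants(posts_by_user, min_posts=150, max_posts=500):
    """Keep users with 150-500 posts; each post is one training sentence."""
    return {user: posts for user, posts in posts_by_user.items()
            if min_posts <= len(posts) <= max_posts}

# Toy corpus with one author below the threshold.
corpus = {"user_a": ["a post"] * 200, "user_b": ["too few"] * 10}
print(list(select_participants(corpus)))  # ['user_a']
```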

participants: 100
We restrict the words to a dictionary of the 50K most frequent words in the dataset. Following [27], we randomly select 100 participants per round. Each selected participant trains for 2 local epochs with a learning rate of 20.

posts: 5034
Each selected participant trains for 2 local epochs with a learning rate of 20. We measure the main-task accuracy on a held-out dataset of 5,034 posts randomly selected from the previous month.

Varying the scaling factor. Eq. 3 guarantees that when the attacker’s update L_m^{t+1} = γ(X − G^t) + G^t uses the scaling factor γ = n/η, the backdoored model X replaces the global model G^{t+1}.
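A minimal sketch of this model-replacement step, reusing the aggregation rule sketched earlier: the attacker submits L_m^{t+1} = γ(X − G^t) + G^t with γ = n/η, and because the benign participants' deltas are small near convergence, the aggregated model lands approximately on the backdoored model X. The numbers are illustrative.

```python
import numpy as np

def attacker_update(backdoored_model, global_model, gamma):
    """Model-replacement submission: L_m^{t+1} = gamma * (X - G^t) + G^t."""
    return gamma * (backdoored_model - global_model) + global_model

# Toy check under the aggregation rule sketched earlier: with gamma = n / eta
# and small benign deltas (as near convergence), the new global model ~= X.
rng = np.random.default_rng(2)
n, m, eta = 100, 10, 1.0
G = rng.normal(size=5)
X = G + rng.normal(size=5)                     # attacker's backdoored model
benign = [G + 1e-3 * rng.normal(size=5) for _ in range(m - 1)]
submissions = benign + [attacker_update(X, G, gamma=n / eta)]
new_G = G + (eta / n) * sum(L - G for L in submissions)
print(np.allclose(new_G, X, atol=1e-3))        # True: X replaces the global model
```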

participants: 1000
The attacker simply creates a backdoored model that is close to the global model and submits it for every participant it controls. We conducted an experiment with 1,000 participants in a single round. Fig. 11 shows that participants’ updates are very noisy, and if the attacker controls a tiny fraction of the participants, the probability that Krum selects the attacker’s model is very high.
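A minimal sketch of this Krum experiment, assuming the standard Krum rule of Blanchard et al. [3] (score each update by the sum of squared distances to its n − f − 2 nearest neighbours and select the lowest-scoring one); the population sizes and noise scales are illustrative, not the paper's exact settings.

```python
import numpy as np

def krum_select(updates, f):
    """Krum: score each update by the sum of squared distances to its
    n - f - 2 nearest neighbours and return the index with the lowest score."""
    n = len(updates)
    dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    scores = []
    for i in range(n):
        neighbours = np.sort(np.delete(dists[i], i))[: n - f - 2]
        scores.append(neighbours.sum())
    return int(np.argmin(scores))

# Toy setup in the spirit of the experiment above: noisy benign updates plus a
# few attacker-controlled participants all submitting nearly the same
# backdoored model that stays close to the global model.
rng = np.random.default_rng(3)
G = rng.normal(size=50)
benign = [G + rng.normal(scale=1.0, size=50) for _ in range(95)]
backdoored = G + 0.1 * rng.normal(size=50)
attackers = [backdoored + 1e-4 * rng.normal(size=50) for _ in range(5)]
print(krum_select(benign + attackers, f=5) >= 95)  # likely True: an attacker copy wins
```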

References
  • [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in CCS, 2016.
  • [2] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in ICML, 2012.
  • [3] P. Blanchard, R. Guerraoui, J. Stainer et al., “Machine learning with adversaries: Byzantine tolerant gradient descent,” in NIPS, 2017.
  • [4] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for privacy-preserving machine learning,” in CCS, 2017.
  • [5] N. Carlini, C. Liu, J. Kos, U. Erlingsson, and D. Song, “The secret sharer: Measuring unintended neural network memorization & extracting secrets,” arXiv:1802.08232, 2018.
  • [6] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, “Revisiting distributed synchronous SGD,” arXiv:1604.00981, 2016.
  • [7] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv:1712.05526, 2017.
  • [8] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning in adversarial settings: Byzantine gradient descent,” arXiv:1705.05491, 2017.
  • [9] “decentralizedML,” https://decentralizedml.com/, 2018, [Online; accessed 14-May-2018].
  • [10] C. Fung, C. J. Yoon, and I. Beschastnikh, “Mitigating sybils in federated learning poisoning,” arXiv:1808.04866, 2018.
  • [11] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv:1412.6572, 2014.
  • [12] “Under the hood of the Pixel 2: How AI is supercharging hardware,” https://ai.google/stories/ai-in-hardware/, 2018, [Online; accessed 14-May-2018].
  • [13] T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv:1708.06733, 2017.
  • [14] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne, “Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,” arXiv:1711.10677, 2017.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
  • [16] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv:1503.02531, 2015.
  • [17] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar, “Adversarial machine learning,” in AISec, 2011.
  • [18] H. Inan, K. Khosravi, and R. Socher, “Tying word vectors and word classifiers: A loss framework for language modeling,” arXiv:1611.01462, 2016.
  • [19] Y. Ji, X. Zhang, S. Ji, X. Luo, and T. Wang, “Model-reuse attacks on deep learning systems,” in CCS, 2018.
  • [20] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proc. NAS, vol. 114, no. 13, pp. 3521–3526, 2017.
  • [21] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv:1610.05492, 2016.
  • [22] A. D. Kramer, J. E. Guillory, and J. T. Hancock, “Experimental evidence of massive-scale emotional contagion through social networks,” Proc. NAS, vol. 111, no. 24, pp. 8788–8790, 2014.
  • [23] A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Tech. Rep., 2009.
  • [24] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv:1607.02533, 2016.
  • [25] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” in NDSS, 2017.
  • [26] S. Mahloujifar, M. Mahmoody, and A. Mohammed, “Multi-party poisoning through generalized p-tampering,” arXiv:1809.03474, 2018.
  • [27] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Aguera y Arcas, “Communication-efficient learning of deep networks from decentralized data,” arXiv:1602.05629, 2016.
  • [28] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” in ICLR, 2018.
  • [29] T. Minka, “Estimating a Dirichlet distribution,” MIT, Tech. Rep., 2000.
  • [30] P. Mohassel and Y. Zhang, “SecureML: A system for scalable privacy-preserving machine learning,” in S&P, 2017.
  • [31] “OpenMined,” https://www.openmined.org/, 2018.
  • [32] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar, “Semi-supervised knowledge transfer for deep learning from private training data,” arXiv:1610.05755, 2016.
  • [33] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in AsiaCCS, 2017.
  • [34] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and U. Erlingsson, “Scalable private learning with PATE,” arXiv:1802.08908, 2018.
  • [35] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-W, 2017.
  • [36] O. Press and L. Wolf, “Using the output embedding to improve language models,” arXiv:1608.05859, 2016.
  • [37] “PyTorch Examples,” https://github.com/pytorch/examples/tree/master/word_language_model/, 2018, [Online; accessed 14-Aug-2018].
  • [38] M. Qiao and G. Valiant, “Learning discrete distributions from untrusted batches,” arXiv:1711.08113, 2017.
  • [39] B. I. P. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S. Lau, S. Rao, N. Taft, and J. D. Tygar, “ANTIDOTE: Understanding and defending against poisoning of anomaly detectors,” in IMC, 2009.
  • [40] S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning attacks in collaborative deep learning systems,” in ACSAC, 2016.
  • [41] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in CCS, 2015.
  • [42] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in S&P, 2017.
  • [43] C. Song, T. Ristenpart, and V. Shmatikov, “Machine learning models that remember too much,” in CCS, 2017.
  • [44] J. Steinhardt, P. W. Koh, and P. S. Liang, “Certified defenses for data poisoning attacks,” in NIPS, 2017.
  • [45] M. Yeomans, A. K. Shah, S. Mullainathan, and J. Kleinberg, “Making sense of recommendations,” Management Science, 2016.
  • [46] D. Yin, Y. Chen, K. Ramchandran, and P. Bartlett, “Byzantine-robust distributed learning: Towards optimal statistical rates,” in ICML, 2018.
  • [47] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” in ICLR, 2017.
  • [48] X. Zhang, X. Y. Felix, S. Kumar, and S.-F. Chang, “Learning spread-out local feature descriptors,” in ICCV, 2017.