Making machine learning robust against adversarial inputs

Communications of the ACM, pp. 56-66, 2018.

DOI: https://doi.org/10.1145/3134599

Abstract:

Such inputs distort how machine-learning-based systems are able to function in the world as it is.

Introduction
  • Machine learning has advanced radically over the past 10 years, and machine learning algorithms achieve human-level performance or better on a number of tasks, including face recognition,[31] optical character recognition,[8] object recognition,[29] and playing the game Go.[26]
  • Assumptions like these (that training and test inputs are drawn from the same distribution) have been useful for designing effective machine learning algorithms but implicitly rule out the possibility that an adversary could alter the distribution at either training time or test time.
Highlights
  • Machine learning has advanced radically over the past 10 years, and machine learning algorithms achieve human-level performance or better on a number of tasks, including face recognition,[31] optical character recognition,[8] object recognition,[29] and playing the game Go.[26]
  • The modern generation of machine learning services is a result of nearly 50 years of research and development in artificial intelligence—the study of computational algorithms and systems that reason about their environment to make predictions.[25]
  • A subfield of artificial intelligence, most modern machine learning, as used in production, can essentially be understood as applied function approximation; when there is some mapping from an input x to an output y that is difficult for a programmer to describe through explicit code, a machine learning algorithm can learn an approximation of the mapping by analyzing a dataset containing several examples of inputs and their corresponding outputs.
  • The inputs x are usually assumed to all be drawn independently from the same probability distribution at both training and test time. This means that while test inputs x are new and previously unseen during the training process, they at least have the same statistical properties as the inputs used for training. Such assumptions have been useful for designing effective machine learning algorithms but implicitly rule out the possibility that an adversary could alter the distribution at either training time or test time.
  • We present here three canonical examples of gradient-based attacks: the L-BFGS approach, the Fast Gradient Sign Method (FGSM), and the Jacobian Saliency Map Approach (JSMA); a minimal FGSM sketch appears after this list.
  • A drawback of this strategy is that, compared to other black-box strategies,[21,30] the algorithm must be able to make a large number of model-prediction queries before it finds evasive variants, and it is therefore more likely to be detected at runtime.
  • Defenses and their limitations: given the existence of adversarial samples, an important question is what can be done to defend models against them. The defensive property of a machine learning system in this context is called “robustness against adversarial samples.” We explore several known defenses categorized into three classes: model training, input validation, and architectural changes.
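As a concrete illustration of the FGSM attack named in the list above, here is a minimal sketch in Python. The helper loss_gradient(x, y), which returns the gradient of the model's loss J_f with respect to the input x, is a hypothetical placeholder rather than part of the authors' code; their reference implementations live in the CleverHans library.[19]

    import numpy as np

    def fgsm_perturb(x, y, loss_gradient, epsilon=0.1):
        """Craft x_adv = x + epsilon * sign(grad_x J_f(x, y)) for an input x with label y."""
        grad = loss_gradient(x, y)            # hypothetical helper: gradient of the loss w.r.t. x
        x_adv = x + epsilon * np.sign(grad)   # step in the direction that increases the loss
        return np.clip(x_adv, 0.0, 1.0)       # keep the perturbed input in a valid range (e.g., pixel values)

Because the perturbation is bounded by epsilon in each input dimension, the adversarial example stays close to the original input while increasing the model's loss.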
Results
  • A machine learning algorithm is expected to produce a model capable of predicting the correct class of a given input.
  • As is common in modeling the security of any domain, the space of adversaries against machine learning systems can be structured around a taxonomy of capabilities and goals.[2,22,23] As reflected in Figure 4, the adversary’s strength is characterized by its ability to access the model’s architecture, parameter values, and training data.
  • For an attack to be worth studying from a machine learning point of view, it is necessary to impose constraints that ensure the adversary is not able to truly change the class of the input.
  • In a white-box scenario, the adversary has full access to the model: it knows which machine learning algorithm is being used and the values of the model’s parameters.
  • One way to formulate an adversarial attack is as an optimization problem: find a small perturbation delta of the input x that maximizes the loss J_f(x + delta, y), where J_f is the expected loss incurred by the machine learning model and is a way to measure the model’s prediction error.
  • The defensive property of a machine learning system in this context is called “robustness against adversarial samples.” We explore several known defenses categorized into three classes: model training, input validation, and architectural changes.
  • This "transferability" property, first observed among deep neural networks and linear models by Szegedy et al.,[30] is known to hold across many types of machine learning models.[20] Figure 7 reports transferability rates, the number of adversarial examples misclassified by a model B, despite being crafted with a different model A, for several pairs of machine learning models trained on a standard image-recognition benchmark as in the MNIST dataset of handwritten digits.
  • Adversarial training performed with the fast-gradient-sign method makes it computationally efficient to continuously generate new adversarial examples every time the model parameters change during the training process (a sketch follows this list).
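The adversarial training step sketched below illustrates the idea in the last bullet: because FGSM is a single gradient step, fresh adversarial examples can be generated against the current parameters at every training iteration. PyTorch is used purely for illustration (the article does not prescribe a framework); model, optimizer, and the mixing weight are assumptions, not the authors' exact setup.

    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, optimizer, x, y, epsilon=0.1, adv_weight=0.5):
        # 1. Craft FGSM adversarial examples against the current model parameters.
        x_adv = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        with torch.no_grad():
            x_adv = torch.clamp(x_adv + epsilon * x_adv.grad.sign(), 0.0, 1.0)

        # 2. Update the model on a mixture of clean and adversarial inputs.
        optimizer.zero_grad()
        clean_loss = F.cross_entropy(model(x), y)
        adv_loss = F.cross_entropy(model(x_adv), y)
        ((1 - adv_weight) * clean_loss + adv_weight * adv_loss).backward()
        optimizer.step()

Calling this function once per minibatch yields the continuous regeneration of adversarial examples described above, at roughly the cost of one extra forward and backward pass.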
Conclusion
  • Nonlinear machine learning models are more robust to adversarial examples but are more difficult to train and generally do not perform as well in a baseline non-adversarial setting.[9]
  • A model that is tested and found to be robust against the fast gradient sign method of adversarial example generation[9] may still be vulnerable to computationally expensive methods such as attacks based on L-BFGS[30] (see the sketch below).
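To make that evaluation caveat concrete, here is a hedged sketch in Python: robustness figures are only meaningful with respect to the attacks they were measured against, so a defended model should be evaluated under several attacks of increasing strength. The attack functions and model_accuracy below are hypothetical placeholders, not the authors' benchmark code.

    def robustness_report(model_accuracy, attacks, x, y):
        """Accuracy under each attack; the weakest result is the honest robustness figure."""
        results = {}
        for name, attack in attacks.items():
            x_adv = attack(x, y)                      # e.g., FGSM or a stronger iterative attack
            results[name] = model_accuracy(x_adv, y)  # accuracy of the defended model on x_adv
        return results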
Funding
  • Author Nicolas Papernot is supported by a Google Ph.D. Fellowship in Security.
  • Research was supported in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA) and the Army Research Office under grant W911NF-13-1-0421
Reference
  • 1. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., and Siemens, C.E.R.T. Drebin: Effective and explainable detection of Android malware in your pocket. In Proceedings of the NDSS Symposium (San Diego, CA, Feb.). Internet Society, Reston, VA, 2014, 23–26.
  • 2. Barreno, M., Nelson, B., Sears, R., Joseph, A.D., and Tygar, J.D. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (Taipei, Taiwan, Mar. 21–24). ACM Press, New York, 2006, 16–25.
  • 3. Bolton, R.J. and Hand, D.J. Statistical fraud detection: A review. Statistical Science 17, 3 (2002), 235–249.
  • 4. Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. arXiv preprint, 2016; https://arxiv.org/pdf/1608.04644.pdf
  • 5. Dang, H., Yue, H., and Chang, E.C. Evading classifier in the dark: Guiding unpredictable morphing using binary-output blackboxes. arXiv preprint, 2017; https://arxiv.org/pdf/1705.07535.pdf
  • 6. Glorot, X., Bordes, A., and Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (Ft. Lauderdale, FL, Apr. 11–13, 2011), 315–323.
  • 7. Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press, Cambridge, MA, 2016; http://www.deeplearningbook.org/
  • 8. Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., and Shet, V. Multi-digit number recognition from Street View imagery using deep convolutional neural networks. In Proceedings of the International Conference on Learning Representations (Banff, Canada, Apr. 14–16, 2014).
  • 9. Goodfellow, I.J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint, 2014; https://arxiv.org/pdf/1412.6572.pdf
  • 10. Grosse, K., Papernot, N., Manoharan, P., Backes, M., and McDaniel, P. Adversarial perturbations against deep neural networks for malware classification. In Proceedings of the European Symposium on Research in Computer Security (Oslo, Norway, 2017).
  • 11. Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv preprint, 2015; https://arxiv.org/abs/1503.02531
  • 12. Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. Adversarial attacks on neural network policies. arXiv preprint, 2017; https://arxiv.org/abs/1702.02284
  • 13. Huang, X., Kwiatkowska, M., Wang, S., and Wu, M. Safety verification of deep neural networks. In Proceedings of the International Conference on Computer-Aided Verification (2017); https://link.springer.com/chapter/10.1007/978-3-319-63387-9_1
  • 14. Jarrett, K., Kavukcuoglu, K., Ranzato, M.A., and LeCun, Y. What is the best multi-stage architecture for object recognition? In Proceedings of the 12th IEEE International Conference on Computer Vision (Kyoto, Japan, Sept. 27–Oct. 4). IEEE Press, 2009.
  • 15. Katz, G., Barrett, C., Dill, D., Julian, K., and Kochenderfer, M. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proceedings of the International Conference on Computer-Aided Verification. Springer, Cham, 2017, 97–117.
  • 16. Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. In Proceedings of the International Conference on Learning Representations (2017); https://arxiv.org/abs/1607.02533
  • 17. Murphy, K.P. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA, 2012.
  • 18. Nair, V. and Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the International Conference on Machine Learning (Haifa, Israel, June 21–24, 2010).
  • 19. Papernot, N., Goodfellow, I., Sheatsley, R., Feinman, R., and McDaniel, P. CleverHans v2.1.0: An adversarial machine learning library; https://github.com/tensorflow/cleverhans
  • 20. Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint, 2016; https://arxiv.org/abs/1605.07277
  • 21. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., and Swami, A. Practical black-box attacks against deep learning systems using adversarial examples. In Proceedings of the ACM Asia Conference on Computer and Communications Security (Abu Dhabi, UAE). ACM Press, New York, 2017.
  • 22. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., and Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (Saarbrücken, Germany, Mar. 21–24). IEEE Press, 2016, 372–387.
  • 23. Papernot, N., McDaniel, P., Sinha, A., and Wellman, M. Towards the science of security and privacy in machine learning. In Proceedings of the Third IEEE European Symposium on Security and Privacy (London, U.K.); https://arxiv.org/abs/1611.03814
  • 24. Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 37th IEEE Symposium on Security and Privacy (San Jose, CA, May 23–25). IEEE Press, 2016, 582–597.
  • 25. Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995, 25–27.
  • 26. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
  • 27. Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks (2012); https://doi.org/10.1016/j.neunet.2012.02.016
  • 28. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE Press, 2015, 1–9.
  • 29. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the Inception architecture for computer vision. arXiv e-prints, Dec. 2015; https://arxiv.org/abs/1512.00567
  • 30. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, 2014.
  • 31. Taigman, Y., Yang, M., Ranzato, M.A., and Wolf, L. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the Computer Vision and Pattern Recognition Conference. IEEE Press, 2014.
  • 32. Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., and McDaniel, P. Ensemble adversarial training: Attacks and defenses. arXiv preprint, 2017; https://arxiv.org/abs/1705.07204
  • 33. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., and Ristenpart, T. Stealing machine learning models via prediction APIs. In Proceedings of the USENIX Security Conference (San Francisco, CA, Jan. 25–27). USENIX Association, Berkeley, CA, 2016.
  • 34. Wolpert, D.H. The lack of a priori distinctions between learning algorithms. Neural Computation 8, 7 (1996), 1341–1390.
  • 35. Xu, W., Qi, Y., and Evans, D. Automatically evading classifiers. In Proceedings of the 2016 Network and Distributed Systems Symposium (San Diego, CA, Feb. 21–24). Internet Society, Reston, VA, 2016.

Watch the authors discuss their work in this exclusive Communications video: https://cacm.acm.org/videos/making-machine-learning-robustagainst-adversarial-inputs