Thermometer Encoding: One Hot Way To Resist Adversarial Examples

International Conference on Learning Representations (ICLR), 2018.

Abstract:

It is well known that it is possible to construct adversarial examples for neural networks: inputs which are misclassified by the network yet indistinguishable from true data. We propose a simple modification to standard neural network architectures, thermometer encoding, which significantly increases the robustness of the network to adversarial examples. […]

Introduction and Related Work
  • Adversarial examples are inputs to machine learning models that are intentionally designed to cause the model to produce an incorrect output.
  • Madry et al. (2017) showed that adversarial training using adversarial examples created by adding random noise before running the basic iterative method (BIM) results in a model that is highly robust against all known attacks on the MNIST dataset.
  • This defense is less effective on more complex datasets, such as CIFAR.
  • The authors demonstrate that thermometer-code and one-hot-code discretization of a model's real-valued inputs significantly improve its robustness to adversarial attack, advancing the state of the art in this field; a minimal encoding sketch follows this list.
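As a concrete illustration of the discretization step, here is a minimal NumPy sketch of the quantized, one-hot, and thermometer encodings summarized in Table 1. The function names, the use of evenly spaced levels in [0, 1], and the convention of filling the thermometer code from the lowest level upward are our own illustrative choices, not code from the paper.

```python
import numpy as np

def discretize(x, levels=16):
    """Map pixel values in [0, 1] to integer bucket indices in {0, ..., levels-1}."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, 1.0)
    return np.minimum((x * levels).astype(int), levels - 1)

def one_hot_encode(x, levels=16):
    """Replace each pixel value with a one-hot vector marking its bucket."""
    b = discretize(x, levels)
    return (np.arange(levels) == b[..., None]).astype(np.float32)

def thermometer_encode(x, levels=16):
    """Replace each pixel value with a thermometer code: ones in every
    position up to and including the pixel's bucket (a cumulative one-hot)."""
    b = discretize(x, levels)
    return (np.arange(levels) <= b[..., None]).astype(np.float32)

# With ten evenly spaced levels, 0.13 falls in the second bucket:
print(one_hot_encode(0.13, levels=10))      # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(thermometer_encode(0.13, levels=10))  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Applied to an image, these functions work elementwise, turning an H × W × C array of pixels into an H × W × C × levels array of codes.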
Highlights
  • Adversarial examples are inputs to machine learning models that are intentionally designed to cause the model to produce an incorrect output.
  • Adversarial examples that fool one model often fool another model, even if the two models are trained on different training examples or have different architectures (Szegedy et al., 2014), so an attacker can fool a model without access to it.
  • Madry et al. (2017) showed that adversarial training using adversarial examples created by adding random noise before running the basic iterative method results in a model that is highly robust against all known attacks on the MNIST dataset (see the sketch after this list).
  • We found that in all cases, Logit-Space Projected Gradient Ascent (LS-PGA) was strictly more powerful than Discrete Gradient Ascent, so all attacks on discretized models use LS-PGA with ξ = 0.01, δ = 1.2, and one random restart.
  • Our findings convincingly demonstrate that the use of thermometer encodings, in combination with adversarial training, can reduce the vulnerability of neural network models to adversarial attacks
  • Our analysis reveals that the resulting networks are significantly less linear with respect to their inputs, supporting the hypothesis of Goodfellow et al (2014) that many adversarial examples are caused by over-generalization in networks that are too linear
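The adversarial-training recipe referenced above, adding random noise before running BIM (Madry et al., 2017), amounts to projected gradient descent on the input. Below is a generic NumPy sketch of that attack for a continuous-input model; `loss_grad` is a hypothetical hook returning the gradient of the classification loss with respect to the input, and the ε, α, and step-count defaults are placeholders rather than the paper's settings.

```python
import numpy as np

def pgd_attack(x, y, loss_grad, eps=0.3, alpha=0.01, steps=40, seed=0):
    """L-infinity PGD: a uniform random start inside the eps-ball around x,
    followed by iterated signed-gradient (BIM) steps, projecting back onto
    the ball after every step and keeping pixels in [0, 1]."""
    rng = np.random.default_rng(seed)
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        grad = loss_grad(x_adv, y)                 # d(loss)/d(input), from the model
        x_adv = x_adv + alpha * np.sign(grad)      # ascend the loss (BIM step)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv
```

During adversarial training, each minibatch is attacked this way and the model is updated on the resulting adversarial examples (or on a mix of clean and adversarial examples, as in several of the tables below). Because the discretization step is not differentiable, this continuous attack does not apply directly to the discretized models, which is why those models are attacked with LS-PGA instead.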
Methods
  • The authors compare models trained with input discretization to state-of-the-art adversarial defenses on a variety of datasets.
  • For the MNIST experiments, the authors use a convolutional network; for CIFAR-10, CIFAR-100, and SVHN, the authors use a Wide ResNet (Zagoruyko & Komodakis, 2016).
  • The authors use a network of depth 30 for the CIFAR-10 and CIFAR-100 datasets, and a network of depth 15 for SVHN.
  • Unless otherwise specified, all quantized and discretized models use 16 levels of discretization (see the shape sketch below).
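For intuition about how the discretized input feeds these networks: with 16 levels, each pixel channel is replaced by its 16-dimensional code, so stacking the codes along the channel axis turns a 32 × 32 × 3 CIFAR-10 image into a 32 × 32 × 48 tensor, and only the first convolution needs wider input. The channel-stacking layout and the reuse of the `thermometer_encode` helper from the earlier sketch are our own illustrative assumptions.

```python
import numpy as np
# assumes the thermometer_encode helper from the earlier sketch is in scope

rng = np.random.default_rng(0)
batch = rng.uniform(0.0, 1.0, size=(8, 32, 32, 3))   # a CIFAR-10-shaped batch
codes = thermometer_encode(batch, levels=16)          # shape (8, 32, 32, 3, 16)
codes = codes.reshape(*codes.shape[:-2], 3 * 16)      # stack the codes along channels
print(codes.shape)                                    # (8, 32, 32, 48)
```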
Results
  • The authors' adversarially-trained baseline models approximately replicated the results of Madry et al. (2017).
  • On all datasets, discretizing the inputs of the network dramatically improves resistance to adversarial examples, while barely sacrificing any accuracy on clean examples.
  • Quantized models beat the baseline, but with lower accuracy on clean examples.
  • See Tables 2, 3, 4, and 5 for results on MNIST and CIFAR-10.
  • Additional results on CIFAR-100 and SVHN are included in the appendix.
Conclusion
  • In Goodfellow et al. (2014), the seeming linearity of deep neural networks was shown by visualizing the networks in several different ways.
  • Discretizing with 16 levels introduced 0.03% extra parameters for MNIST, 0.08% for CIFAR-10 and CIFAR-100, and 2.3% for SVHN; a back-of-the-envelope sketch of where these extra parameters come from follows this list.
  • This increase is negligible, so it is likely that the robustness comes from the input discretization and is not merely a byproduct of having a slightly higher-capacity model. The authors' findings convincingly demonstrate that the use of thermometer encodings, in combination with adversarial training, can reduce the vulnerability of neural network models to adversarial attacks.
  • The authors' analysis reveals that the resulting networks are significantly less linear with respect to their inputs, supporting the hypothesis of Goodfellow et al (2014) that many adversarial examples are caused by over-generalization in networks that are too linear
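A back-of-the-envelope check on the parameter overhead claimed above: only the first convolution changes, because its input channels grow from C to 16·C while every later layer is untouched. The layer sizes below (a 3 × 3 first convolution with hypothetical filter counts) are illustrative assumptions, not the exact architectures used in the paper.

```python
def first_layer_extra_params(in_channels=3, filters=16, kernel=3, levels=16):
    """Extra first-convolution weights introduced by discretizing the input."""
    before = kernel * kernel * in_channels * filters            # e.g. 3*3*3*16  = 432
    after = kernel * kernel * (levels * in_channels) * filters  # e.g. 3*3*48*16 = 6912
    return after - before

# A few thousand extra weights in the first layer, against a Wide ResNet with
# millions of parameters overall, is a fraction of a percent, consistent in
# spirit with the 0.03%-2.3% figures reported above.
print(first_layer_extra_params())  # 6480
```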
Tables
  • Table 1: Examples mapping from continuous-valued inputs to quantized inputs, one-hot codes, and thermometer codes, with ten evenly-spaced levels
  • Table 2: Comparison of adversarial robustness to white-box attacks on MNIST
  • Table 3: Comparison of adversarial robustness to black-box attacks on MNIST
  • Table 4: Comparison of adversarial robustness to white-box attacks on CIFAR-10
  • Table 5: Comparison of adversarial robustness to black-box attacks on CIFAR-10
  • Table 6: Comparison of adversarial robustness to white-box attacks on MNIST using 16 levels and with various choices of the hyperparameters ξ and δ for Algorithm 3. The models are evaluated on white-box attacks and on black-box attacks using a vanilla, clean-trained model; both use LS-PGA
  • Table 7: Comparison of adversarial robustness to white-box attacks on MNIST using a mix of clean and adversarial examples
  • Table 8: Comparison of adversarial robustness to black-box attacks on MNIST of various models using a mix of clean and adversarial examples
  • Table 9: Comparison of adversarial robustness on MNIST as the number of levels of discretization is varied. All models are trained on a mix of adversarial and clean examples
  • Table 10: Comparison of adversarial robustness to white-box attacks on CIFAR-10 of various models using a mix of regular and adversarial training
  • Table 11: Comparison of adversarial robustness to black-box attacks on CIFAR-10 of various models using a mix of clean and adversarial examples
  • Table 12: Comparison of adversarial robustness on CIFAR-10 as the number of levels of discretization is varied. All models are trained only on adversarial examples
  • Table 13: Comparison of adversarial robustness on CIFAR-100. All adversarially trained models were trained on a mix of clean and adversarial examples
  • Table 14: Comparison of adversarial robustness on SVHN
References
  • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, pp. 854–863, 2017.
  • Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • Jun Han and Claudio Moraga. The influence of the sigmoid function parameters on the speed of backpropagation learning. From Natural to Artificial Neural Computation, pp. 195–201, 1995.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8): 1735–1780, 1997.
  • Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144, 2016.
  • Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
  • Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770, 2016. URL http://arxiv.org/abs/1611.02770.
  • Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
  • Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014. URL http://arxiv.org/abs/1312.6199.
  • Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.