# Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples

NeurIPS 2020.

Abstract:

We present a transductive learning algorithm that takes as input training examples from a distribution $P$ and arbitrary (unlabeled) test examples, possibly chosen by an adversary. This is unlike prior work that assumes that test examples are small perturbations of $P$. Our algorithm outputs a selective classifier, which abstains from p...

Introduction

- Consider binary classification where test examples are not from the training distribution.
- Consider learning a binary function $f : X \to \{0, 1\}$ where training examples are assumed to be iid from a distribution $P$ over $X$, while the test examples are arbitrary.
- This includes both the possibility that test examples are chosen by an adversary or that they are drawn from a distribution.
- Adversarial spammers synthesize endless variations of explicit images that evade explicit-content detectors for purposes such as advertising and phishing [Yuan et al., 2019].

Highlights

- Consider binary classification where test examples are not from the training distribution
- As a troubling adversarial example, consider explicit content detectors which are trained to classify normal vs. explicit images
- As we argue, learning with arbitrary test examples requires selective classifiers and transductive learning, which have each been independently studied extensively
- This paper can be viewed as a generalization of the fundamental theorem of statistical learning to the case where $P \neq Q$, obtaining $\Theta(\sqrt{d/n})$ rates, where $d$ is the VC dimension of the class and $n$ is the number of examples
- When $P = Q$, unlabeled samples from $Q$ are readily available by ignoring labels of some training data, but unlabeled test samples are necessary when $P \neq Q$. No prior such guarantee was known for arbitrary $P \neq Q$, even for simple classes such as intervals, perhaps because it may have seemed impossible to guarantee anything meaningful in the general case
- Even the simple approach of training a classifier to distinguish unlabeled train vs. test examples may be adequate in some applications, though for theoretical guarantees one requires somewhat more sophisticated algorithms
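
The "distinguish unlabeled train vs. test examples" idea in the last highlight can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes 1-D inputs and a hypothetical class of threshold rules, and it picks the rule that best separates the two unlabeled samples by exhaustive search. Test points that the rule flags as "test-like" are rejected. The function names and toy data are made up for illustration.

```python
from typing import List, Tuple


def flags_as_test(x: float, threshold: float, direction: int) -> bool:
    """Rule output: True means 'x looks like a test point'."""
    return x > threshold if direction == 1 else x < threshold


def best_threshold_distinguisher(train: List[float], test: List[float]) -> Tuple[float, int, int]:
    """Exhaustive search over 1-D threshold rules separating train from test.

    Returns (threshold, direction, mistakes). direction=1 flags x > threshold,
    direction=-1 flags x < threshold.
    """
    points = sorted(set(train) | set(test))
    if len(points) < 2:
        raise ValueError("need at least two distinct values")
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in candidates:
        for direction in (1, -1):
            # Mistakes: train points flagged as test, plus test points not flagged.
            mistakes = sum(flags_as_test(x, t, direction) for x in train)
            mistakes += sum(not flags_as_test(x, t, direction) for x in test)
            if best is None or mistakes < best[2]:
                best = (t, direction, mistakes)
    return best


def reject_test_like(train: List[float], test: List[float]) -> List[float]:
    """Reject the test points that the best distinguisher flags as test-like."""
    t, direction, _ = best_threshold_distinguisher(train, test)
    return [x for x in test if flags_as_test(x, t, direction)]


# Toy data (assumed): three test points lie far from every training point.
train_xs = [0.1, 0.4, 0.5, 0.9, 1.2]
test_xs = [0.3, 0.8, 5.0, 5.5, 6.1]
print(reject_test_like(train_xs, test_xs))  # [5.0, 5.5, 6.1]
```

In this toy run the test points that overlap the training range are kept and only the far-away cluster is rejected, which is the behaviour the highlight alludes to.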

Methods

- As a proof of concept, and rather than classifying sensitive attributes such as explicit images or gender, the authors perform simple controlled experiments on handwritten letter classification using lowercase English letters from the popular EMNIST dataset [Cohen et al., 2017].
- For both experiments, the training data consisted of the eight lowercase letters adehlnrt, chosen because they each had more than 10,000 instances.
- The adversarial test set $Q_{\text{adv}}$ contained 6,000 samples per letter, 48,000 samples in total.
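
The sample counts can be checked with a few lines of arithmetic. The per-letter figures (7,000 training, 3,000 reserved, 3,000 adversarial) are the ones reported in the experimental description on this page; the variable names are illustrative only.

```python
LETTERS = "adehlnrt"            # the eight lowercase letters used
TRAIN_PER_LETTER = 7_000        # left for training after reserving instances
RESERVED_PER_LETTER = 3_000     # held out, later reused as in-distribution test data
ADVERSARIAL_PER_LETTER = 3_000  # adversarially generated test examples

train_total = len(LETTERS) * TRAIN_PER_LETTER
test_per_letter = RESERVED_PER_LETTER + ADVERSARIAL_PER_LETTER
test_total = len(LETTERS) * test_per_letter

print(train_total, test_per_letter, test_total)  # 56000 6000 48000
```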

Conclusion

- The fundamental theorem of statistical learning states that an empirical risk minimization (ERM) algorithm for class $C$ is asymptotically nearly optimal, requiring $\Theta(d/n)$ labeled examples for learning arbitrary distributions when $P = Q$ [see, e.g., Shalev-Shwartz and Ben-David, 2014].
- When $P = Q$, unlabeled samples from $Q$ are readily available by ignoring labels of some training data, but unlabeled test samples are necessary when $P \neq Q$.
- No prior such guarantee was known for arbitrary $P \neq Q$, even for simple classes such as intervals, perhaps because it may have seemed impossible to guarantee anything meaningful in the general case.
- Even the simple approach of training a classifier to distinguish unlabeled train vs. test examples may be adequate in some applications, though for theoretical guarantees one requires somewhat more sophisticated algorithms.

Summary

## Objectives:

- The authors' goal is to learn a target function $f \in C$, where the class $C$ has VC dimension $d$, with training distribution $P$ over $X$.
- In standard agnostic learning with respect to $P$, the authors suppose there is some classifier $f \in C$ with error $\mathrm{err}_P(f) \le \eta$ and aim to find a classifier whose generalization error is not much greater than $\eta$.

Related work

- The redaction model combines selective classification (SC) and transductive learning, which have each been extensively studied separately. We first discuss prior work on these topics, which (with the notable exception of online SC) has generally been considered when test examples are from the same distribution as training examples.

Figure: Rejectron (left) first trains $h$ on labeled training data, then finds other candidate classifiers $c_1, c_2$ such that $h$ and $c_i$ have high disagreement on the test examples and low disagreement on the training examples, and rejects examples where $h$ and $c_i$ disagree. URejectron (right) aims to distinguish unlabeled train and test examples using pairs of classifiers $c_i, c_i'$ that agree on training data but disagree on many tests. Both reject: (1) clearly unpredictable examples which are very far from train and (2) a suspiciously dense cluster of tests which might all be positive despite being close to negatives. URejectron also rejects (3).
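
The disagreement-based rejection idea described above (train $h$, then look for classifiers that agree with it on training data but disagree on test points) can be illustrated with a toy sketch. This is a heavily simplified stand-in, not the paper's algorithm: it assumes 1-D threshold classifiers, requires candidates to agree with $h$ on every training point (rather than merely having low disagreement), and searches candidates exhaustively. All names and data are hypothetical.

```python
from typing import List, Tuple


def predict(threshold: float, x: float) -> int:
    """Threshold classifier: label 1 when x >= threshold, else 0."""
    return 1 if x >= threshold else 0


def erm_threshold(train: List[Tuple[float, int]]) -> float:
    """Pick the midpoint threshold with the fewest training mistakes."""
    xs = sorted({x for x, _ in train})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum(predict(t, x) != y for x, y in train))


def reject_by_disagreement(train: List[Tuple[float, int]], test: List[float]):
    """Return (h, accepted test points, rejected test points)."""
    h = erm_threshold(train)
    # Candidates: thresholds at observed values that agree with h on all of train.
    values = sorted({x for x, _ in train} | set(test))
    consistent = [t for t in values
                  if all(predict(t, x) == predict(h, x) for x, _ in train)]
    accepted, rejected = list(test), []
    while True:
        # Disagreements between a candidate and h on still-accepted test points.
        def gain(t: float) -> int:
            return sum(predict(t, x) != predict(h, x) for x in accepted)
        c = max(consistent, key=gain, default=None)
        if c is None or gain(c) == 0:
            break  # no consistent candidate disagrees with h any more
        newly = [x for x in accepted if predict(c, x) != predict(h, x)]
        rejected.extend(newly)
        accepted = [x for x in accepted if x not in newly]
    return h, accepted, sorted(rejected)


# Toy data (assumed): labels flip somewhere between x = 2 and x = 3.
train_data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
test_points = [0.5, 2.2, 2.8, 3.5, 9.0]
print(reject_by_disagreement(train_data, test_points))
# (2.5, [0.5, 3.5, 9.0], [2.2, 2.8])
```

For thresholds, only the test points inside the gap between the last negative and first positive training example are rejected; richer classes such as halfspaces would also reject far-away points, as the figure caption describes.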

Selective classification. Selective classification goes by various names, including “classification with a reject option” and “reliable learning.” To the best of our knowledge, prior work has not considered SC using unlabeled samples from $Q \neq P$. Early learning theory work by Rivest and Sloan [1988] required a guarantee of 0 test errors and few rejections. However, Kivinen [1990] showed that, for this definition, even learning rectangles under uniform distributions $P = Q$ requires an exponential number of examples (as cited by Hopkins et al. [2019], which like much other work therefore makes further assumptions on $P$ and $Q$). Most of this work assumes the same training and test distributions, without adversarial modification. Kanade et al. [2009] give an SC reduction to an agnostic learner (similar in spirit to our reduction to ERM) but again for the case of $P = Q$.
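
To make the notion of a selective classifier concrete, here is a minimal sketch under stated assumptions: a hypothetical 1-D threshold classifier that abstains (returns `None`) outside an acceptance interval, together with one common way of scoring it, where both the rejection rate and the error rate are fractions of all examples. The function names, data, and scoring convention are illustrative and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(x: float, threshold: float,
                      accept_lo: float, accept_hi: float) -> Optional[int]:
    """Predict 0/1 inside [accept_lo, accept_hi]; abstain (None) outside it."""
    if not (accept_lo <= x <= accept_hi):
        return None
    return 1 if x >= threshold else 0


def selective_metrics(preds: Sequence[Optional[int]],
                      labels: Sequence[int]) -> Tuple[float, float]:
    """Return (rejection_rate, error_rate), both as fractions of all examples."""
    n = len(labels)
    rejected = sum(1 for p in preds if p is None)
    # Only accepted (non-abstained) examples can count as errors.
    errors = sum(1 for p, y in zip(preds, labels) if p is not None and p != y)
    return rejected / n, errors / n


xs = [0.5, 1.5, 2.5, 3.5, 7.0]
ys = [0, 1, 1, 1, 0]
preds: List[Optional[int]] = [selective_predict(x, 2.0, 0.0, 4.0) for x in xs]
print(preds)                         # [0, 0, 1, 1, None]
print(selective_metrics(preds, ys))  # (0.2, 0.2)
```

The point far outside the acceptance interval is abstained on, so it costs a rejection but not an error; the goal in selective classification is to keep both rates low.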

Study subjects and analysis

For both experiments, the training data consisted of the eight lowercase letters adehlnrt, chosen because they each had more than 10,000 instances. For each letter, 3,000 instances were reserved for later use, leaving 7,000 examples per letter, for a total of 56,000 samples from $P$.

We then considered two test distributions, $Q_{\text{adv}}$ and $Q_{\text{nat}}$, representing adversarial and natural settings. $Q_{\text{adv}}$ consisted of a mix of 50% samples from $P$ (the 3,000 reserved instances per lowercase letter mentioned above) and 50% samples from an adversary that used a classifier $h$ as a black box. To that, we added 3,000 adversarial examples for each letter, selected as follows: the reserved 3,000 letters were labeled by $h$ and the adversary selected the first misclassified instance for each letter. It made 3,000 imperceptible modifications of each of the above instances by changing the intensity value of a single pixel by at most 4 (out of 256). The result was 6,000 samples per letter, constituting 48,000 samples from $Q_{\text{adv}}$.

For $Q_{\text{nat}}$, the test set also consisted of 6,000 samples per letter, with 3,000 reserved samples from $P$ as above. In this case, the remaining half of the letters were simply uppercase versions of the letters ADEHLNRT, taken from the EMNIST dataset (case information is also available in that dataset).
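
The single-pixel modification described above (one pixel changed by at most 4 intensity levels out of 256) can be sketched as follows. The text does not say how the adversary chose the pixel or the amount, so this sketch assumes a uniformly random pixel and a random nonzero shift; images are assumed to be flattened 28×28 lists of 8-bit intensities, and the function names are hypothetical.

```python
import random
from typing import List


def perturb_single_pixel(image: List[int], rng: random.Random,
                         max_delta: int = 4) -> List[int]:
    """Return a copy of `image` with one pixel shifted by at most `max_delta`."""
    perturbed = list(image)
    index = rng.randrange(len(perturbed))  # assumed: pixel chosen uniformly
    delta = rng.choice([d for d in range(-max_delta, max_delta + 1) if d != 0])
    # Clip to the valid 8-bit intensity range [0, 255].
    perturbed[index] = min(255, max(0, perturbed[index] + delta))
    return perturbed


def make_variants(image: List[int], count: int, seed: int = 0) -> List[List[int]]:
    """Generate `count` single-pixel variants of one image."""
    rng = random.Random(seed)
    return [perturb_single_pixel(image, rng) for _ in range(count)]


base_image = [128] * (28 * 28)  # placeholder mid-gray image, not real EMNIST data
variants = make_variants(base_image, 3000)
print(len(variants))  # 3000
```

Because every variant differs from the misclassified source image in a single pixel by a tiny amount, the resulting 3,000 examples per letter form the kind of dense, near-duplicate cluster that the test set $Q_{\text{adv}}$ is described as containing.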

Reference

- Rémi Bardenet, Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement. Bernoulli, 21(3):1361–1385, 2015.
- Shai Ben-David and Ruth Urner. On the hardness of domain adaptation and the utility of unlabeled target samples. In Nader H. Bshouty, Gilles Stoltz, Nicolas Vayatis, and Thomas Zeugmann, editors, Algorithmic Learning Theory, pages 139–153, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. ISBN 978-3-642-34106-9.
- Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, October 1989. ISSN 0004-5411. doi: 10.1145/76359.76371. URL https://doi.org/10.1145/76359.76371.
- Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, pages 77–91, 2018.
- Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
- Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems, pages 11190–11201, 2019.
- Chi-Keung Chow. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, (4):247–254, 1957.
- Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre Van Schaik. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017.
- Yonatan Geifman and Ran El-Yaniv. SelectiveNet: A deep neural network with an integrated reject option. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 2151–2159. PMLR, 2019. URL http://proceedings.mlr.press/v97/geifman19a.html.
- Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the 3rd International Conference on Learning Representations, ICLR, 2015. URL http://arxiv.org/abs/1412.6572.
- Max Hopkins, Daniel M Kane, and Shachar Lovett. The power of comparisons for actively learning linear classifiers. arXiv preprint arXiv:1907.03816, 2019.
- Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex J Smola. Correcting sample selection bias by unlabeled data. In Advances in neural information processing systems, pages 601–608, 2007.
- Varun Kanade, Adam Tauman Kalai, and Yishay Mansour. Reliable agnostic learning. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009, June 2009. URL https://www.microsoft.com/en-us/research/publication/reliable-agnostic-learning/.
- Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, and Jacob Steinhardt. Testing robustness against unforeseen adversaries. arXiv preprint arXiv:1908.08016, 2019.
- Michael J. Kearns, Robert E. Schapire, Linda M. Sellie, and Lisa Hellerstein. Toward efficient agnostic learning. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 341–352, 1992.
- Jyrki Kivinen. Reliable and useful learning with uniform probability distributions. In Proceedings of the First International Workshop on Algorithmic Learning Theory (ALT), pages 209–222, 1990.
- Lihong Li, Michael L Littman, Thomas J Walsh, and Alexander L Strehl. Knows what it knows: a framework for self-aware learning. Machine learning, 82(3):399–443, 2011.
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=rJzIBfZAb.
- Pascal Massart, Élodie Nédélec, et al. Risk bounds for statistical learning. The Annals of Statistics, 34(5):2326–2366, 2006.
- Tianyu Pang, Chao Du, Yinpeng Dong, and Jun Zhu. Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems, pages 4579–4589, 2018.
- Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. 2009.
- Ronald L. Rivest and Robert [H.] Sloan. Learning complicated concepts reliably and usefully (extended abstract). In Tom Mitchell and Reid Smith, editors, Proceedings AAAI-88, pages 635–640. AAAI, 1988.
- Amin Sayedi, Morteza Zadimoghaddam, and Avrim Blum. Trading off mistakes and don’t-know predictions. In Advances in Neural Information Processing Systems, pages 2092–2100, 2010.
- Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge university press, 2014.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1409.1556.
- Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- For rejection, Lemma A.1 states that it is unlikely that there would be any choice of $h$, $\mathbf{c} = (c_1, \ldots, c_T)$ where the resulting $S(h, \mathbf{c}) := \{x \in X : h(x) = c_1(x) = \cdots = c_T(x)\}$ would contain all training examples but reject (abstain on) many “true” test examples, since the training examples and the true test examples are identically distributed. The proof uses Sauer’s lemma.
- For rejection rate, Lemma A.4 uses VC bounds to show that it is unlikely that the rejection rate of $S(h, \mathbf{c})$ under $P$ exceeds $\epsilon$ while its rejection rate on the training examples is 0.
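
For reference, Sauer's lemma in its standard textbook form is stated below. This is the general statement, not a quotation from the paper's appendix; here $\Pi_C(n)$ denotes the maximum number of distinct labelings that a class $C$ of VC dimension $d$ can induce on any $n$ points.

```latex
\Pi_C(n) \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;\le\; \left(\frac{e\,n}{d}\right)^{d}
\qquad \text{for } n \ge d \ge 1 .
```

The bound is polynomial in $n$ rather than $2^n$, which is what allows a union bound over all behaviours of the class on a finite sample.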
