Efficiently Learning Adversarially Robust Halfspaces with Noise

ICML, pp. 7010–7021, 2020.


Abstract:

We study the problem of learning adversarially robust halfspaces in the distribution-independent setting. In the realizable setting, we provide necessary and sufficient conditions on the adversarial perturbation sets under which halfspaces are efficiently robustly learnable. In the presence of random label noise, we give a simple computationally efficient algorithm for this problem with respect to any ℓp-perturbation.

Introduction
  • Learning predictors that are robust to adversarial examples remains a major challenge in machine learning.
  • A line of work has shown that predictors learned by deep neural networks are not robust to adversarial examples [BCM+13, GSS15].
  • This has led to a long line of research studying different aspects of robustness to adversarial examples.
  • For an unknown distribution D over X × Y, the authors observe m i.i.d. samples S ∼ D^m, and the goal is to learn a predictor h : X → Y that achieves small robust risk R_U(h; D) (written out below).
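
The robust risk itself is not spelled out on this page; the standard definition, consistent with the robust loss that appears in the highlights below, is

    \[
    \mathcal{R}_{\mathcal{U}}(h; \mathcal{D})
      \;=\; \mathop{\mathbb{E}}_{(x,y)\sim \mathcal{D}}
            \Big[\, \sup_{z \in \mathcal{U}(x)} \mathbb{1}\big[\, h(z) \neq y \,\big] \Big],
    \]

that is, the probability that an adversary can find a perturbation z ∈ U(x) of the input on which h disagrees with the true label y.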
Highlights
  • Learning predictors that are robust to adversarial examples remains a major challenge in machine learning
  • We consider the problem of distribution-independent learning of halfspaces that are robust to adversarial examples at test time, referred to as robust PAC learning of halfspaces
  • We provide necessary and sufficient conditions on perturbation sets U, under which the robust empirical risk minimization (RERM) problem is efficiently solvable in the realizable setting
  • We show that an efficient separation oracle for U yields an efficient solver for RERM_U, while an efficient approximate separation oracle for U is necessary even for computing the robust loss sup_{z ∈ U(x)} 1[h_w(z) ≠ y] of a halfspace h_w (see the sketch after this list)
  • In Theorem 3.5, we show that an efficient separation oracle for U yields an efficient solver for RERM_U
  • We provide necessary and sufficient conditions on perturbation sets U under which the robust empirical risk minimization (RERM) problem can be solved efficiently
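
To make "computing the robust loss" concrete: for a general perturbation set U this is exactly where a separation oracle is needed, but when U(x) is an ℓp ball of radius γ around x (a special case chosen here purely for illustration; the paper works with general U), Hölder's inequality gives a closed form for the worst-case margin, min_{z ∈ U(x)} y⟨w, z⟩ = y⟨w, x⟩ − γ‖w‖_q with 1/p + 1/q = 1. A minimal Python sketch under that assumption:

    import numpy as np

    def dual_exponent(p):
        """Dual exponent q with 1/p + 1/q = 1 (p = inf gives q = 1)."""
        if p == 1:
            return np.inf
        if np.isinf(p):
            return 1.0
        return p / (p - 1.0)

    def robust_loss_halfspace(w, x, y, gamma, p=2):
        """0/1 robust loss of the halfspace h_w(x) = sign(<w, x>) when the
        adversary may move x anywhere in the l_p ball of radius gamma.
        The point is robustly correct iff the worst-case margin
        y*<w, x> - gamma*||w||_q is strictly positive."""
        q = dual_exponent(p)
        worst_margin = y * np.dot(w, x) - gamma * np.linalg.norm(w, ord=q)
        return 0.0 if worst_margin > 0 else 1.0

Averaging robust_loss_halfspace over a sample gives the empirical robust risk; for ℓp balls the "oracle" is trivial, which is why the interesting case, and the paper's focus, is general U accessed only through a separation oracle.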
Conclusion
  • The authors provide necessary and sufficient conditions on perturbation sets U under which the robust empirical risk minimization (RERM) problem can be solved efficiently.
  • The authors give a polynomial-time algorithm that solves RERM given access to a polynomial-time separation oracle for U (an interface sketch follows this list).
  • The authors show that an efficient approximate separation oracle for U is necessary even for computing the robust loss of a halfspace.
  • An interesting direction for future work is to understand the computational complexity of robustly PAC learning halfspaces under stronger noise models, including Massart noise and agnostic noise.
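
For readers unfamiliar with the oracle model these results are stated in, the sketch below shows what a separation oracle looks like for one concrete convex set, an ℓ2 ball; the choice of set, the function name, and the return convention are illustrative assumptions, not the authors' implementation. The oracle either certifies that a query point lies in U or returns a hyperplane separating it from U, which is the primitive that ellipsoid-style solvers consume:

    import numpy as np

    def separation_oracle_l2_ball(center, radius, z, tol=1e-12):
        """Separation oracle for U = {u : ||u - center||_2 <= radius}.
        Returns None if z lies in U; otherwise returns (g, b) describing a
        separating hyperplane: every u in U satisfies <g, u> <= b, while
        <g, z> > b."""
        diff = z - center
        dist = np.linalg.norm(diff)
        if dist <= radius + tol:
            return None                  # z is (approximately) a member of U
        g = diff / dist                  # unit normal pointing from U toward z
        b = np.dot(g, center) + radius   # support value of U in direction g
        return g, b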
Related work
  • Here we focus on the recent work that is most closely related to the results of this paper. [ADV19] studied the tractability of RERM with respect to ℓ∞ perturbations, obtaining efficient algorithms for halfspaces in the realizable setting, but showing that RERM for degree-2 polynomial threshold functions is computationally intractable (assuming NP ≠ RP). [GKKW19] studied robust learnability of hypothesis classes defined over {0, 1}^n with respect to Hamming distance, and showed that monotone conjunctions are robustly learnable when the adversary can perturb only O(log n) bits, but are not robustly learnable, even under the uniform distribution, when the adversary can flip ω(log n) bits.

    In this work, we take a more general approach: instead of considering specific perturbation sets, we state our methods in terms of oracle access to a separation oracle for the perturbation set U, and aim to characterize which perturbation sets U admit tractable RERM.

    In the non-realizable setting, the only prior work we are aware of is by [DKM19], who studied the complexity of robustly learning halfspaces in the agnostic setting under ℓ2 perturbations.

    Let X = R^d be the instance space and Y = {±1} be the label space. We consider halfspaces H = {x ↦ sign(⟨w, x⟩) : w ∈ R^d}.

    The following definitions formalize the notion of adversarially robust PAC learning in the realizable and random classification noise settings. Definition 2.1 (Realizable Robust PAC Learning): We say H is robustly PAC learnable with respect to an adversary U in the realizable setting if there exists a learning algorithm A : (X × Y)* → Y^X with sample complexity m : (0, 1)² → N such that: for any ε, δ ∈ (0, 1), and for every data distribution D over X × Y for which there exists a predictor h* ∈ H with zero robust risk, R_U(h*; D) = 0, with probability at least 1 − δ over S ∼ D^m(ε,δ) we have R_U(A(S); D) ≤ ε.
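
As a deliberately simplified illustration of realizable robust learning in this setup, the sketch below runs a perceptron-style loop against worst-case ℓ2 perturbations, for which the adversary's best response has a closed form. This is a hypothetical toy under a robust-margin assumption, not the paper's separation-oracle-based RERM algorithm:

    import numpy as np

    def robust_perceptron_l2(X, y, gamma, max_iters=10000):
        """Perceptron-style loop against worst-case l_2 perturbations of
        radius gamma (an illustrative toy, not the paper's algorithm).
        For an l_2 ball the adversary's best response is closed-form:
        z_i = x_i - gamma * y_i * w / ||w||_2 minimizes y_i * <w, z>
        over U(x_i)."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(max_iters):
            updated = False
            for i in range(n):
                norm_w = np.linalg.norm(w)
                if norm_w == 0:
                    z = X[i]                     # no direction to attack yet
                else:
                    z = X[i] - gamma * y[i] * w / norm_w
                if y[i] * np.dot(w, z) <= 0:     # worst case misclassified
                    w = w + y[i] * z             # standard perceptron step
                    updated = True
            if not updated:
                return w                         # robustly separates sample
        raise RuntimeError("no robust separator found within budget")

Under robust realizability with a margin this loop terminates, mirroring the classical perceptron mistake bound; the paper's contribution is handling general U, where the worst-case point must instead be found via the separation oracle, and tolerating random classification noise.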
Funding
  • Work by N.S. and O.M. was partially supported by NSF award IIS-1546500 and by DARPA cooperative agreement HR00112020003.
  • I.D. was supported by NSF Award CCF-1652862 (CAREER), a Sloan Research Fellowship, and a DARPA Learning with Less Labels (LwLL) grant.
  • S.G. was supported by the JP Morgan AI Research PhD Fellowship.
References
  • Pranjal Awasthi, Abhratanu Dutta, and Aravindan Vijayaraghavan. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, pages 13737–13747, 2019.
  • Dana Angluin and Philip D. Laird. Learning from noisy examples. Machine Learning, 2(4):343–370, 1987. doi:10.1007/BF00116829.
  • Sébastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
  • Maria-Florina Balcan, Avrim Blum, and Nathan Srebro. A theory of learning with similarity functions. Machine Learning, 72(1-2):89–112, 2008. doi:10.1007/s10994-008-5059-5.
  • Sébastien Bubeck, Yin Tat Lee, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. In International Conference on Machine Learning, pages 831–840, 2019.
  • Tom Bylander. Learning linear threshold functions in the presence of classification noise. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory (COLT 1994), pages 340–347. ACM, 1994. doi:10.1145/180139.181176.
  • Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems 31, pages 230–241. Curran Associates, Inc., 2018. URL: http://papers.nips.cc/paper/7307-pac-learning-in-the-presence-of-adversaries.pdf.
  • Alon Cohen. Surrogate Loss Minimization. PhD thesis, Hebrew University of Jerusalem, 2014.
  • Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 105–117, 2016.
  • Ilias Diakonikolas, Themis Gouleakis, and Christos Tzamos. Distribution-independent PAC learning of halfspaces with Massart noise. In Advances in Neural Information Processing Systems, pages 4751–4762, 2019.
  • Ilias Diakonikolas, Daniel Kane, and Pasin Manurangsi. Nearly tight bounds for robust proper learning of halfspaces with a margin. In Advances in Neural Information Processing Systems, pages 10473–10484, 2019.
  • Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1802–1811. PMLR, 2019. URL: http://proceedings.mlr.press/v97/engstrom19a.html.
  • Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. New results for learning noisy parities and halfspaces. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), pages 563–574. IEEE, 2006.
  • Vitaly Feldman, Cristóbal Guzmán, and Santosh S. Vempala. Statistical query algorithms for mean vector estimation and stochastic convex optimization. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017), pages 1265–1277. SIAM, 2017. doi:10.1137/1.9781611974782.82.
  • Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, and James Worrell. On the hardness of robust classification. In Advances in Neural Information Processing Systems, pages 7444–7453, 2019.
  • Venkatesan Guruswami and Prasad Raghavendra. Hardness of learning halfspaces with noise. SIAM Journal on Computing, 39(2):742–765, 2009.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR 2015), 2015. URL: http://arxiv.org/abs/1412.6572.
  • Varun Kanade. Computational learning theory notes - 8: Learning real-valued functions, 2018.
  • Justin Khim and Po-Ling Loh. Adversarial risk bounds for binary classification via function transformation. arXiv preprint arXiv:1810.09519, 2018.
  • Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, and Jacob Steinhardt. Testing robustness against unforeseen adversaries. CoRR, abs/1908.08016, 2019. URL: http://arxiv.org/abs/1908.08016.
  • Yin Tat Lee, Aaron Sidford, and Santosh S. Vempala. Efficient convex optimization with membership oracles. In Conference on Learning Theory, pages 1292–1294, 2018.
  • Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 2512–2530. PMLR, 2019.
  • Wolfgang Maass and György Turán. How fast can a threshold gate learn? In Computational Learning Theory and Natural Learning Systems (vol. 1): Constraints and Prospects, pages 381–414, 1994.
  • Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
  • Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pages 5019–5031, 2018. URL: http://papers.nips.cc/paper/7749-adversarially-robust-generalization-requires-more-data.
  • Vladimir Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.
  • Dong Yin, Kannan Ramchandran, and Peter L. Bartlett. Rademacher complexity for adversarially robust generalization. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), volume 97 of Proceedings of Machine Learning Research, pages 7085–7094. PMLR, 2019. URL: http://proceedings.mlr.press/v97/yin19b.html.