The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise

NeurIPS 2020.

Keywords:
computational complexity, adversarial example, lower bound, distribution independent, Exponential Time Hypothesis

Abstract:

We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our …

Introduction
  • One of the main concrete goals in this context has been to develop classifiers that are robust to adversarial examples, i.e., small, imperceptible perturbations to the input that can result in misclassification [BCM+13, SZS+14, GSS15].
  • This has led to an explosion of research on designing defenses against adversarial examples and attacks on these defenses.
  • The authors study the learnability of halfspaces in this model with respect to $L_p$ perturbations.
Highlights
  • In recent years, the design of reliable machine learning systems for security-critical applications, including in computer vision and natural language processing, has been a major goal in the field.

    One of the main concrete goals in this context has been to develop classifiers that are robust to adversarial examples, i.e., small, imperceptible perturbations to the input that can result in misclassification [BCM+13, SZS+14, GSS15].
  • We focus on understanding the computational complexity of adversarially robust classification in the agnostic PAC model [Hau92, KSS94].
  • We studied the computational complexity of adversarially robust learning of halfspaces in the distribution-independent agnostic PAC model
  • We provided a simple proper learning algorithm for this problem and a nearly matching computational lower bound
  • While proper learners are typically preferable due to their interpretability, the obvious open question is whether significantly faster non-proper learners are possible
Results
  • Prior hardness results for learning halfspaces with noise are obtained via reductions from Label Cover [ABSS97, FGKP06, GR09, FGRW12, DKM19].
  • These reductions use gadgets which are “local” in nature.
  • As the authors explain, such “local” reductions cannot work for their purpose.
  • It is convenient to think of each sample (x, y) as a linear constraint ⟨w, x⟩ ≥ 0 when y = +1 and ⟨w, x⟩ < 0 when y = −1, where the variables are the coordinates w_1, . . . , w_d of w.
  • For their purpose, the authors want (i) the halfspace w to lie in the unit L_1 ball B^d_1, i.e., |w_1| + · · · + |w_d| ≤ 1, and (ii) each sample x to lie in the unit L_∞ ball B^d_∞, i.e., |x_1|, . . . , |x_d| ≤ 1 (see the sketch following this list).
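
    To make the constraint view above concrete, the following is a minimal sketch (not from the paper; the function name and toy data are illustrative assumptions) of checking which samples a halfspace classifies γ-robustly under L_p perturbations. It relies on the standard dual-norm identity min_{‖z‖_p ≤ γ} y·⟨w, x + z⟩ = y·⟨w, x⟩ − γ‖w‖_q with 1/p + 1/q = 1, and it normalizes w into B^d_1 and the samples into B^d_∞, matching the bullets above.

```python
import numpy as np


def robust_margin_violations(w, X, y, gamma, p=np.inf):
    """Count samples that are NOT gamma-robustly classified by the halfspace
    x -> sign(<w, x>) under L_p perturbations of x.

    By the dual-norm identity, min_{||z||_p <= gamma} y * <w, x + z> equals
    y * <w, x> - gamma * ||w||_q with 1/p + 1/q = 1, so a sample is
    gamma-robustly correct iff y * <w, x> >= gamma * ||w||_q.
    (Illustrative helper only, not the paper's algorithm.)
    """
    if np.isinf(p):
        q = 1.0
    elif p == 1:
        q = np.inf
    else:
        q = p / (p - 1.0)
    dual_norm = np.linalg.norm(w, ord=q)
    worst_case_margin = y * (X @ w) - gamma * dual_norm
    return int(np.sum(worst_case_margin < 0))


# Toy usage: w is normalized into the unit L1 ball B^d_1, and the samples are
# drawn from the unit L-infinity ball B^d_inf, matching the constraints above.
rng = np.random.default_rng(0)
d, n = 5, 200
w = rng.normal(size=d)
w /= np.abs(w).sum()                 # |w_1| + ... + |w_d| = 1
X = rng.uniform(-1.0, 1.0, (n, d))   # each |x_i| <= 1
y = np.sign(X @ w)
y[y == 0.0] = 1.0                    # break ties toward +1
print(robust_margin_violations(w, X, y, gamma=0.05, p=np.inf))
```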
Conclusion
  • Conclusions and Open Problems

    In this work, the authors studied the computational complexity of adversarially robust learning of halfspaces in the distribution-independent agnostic PAC model.
  • While proper learners are typically preferable due to their interpretability, the obvious open question is whether significantly faster non-proper learners are possible.
  • The authors leave this as an interesting open problem.
  • Another direction for future work is to understand the effect of distributional assumptions on the complexity of the problem and to explore the learnability of simple neural networks in this context.
Summary
  • Introduction:

    One of the main concrete goals in this context has been to develop classifiers that are robust to adversarial examples, i.e., small, imperceptible perturbations to the input that can result in misclassification [BCM+13, SZS+14, GSS15].
  • This has led to an explosion of research on designing defenses against adversarial examples and attacks on these defenses.
  • The authors study the learnability of halfspaces in this model with respect to $L_p$ perturbations.
  • Objectives:

    For some constants 0 < ν < 1 and α > 1, the goal is to efficiently compute a hypothesis h that, with high probability, achieves robust misclassification error competitive with the best attainable by any halfspace, up to relaxations governed by ν and α (a hedged formalization is sketched after this Summary).
  • Results:

    Prior hardness results for learning halfspaces with noise are obtained via reductions from Label Cover [ABSS97, FGKP06, GR09, FGRW12, DKM19].
  • These reductions use gadgets which are “local” in nature.
  • As the authors explain, such “local” reductions cannot work for their purpose.
  • It is convenient to think of each sample (x, y) as a linear constraint ⟨w, x⟩ ≥ 0 when y = +1 and ⟨w, x⟩ < 0 when y = −1, where the variables are the coordinates w_1, . . . , w_d of w.
  • For their purpose, the authors want (i) the halfspace w to lie in the unit L_1 ball B^d_1, i.e., |w_1| + · · · + |w_d| ≤ 1, and (ii) each sample x to lie in the unit L_∞ ball B^d_∞, i.e., |x_1|, . . . , |x_d| ≤ 1.
  • Conclusion:

    Conclusions and Open Problems

    In this work, the authors studied the computational complexity of adversarially robust learning of halfspaces in the distribution-independent agnostic PAC model.
  • While proper learners are typically preferable due to their interpretability, the obvious open question is whether significantly faster non-proper learners are possible.
  • The authors leave this as an interesting open problem.
  • Another direction for future work is to understand the effect of distributional assumptions on the complexity of the problem and to explore the learnability of simple neural networks in this context.
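
    The objective stated above can be made concrete as follows. This is a hedged reconstruction rather than the paper's exact statement; in particular, it assumes that the constant ν relaxes the perturbation radius while α relaxes the error guarantee.

```latex
% Hedged reconstruction of the robust agnostic learning objective.
% L_p gamma-robust misclassification error of a hypothesis h on distribution D,
% and the best achievable value over halfspaces h_w(x) = sign(<w, x>):
\[
  R^{p}_{\gamma}(h, \mathcal{D})
    = \Pr_{(x,y)\sim \mathcal{D}}
      \bigl[\exists\, z \in \mathbb{R}^{d},\ \|z\|_{p} \le \gamma :\ h(x+z) \neq y\bigr],
  \qquad
  \mathrm{OPT}^{p}_{\gamma} = \min_{w \in \mathbb{R}^{d}} R^{p}_{\gamma}(h_{w}, \mathcal{D}).
\]
% Goal (assumed form): for constants 0 < \nu < 1 and \alpha > 1, efficiently
% output a halfspace h such that, with high probability,
\[
  R^{p}_{\nu\gamma}(h, \mathcal{D}) \le \alpha \cdot \mathrm{OPT}^{p}_{\gamma} + \epsilon .
\]
```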
Related work
  • A sequence of recent works [CBM18, SST+18, BLPR19, MHS19] has studied the sample complexity of adversarially robust PAC learning for general concept classes of bounded VC dimension and for halfspaces in particular. [MHS19] established an upper bound on the sample complexity of PAC learning any concept class with finite VC dimension. A common implication of the aforementioned works is that, for some concept classes, the sample complexity of adversarially robust PAC learning is higher than the sample complexity of (standard) PAC learning. For the class of halfspaces, which is the focus of the current paper, the sample complexity of adversarially robust agnostic PAC learning was shown to be essentially the same as that of (standard) agnostic PAC learning [CBM18, MHS19].

    Turning to computational aspects, [BLPR19, DNV19] showed that there exist classification tasks that are efficiently learnable in the standard PAC model, but are computationally hard in the adversarially robust setting (under cryptographic assumptions). Notably, the classification problems shown hard are artificial, in the sense that they do not correspond to natural concept classes. [ADV19] showed that adversarially robust proper learning of degree-2 polynomial threshold functions is computationally hard, even in the realizable setting. On the positive side, [ADV19] gave a polynomial-time algorithm for adversarially robust learning of halfspaces under L_∞ perturbations, again in the realizable setting. More recently, [MGDS20] generalized this upper bound to a broad class of perturbations, including L_p perturbations. Moreover, [MGDS20] gave an efficient algorithm for learning halfspaces with random classification noise [AL88]. We note that all these algorithms are proper.
Funding
  • We note that our algorithm has significantly better dependence on the parameter δ (quantifying the approximation ratio), and better dependence on 1/γ.
Reference
  • [ABSS97] Sanjeev Arora, László Babai, Jacques Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. J. Comput. Syst. Sci., 54(2):317–331, 1997.
  • [ADV19] Pranjal Awasthi, Abhratanu Dutta, and Aravindan Vijayaraghavan. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, pages 13737–13747, 2019.
  • [AL88] Dana Angluin and Philip Laird. Learning from noisy examples. Mach. Learn., 2(4):343–370, 1988.
  • Divesh Aggarwal and Noah Stephens-Davidowitz. (Gap/S)ETH hardness of SVP. In STOC, pages 228–238, 2018.
  • Zsolt Baranyai. On the factorization of the complete uniform hypergraph. Infinite and Finite Sets, Proc. Coll. Keszthely, 10:91–107, 1975.
  • Maria-Florina Balcan and Christopher Berlind. A new perspective on learning linear separators with large L_q L_p margins. In AISTATS, pages 68–76, 2014.
  • [BBE+19] Arnab Bhattacharyya, Édouard Bonnet, László Egri, Suprovat Ghoshal, Karthik C. S., Bingkai Lin, Pasin Manurangsi, and Dániel Marx. Parameterized intractability of even set and shortest vector problem. Electronic Colloquium on Computational Complexity (ECCC), 26:115, 2019.
  • [BCM+13] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML PKDD, pages 387–402, 2013.
  • Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136, 1941.
  • [BGKM18] Arnab Bhattacharyya, Suprovat Ghoshal, Karthik C. S., and Pasin Manurangsi. Parameterized intractability of even set and shortest vector problem from Gap-ETH. In ICALP, pages 17:1–17:15, 2018.
  • [BGS17] Huck Bennett, Alexander Golovnev, and Noah Stephens-Davidowitz. On the quantitative hardness of CVP. In FOCS, pages 13–24, 2017.
  • [BLPR19] Sebastien Bubeck, Yin-Tat Lee, Eric Price, and Ilya P. Razenshteyn. Adversarial examples from computational constraints. In ICML, pages 831–840, 2019.
  • Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
  • Shai Ben-David and Hans Ulrich Simon. Efficient learning of linear perceptrons. In Advances in Neural Information Processing Systems, pages 189–195, 2000.
  • Aharon Birnbaum and Shai Shalev-Shwartz. Learning halfspaces with the zero-one loss: Time-accuracy tradeoffs. In Advances in Neural Information Processing Systems, pages 935–943, 2012.
  • [CBM18] Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, pages 228–239, 2018.
  • [CCK+17] Parinya Chalermsook, Marek Cygan, Guy Kortsarz, Bundit Laekhanukit, Pasin Manurangsi, Danupon Nanongkai, and Luca Trevisan. From Gap-ETH to FPT-inapproximability: Clique, dominating set, and more. In FOCS, pages 743–754, 2017.
  • [CGK+19] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In ICALP, pages 42:1–42:14, 2019.
  • Yijia Chen and Bingkai Lin. The constant inapproximability of the parameterized dominating set problem. SIAM J. Comput., 48(2):513–533, 2019.
  • Zico Kolter and Aleksander Madry. Adversarial robustness - theory and practice. NeurIPS 2018 tutorial, available at https://adversarial-ml-tutorial.org/, 2018.
  • Irit Dinur. Mildly exponential reduction from gap 3SAT to polynomial-gap label-cover. Electronic Colloquium on Computational Complexity (ECCC), 23:128, 2016.
  • [DKM19] Ilias Diakonikolas, Daniel Kane, and Pasin Manurangsi. Nearly tight bounds for robust proper learning of halfspaces with a margin. In Advances in Neural Information Processing Systems, pages 10473–10484, 2019.
  • [DM18] Irit Dinur and Pasin Manurangsi. ETH-hardness of approximating 2-CSPs and directed Steiner network. In ITCS, pages 36:1–36:20, 2018.
  • [DNV19] Akshay Degwekar, Preetum Nakkiran, and Vinod Vaikuntanathan. Computational limitations in robust classification and win-win results. In COLT, pages 994–1028, 2019.
  • Irit Dinur and David Steurer. Analytical approach to parallel repetition. In STOC, pages 624–633, 2014.
  • Carl-Gustav Esseen. On the Liapunoff limit of error in the theory of probability. Arkiv för matematik, astronomi och fysik, A28:1–19, 1942.
  • Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
  • [FGKP06] Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. New results for learning noisy parities and halfspaces. In FOCS, pages 563–574, 2006.
  • [FGRW12] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558–1590, 2012.
  • Yoav Freund and Robert Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997.
  • [Gen01a] Claudio Gentile. A new approximate maximal margin classification algorithm. J. Mach. Learn. Res., 2:213–242, 2001.
  • [Gen01b] Claudio Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
  • [Gen03] Claudio Gentile. The robustness of the p-norm algorithms. Mach. Learn., 53(3):265–299, 2003.
  • [GLS01] Adam J. Grove, Nick Littlestone, and Dale Schuurmans. General convergence results for linear discriminant updates. Mach. Learn., 43(3):173–210, 2001.
  • [GR09] Venkatesan Guruswami and Prasad Raghavendra. Hardness of learning halfspaces with noise. SIAM J. Comput., 39(2):742–765, 2009.
  • [GSS15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
  • Johan Håstad. Clique is hard to approximate within n^{1−ε}. In FOCS, pages 627–636, 1996.
  • [Hås01] Johan Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
  • [Hau92] David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100:78–150, 1992.
  • Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001.
  • Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001.
  • Vishesh Jain, Frederic Koehler, and Andrej Risteski. Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective. In STOC, pages 1226–1236, 2019.
  • [KLM19] Karthik C. S., Bundit Laekhanukit, and Pasin Manurangsi. On the parameterized complexity of approximating dominating set. J. ACM, 66(5):33:1–33:38, 2019.
  • Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist., 30(1):1–50, 2002.
  • [KSS94] Michael Kearns, Robert Schapire, and Linda Sellie. Toward Efficient Agnostic Learning. Machine Learning, 17(2/3):115–141, 1994.
  • Sham M. Kakade, Karthik Sridharan, and Ambuj Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems, pages 793–800, 2008.
  • Bingkai Lin. A simple gap-producing reduction for the parameterized set cover problem. In ICALP, pages 81:1–81:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1987.
  • [LMS11] Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS, 105:41–72, 2011.
  • Phil Long and Rocco Servedio. Learning large-margin halfspaces with more malicious noise. Advances in Neural Information Processing Systems, 2011.
  • Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k-coverage, unique set cover and related problems (via t-wise agreement testing theorem). In SODA, pages 62–81, 2020.
  • [Mar13] Dániel Marx. Completely inapproximable monotone and antimonotone parameterized problems. J. Comput. Syst. Sci., 79(1):144–151, 2013.
  • [MGDS20] Omar Montasser, Surbhi Goel, Ilias Diakonikolas, and Nathan Srebro. Efficiently learning adversarially robust halfspaces with noise. CoRR, abs/2005.07652, 2020.
  • [MHS19] Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In COLT, pages 2512–2530, 2019.
  • Dana Moshkovitz and Ran Raz. Two-query PCP with subconstant error. J. ACM, 57(5):29:1–29:29, 2010.
  • Pasin Manurangsi and Prasad Raghavendra. A birthday repetition theorem and complexity of approximating dense CSPs. In ICALP, pages 78:1–78:15, 2017.
  • [Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.
  • Frank Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
  • Shai Shalev-Shwartz, Ohad Shamir, and Karthik Sridharan. Agnostically learning halfspaces with margin errors. Technical report, Toyota Technological Institute, 2009.
  • Shai Shalev-Shwartz, Ohad Shamir, and Karthik Sridharan. Learning kernel-based halfspaces with the zero-one loss. In COLT, pages 441–450, 2010.
  • [SST+18] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5019–5031, 2018.
  • [SZS+14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
  • [Vap98] Vladimir Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.