Minimax classification with 0-1 loss and performance guarantees

NeurIPS 2020 (2020)

Abstract

Supervised classification techniques use training samples to find classification rules with small expected 0-1 loss. Conventional methods achieve efficient learning and out-of-sample generalization by minimizing surrogate losses over specific families of rules. This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss or family of rules: they minimize worst-case expected 0-1 loss over general classification rules and provide tight performance bounds at learning.
Introduction
  • Supervised classification techniques use training samples to find classification rules that assign labels to instances with small expected 0-1 loss, referred to as risk or probability of error.
  • Most learning methods use the empirical risk minimization (ERM) approach, which minimizes the expectation w.r.t. the empirical distribution of the training samples; see e.g. [1, 2].
  • Robust risk minimization (RRM) techniques can directly achieve out-of-sample generalization by using uncertainty sets that include the true underlying distribution.
  • Such uncertainty sets can also yield tight performance bounds at learning.
  • ERM-based techniques such as support vector machines (SVMs), multilayer perceptrons (MLPs), and AdaBoost classifiers minimize surrogate losses over specific families of rules.
Highlights
  • Supervised classification techniques use training samples to find classification rules that assign labels to instances with small expected 0-1 loss, referred to as risk or probability of error
  • This paper presents robust risk minimization (RRM)-based classification techniques referred to as minimax risk classifiers (MRCs) that minimize worst-case expected 0-1 loss over general classification rules, and provide tight performance bounds at learning
  • We first present techniques that provide tight performance bounds at learning, and we show finite-sample generalization bounds for MRCs’ risk in terms of training size and smallest minimax risk
  • Experimentation with benchmark datasets shows the reliability and tightness of the presented performance bounds, and the competitive classification performance of MRCs with simple feature mappings given by thresholds
  • The results presented show that supervised classification does not require choosing a surrogate loss to substitute the original 0-1 loss, or a specific family that constrains classification rules.
  • Learning with MRCs is achieved without further design choices by solving linear optimization problems that can provide tight performance guarantees.
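The worst-case expected 0-1 loss over an uncertainty set of distributions is itself a linear program, which is the core of the linear-optimization claim above. The toy sketch below evaluates that worst case for one fixed rule on a tiny finite problem; all numbers (the rule `h`, the feature expectations `tau`, the slacks `lam`) are made up for illustration, and the MRC of the paper additionally minimizes this quantity over general randomized rules.

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem: 3 instances, 2 labels; columns index the 6 (x, y) pairs.
pairs = [(x, y) for x in range(3) for y in range(2)]

# A fixed deterministic rule h (hypothetical), and its 0-1 loss per pair.
h = {0: 0, 1: 1, 2: 1}
loss = np.array([0.0 if h[x] == y else 1.0 for x, y in pairs])

# Two binary features with interval constraints on their expectations
# (tau = point estimate from training samples, lam = confidence slack;
# all values here are hypothetical).
Phi = np.array([[1.0 if y == 1 else 0.0 for x, y in pairs],
                [1.0 if x == 0 else 0.0 for x, y in pairs]])
tau = np.array([0.6, 0.3])
lam = np.array([0.1, 0.1])

# Worst-case expected 0-1 loss of h over the uncertainty set
# {p : p >= 0, sum(p) = 1, |Phi p - tau| <= lam}: a linear program.
# linprog minimizes, so the objective is negated.
res = linprog(
    c=-loss,
    A_ub=np.vstack([Phi, -Phi]),
    b_ub=np.concatenate([tau + lam, -(tau - lam)]),
    A_eq=np.ones((1, len(pairs))),
    b_eq=[1.0],
    bounds=[(0.0, None)] * len(pairs),
)
worst_case_risk = -res.fun
print(f"worst-case risk of h: {worst_case_risk:.3f}")  # 0.900 here
```

The adversarial distribution must still match the feature-expectation intervals, which is what keeps the worst-case risk below 1 and is the mechanism behind the upper bounds reported at learning.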
Results
  • The authors show numerical results for MRCs using 8 UCI datasets for multi-class classification.
  • MRCs’ results are obtained using feature mappings given by instance thresholding, similar to those used by maximum entropy and logistic regression methods [13, 16, 17].
  • Such feature mappings are adequate for a streamlined implementation of MRCs because they take a reduced number of values.
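The instance-thresholding feature maps just described can be sketched as follows. This is a minimal illustration with made-up thresholds; the paper's threshold-selection procedure and the pairing of these features with label encodings are not reproduced here.

```python
import numpy as np

def threshold_features(X, thresholds):
    """Map each instance to binary features 1{x_d <= t} for every
    (dimension d, threshold t) pair -- a sketch of instance thresholding."""
    feats = [(X[:, d] <= t).astype(float)
             for d, ts in enumerate(thresholds) for t in ts]
    return np.column_stack(feats)

X = np.array([[0.2, 5.0],
              [0.8, 1.0],
              [0.5, 3.0]])
# Hypothetical thresholds per dimension (e.g. chosen from training quantiles).
thresholds = [[0.4, 0.7], [2.0, 4.0]]
Phi = threshold_features(X, thresholds)
print(Phi)
```

Because each feature is a 0/1 indicator, the mapping takes only finitely many values, which is what makes the streamlined implementation mentioned above possible.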
Conclusion
  • The proposed MRCs minimize the worst-case expected 0-1 loss over general classification rules, and provide performance guarantees at learning.
  • Experimentation with benchmark datasets shows the reliability and tightness of the presented performance bounds, and the competitive classification performance of MRCs with simple feature mappings given by thresholds.
  • Differently from conventional techniques, the inductive bias exploited by MRCs comes only from a feature mapping that serves to constrain the distributions considered.
  • Learning with MRCs is achieved without further design choices by solving linear optimization problems that can provide tight performance guarantees.
Tables
  • Table 1: Classification error and performance bounds of MRC in comparison with state-of-the-art techniques
Funding
  • Acknowledgments and Disclosure of Funding: Funding in direct support of this work has been provided by the Spanish Ministry of Economy and Competitiveness MINECO through Ramon y Cajal Grant RYC-2016-19383, BCAM’s Severo Ochoa Excellence Accreditation SEV-2017-0718, Project PID2019-105058GA-I00, and Project TIN2017-82626-R, and by the Basque Government through the ELKARTEK and BERC 2018-2021 programmes.
Study subjects and analysis
UCI datasets: 8
5 Experimental results. In this section we show numerical results for MRCs using 8 UCI datasets for multi-class classification. The first set of results shows the suitability of the upper and lower bounds Ra,b and La,b for MRCs with varying training sizes, while the second set of results compares the classification error of MRCs w.r.t. state-of-the-art techniques.

data sets: 6
It can be observed from Figures 1(a) and 1(b) that the lower and upper bounds obtained at learning can offer accurate estimates for the risk without using test samples. In the second set of experimental results, we use 6 data sets from the UCI repository (first column of Table 1). MRCs are compared with 7 classifiers: decision tree (DT), quadratic discriminant analysis (QDA), k-nearest neighbor (KNN), Gaussian kernel SVM, and random forest (RF), as well as the related RRM classifiers adversarial multiclass classifier (AMC) and maximum entropy machine (MEM).
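A rough sketch of such a comparison can be run with the scikit-learn counterparts of five of the baselines. The iris dataset stands in for the UCI sets of Table 1 (the paper's exact datasets, splits, and hyperparameters differ, and AMC/MEM have no scikit-learn implementation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Five baseline classifiers used in the paper's comparison.
baselines = {
    "DT": DecisionTreeClassifier(random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(random_state=0),
}

X, y = load_iris(return_X_y=True)
# 5-fold cross-validated classification error for each baseline.
errors = {name: 1.0 - cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in baselines.items()}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {err:.3f}")
```

Note that, unlike these baselines, MRCs would additionally report the upper and lower risk bounds computed at learning, with no test samples needed.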

References
  • [1] Vladimir Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
  • [2] Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
  • [3] Gert R. G. Lanckriet, Laurent El Ghaoui, Chiranjib Bhattacharyya, and Michael I. Jordan. A robust minimax approach to classification. Journal of Machine Learning Research, 3:555–582, December 2002.
  • [4] Jaeho Lee and Maxim Raginsky. Minimax statistical learning with Wasserstein distances. In Advances in Neural Information Processing Systems, pages 2692–2701, 2018.
  • [5] Farzan Farnia and David Tse. A minimax approach to supervised learning. In Advances in Neural Information Processing Systems, pages 4240–4248, 2016.
  • [6] Kaiser Asif, Wei Xing, Sima Behpour, and Brian D. Ziebart. Adversarial cost-sensitive classification. In Conference on Uncertainty in Artificial Intelligence, pages 92–101, 2015.
  • [7] Rizal Fathony, Anqi Liu, Kaiser Asif, and Brian D. Ziebart. Adversarial multiclass classification: A risk minimization perspective. In Advances in Neural Information Processing Systems 29, pages 559–567, 2016.
  • [8] John Duchi, Peter Glynn, and Hongseok Namkoong. Statistics of robust optimization: A generalized empirical likelihood approach. arXiv preprint arXiv:1610.03425, 2016.
  • [9] Hongseok Namkoong and John C. Duchi. Variance-based regularization with convex objectives. In Advances in Neural Information Processing Systems, pages 2971–2980, 2017.
  • [10] Erick Delage and Yinyu Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
  • [11] Soroosh Shafieezadeh-Abadeh, Peyman Mohajerin Esfahani, and Daniel Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576–1584, 2015.
  • [12] Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, and Peyman Mohajerin Esfahani. Regularization via mass transportation. Journal of Machine Learning Research, 20(103):1–68, 2019.
  • [13] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, Cambridge, MA, second edition, 2018.
  • [14] Peter D. Grünwald and A. Philip Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. The Annals of Statistics, 32(4):1367–1433, 2004.
  • [15] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, 2004.
  • [16] Miroslav Dudík, Steven J. Phillips, and Robert E. Schapire. Performance guarantees for regularized maximum entropy density estimation. In Proceedings of the 17th Annual Conference on Computational Learning Theory, pages 472–486. Springer, Berlin, Heidelberg, 2004.
  • [17] Steven J. Phillips, Robert P. Anderson, and Robert E. Schapire. Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3):231–259, January 2006.
  • [18] Michael Grant, Stephen Boyd, and Yinyu Ye. Disciplined convex programming. In L. Liberti and N. Maculan, editors, Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pages 155–210.
  • [19] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
Author
Santiago Mazuelas
Andrea Zanoni
Aritz Pérez