
Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

NeurIPS 2020 (2020)

Abstract

Wasserstein Distributionally Robust Optimization (DRO) is concerned with finding decisions that perform well on data that are drawn from the worst-case probability distribution within a Wasserstein ball centered at a certain nominal distribution. In recent years, it has been shown that various DRO formulations...
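For context, here is a minimal sketch of the generic Wasserstein DRO problem the abstract describes. The symbols are illustrative, not the paper's notation: ℓ is a generic loss, w the decision variable, \hat{P}_n the nominal (e.g., empirical) distribution, and ε the radius of the Wasserstein ball.

```latex
% Generic Wasserstein DRO problem (symbols illustrative): choose the decision w
% to minimize the worst-case expected loss over all distributions Q within
% Wasserstein distance \varepsilon of the nominal distribution \hat{P}_n.
\min_{w} \; \sup_{Q \,:\, W(Q,\hat{P}_n) \le \varepsilon} \;
  \mathbb{E}_{(x,y)\sim Q}\bigl[\ell(w; x, y)\bigr]
```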

Introduction
  • Wasserstein distance-based distributionally robust optimization (DRO) has recently received significant attention in the machine learning community.
  • This can be attributed to its ability to improve generalization performance by robustifying the learning model against unseen data [13, 22].
  • A standard approach to solving these reformulations is to use off-the-shelf solvers such as YALMIP or CPLEX.
  • However, these general-purpose solvers do not scale well with the problem size.
Highlights
  • Wasserstein distance-based distributionally robust optimization (DRO) has recently received significant attention in the machine learning community
  • We propose two new epigraphical projection-based incremental algorithms for solving the ℓp-distributionally robust support vector machine (DRSVM) problem (1), which tackle the variables (w, λ) jointly (an illustrative sketch follows this list)
  • We evaluate the two proposed incremental methods, incremental projected subgradient descent (ISG) and the incremental proximal point algorithm (IPPA), in different settings to corroborate our theoretical results in Section 4 and to better understand their empirical strengths and weaknesses
  • We develop a hybrid algorithm that combines the advantages of both ISG and IPPA to further speed up the convergence in practice
  • We also extend the first-order algorithmic framework to tackle the ℓ∞-DRSVM problem
  • We developed two new and highly efficient epigraphical projection-based incremental algorithms to solve the Wasserstein DRSVM problem with ℓp norm-induced transport cost (p ∈ {1, 2, ∞}) and established their convergence rates
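As a rough illustration of the epigraphical-projection idea behind ISG, the following is a minimal sketch, assuming a hinge-loss-type reformulation of the ℓ2-DRSVM in which the dual variable λ must upper-bound the norm of w, so each iterate is projected onto the second-order cone {(w, λ) : ‖w‖₂ ≤ λ}. The per-sample loss form and the parameter names (eps for the Wasserstein radius, kappa for the label-flip cost, alpha for the step size) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def project_soc(w, lam):
    """Project (w, lam) onto the second-order cone {(w, lam): ||w||_2 <= lam}."""
    nrm = np.linalg.norm(w)
    if nrm <= lam:                 # already feasible
        return w, lam
    if nrm <= -lam:                # projection collapses to the origin
        return np.zeros_like(w), 0.0
    t = 0.5 * (nrm + lam)          # scale onto the cone boundary
    return (t / nrm) * w, t

def isg_step(w, lam, x_i, y_i, eps, kappa, alpha):
    """One incremental projected-subgradient step on a single sample (illustrative).
    Per-sample model: eps*lam + max(1 - y_i<w,x_i>, 1 + y_i<w,x_i> - kappa*lam, 0)."""
    margin = y_i * np.dot(w, x_i)
    m1 = 1.0 - margin                       # standard hinge piece
    m2 = 1.0 + margin - kappa * lam         # label-flip piece
    g_w = np.zeros_like(w)
    g_lam = eps                             # subgradient of the eps*lam term
    if m1 >= m2 and m1 > 0.0:               # hinge piece is active
        g_w = g_w - y_i * x_i
    elif m2 > m1 and m2 > 0.0:              # label-flip piece is active
        g_w = g_w + y_i * x_i
        g_lam -= kappa
    return project_soc(w - alpha * g_w, lam - alpha * g_lam)
```

IPPA follows the same incremental pattern but replaces the subgradient step with an exact minimization of the per-sample model plus a proximal term; Table 6 of the paper enumerates the sub-cases of that update in the ℓ2 setting.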
Results
  • The authors present numerical results to demonstrate the efficiency of the proposed incremental methods.
  • The authors evaluate the two proposed incremental methods ISG and IPPA in different settings to corroborate the theoretical results in Section 4 and to better understand their empirical strengths and weaknesses.
  • The faster inner solver in [15] (a conjugate gradient method with an active set strategy) can only tackle the ℓ∞ case.
  • The implementation details to reproduce all numerical results are given in the Appendix.
  • The authors' code is available at https://github.com/gerrili1996/Incremental_DRSVM
Conclusion
  • The authors developed two new and highly efficient epigraphical projection-based incremental algorithms to solve the Wasserstein DRSVM problem with ℓp norm-induced transport cost (p ∈ {1, 2, ∞}) and established their convergence rates.
  • A natural future direction is to develop a minibatch version of IPPA (a generic form of the IPPA update is sketched after this list) and extend the algorithms to the asynchronous decentralized parallel setting.
  • It would be interesting to develop some new incremental/stochastic algorithms to tackle more general Wasserstein DRO problems; see, e.g., problem (11) in [19].
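For reference, a generic incremental proximal-point update of the kind the conclusion alludes to is sketched below. The per-sample function f_{i_k}, the epigraph-type feasible set with a dual norm, the proximal parameter γ, and the sample index i_k are generic symbols used for illustration, not the paper's exact notation.

```latex
% Generic incremental proximal-point (IPPA-style) update (symbols illustrative):
% at iteration k, pick one sample index i_k and solve a small regularized
% subproblem exactly over the epigraph-type feasible set.
(w^{k+1}, \lambda^{k+1}) \;=\;
  \operatorname*{arg\,min}_{(w,\lambda)\,:\,\|w\|_{*} \le \lambda}
  \; f_{i_k}(w,\lambda) \;+\; \frac{1}{2\gamma}\,\bigl\|(w,\lambda)-(w^{k},\lambda^{k})\bigr\|_2^2
```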
Tables
  • Table 1: Convergence rates of incremental algorithms for the ℓp-DRSVM
  • Table 2: Wall-clock Time Comparison on UCI Real Dataset: ℓ1-DRSVM, c = 0, κ = 1, ε = 0.1
  • Table 3: Wall-clock Time Comparison on UCI Real Dataset: ℓ2-DRSVM, c = 0, κ = 1, ε = 0.1
  • Table 4: Wall-clock Time Comparison on UCI Real Dataset: ℓ∞-DRSVM, c = 0, κ = 1, ε = 0.1
  • Table 5: Summary of all ingredients in ISG and IPPA
  • Table 6: Summary of all sub-cases for the ℓ2 proximal point update (7)
  • Table 7: Wall-clock Time Comparison on UCI Real Dataset: ℓ1-DRSVM, c = 1, κ = 1, ε = 0.1
  • Table 8: Wall-clock Time Comparison on UCI Real Dataset: ℓ∞-DRSVM, c = 1, κ = 1, ε = 0.1
Funding
  • Caihua Chen is supported in part by the National Natural Science Foundation of China (NSFC) projects 71732003, 11871269 and in part by the Natural Science Foundation of Jiangsu Province project BK20181259
  • Anthony Man-Cho So is supported in part by the CUHK Research Sustainability of Major RGC Funding Schemes project 3133236.
Broader Impact
  • This work does not present any foreseeable societal consequences.
References
  • Heinz H. Bauschke. Projection Algorithms and Monotone Operators. PhD thesis, Simon Fraser University, 1996.
  • Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, 1996.
  • Dimitri P. Bertsekas. Incremental proximal methods for large scale convex optimization. Mathematical Programming, 129(2):163–195, 2011.
  • Jose Blanchet, Yang Kang, and Karthyek Murthy. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability, 56(3):830–857, 2019.
  • Jose Blanchet, Karthyek Murthy, and Fan Zhang. Optimal transport based distributionally robust optimization: Structural properties and iterative schemes. arXiv preprint arXiv:1810.02403, 2018.
  • Jérôme Bolte, Trong Phong Nguyen, Juan Peypouquet, and Bruce W. Suter. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471–507, 2017.
  • J. Frédéric Bonnans and Alexander Shapiro. Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer-Verlag, New York, 2000.
  • James V. Burke and Michael C. Ferris. Weak sharp minima in mathematical programming. SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993.
  • Yu-Hong Dai and Roger Fletcher. New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds. Mathematical Programming, 106(3):403–421, 2006.
  • Rui Gao, Xi Chen, and Anton J. Kleywegt. Wasserstein distributional robustness and regularization in statistical learning. arXiv preprint arXiv:1712.06050, 2017.
  • Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations Research & Management Science in the Age of Analytics, pages 130–166. INFORMS, 2019.
  • Changhyeok Lee and Sanjay Mehrotra. A distributionally-robust approach for finding support vector machines. Optimization Online, 2015.
  • Jaeho Lee and Maxim Raginsky. Minimax statistical learning with Wasserstein distances. In Advances in Neural Information Processing Systems, pages 2687–2696, 2018.
  • Guoyin Li and Ting Kei Pong. Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of Computational Mathematics, 18(5):1199–1232, 2018.
  • Jiajin Li, Sen Huang, and Anthony Man-Cho So. A first-order algorithmic framework for Wasserstein distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 3939–3949, 2019.
  • Xiao Li, Zhihui Zhu, Anthony Man-Cho So, and Jason D. Lee. Incremental methods for weakly convex optimization. arXiv preprint arXiv:1907.11687, 2019.
  • Meijiao Liu and Yong-Jin Liu. Fast algorithm for singly linearly constrained quadratic programs with box-like constraints. Computational Optimization and Applications, 66(2):309–326, 2017.
  • Fengqiao Luo and Sanjay Mehrotra. Decomposition algorithm for distributionally robust optimization using Wasserstein metric with an application to a class of regression models. European Journal of Operational Research, 278(1):20–35, 2019.
  • Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166, 2018.
  • Angelia Nedić and Dimitri P. Bertsekas. Convergence rate of incremental subgradient algorithms. In Stochastic Optimization: Algorithms and Applications, pages 223–264.
  • Angelia Nedić and Dimitri P. Bertsekas. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1):109–138, 2001.
  • Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, and Peyman Mohajerin Esfahani. Regularization via mass transportation. Journal of Machine Learning Research, 20(103):1–68, 2019.
  • Manisha Singla, Debdas Ghosh, and K. K. Shukla. A survey of robust optimization based machine learning with special reference to support vector machines. International Journal of Machine Learning and Cybernetics, 11(7):1359–1385, 2020.
  • Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
  • Johan A. K. Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.
  • Po-Wei Wang, Matt Wytock, and Zico Kolter. Epigraph projections for fast general convex programming. In International Conference on Machine Learning, pages 2868–2877, 2016.
  • Wolfram Wiesemann, Daniel Kuhn, and Melvyn Sim. Distributionally robust convex optimization. Operations Research, 62(6):1358–1376, 2014.
  • Zirui Zhou and Anthony Man-Cho So. A unified approach to error bounds for structured convex optimization problems. Mathematical Programming, 165(2):689–728, 2017.
Authors
Jiajin Li
Caihua Chen