# Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

NeurIPS 2020

Abstract

Wasserstein Distributionally Robust Optimization (DRO) is concerned with finding decisions that perform well on data drawn from the worst-case probability distribution within a Wasserstein ball centered at a given nominal distribution. In recent years, it has been shown that various DRO formulations…
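In standard notation (a sketch following the Wasserstein DRO literature, e.g. [19, 22], rather than the paper verbatim), the worst-case problem described above reads:

```latex
\min_{w} \ \sup_{\mathbb{Q}:\, W(\mathbb{Q}, \widehat{\mathbb{P}}_n) \le \epsilon} \ \mathbb{E}_{\xi \sim \mathbb{Q}}\big[\ell(w; \xi)\big]
```

where $\widehat{\mathbb{P}}_n$ is the empirical (nominal) distribution of the $n$ training samples, $W(\cdot,\cdot)$ is the Wasserstein distance induced by the chosen transport cost, and $\epsilon$ is the radius of the ball.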

Introduction

- Wasserstein distance-based distributionally robust optimization (DRO) has recently received significant attention in the machine learning community.
- This can be attributed to its ability to improve generalization performance by robustifying the learning model against unseen data [13, 22].
- A standard approach to solving these reformulations is to use off-the-shelf tools such as YALMIP or CPLEX.
- These general-purpose solvers do not scale well with the problem size.
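Concretely, the p-DRSVM problem (1) tackled here admits a finite convex reformulation of roughly the following form (a sketch based on the hinge-loss reformulation of [22]; the paper's exact notation may differ):

```latex
\min_{w \in \mathbb{R}^d,\ \lambda \ge 0} \ \lambda\epsilon
+ \frac{1}{n}\sum_{i=1}^{n}
\max\Big\{\, 1 - y_i\langle w, x_i\rangle,\;
            1 + y_i\langle w, x_i\rangle - \kappa\lambda,\; 0 \,\Big\}
\quad \text{s.t.} \quad \|w\|_{q} \le \lambda
```

where $q$ is the conjugate exponent of $p$ ($1/p + 1/q = 1$) and $\kappa$ weighs the cost of flipping a label. The feasible set $\{(w,\lambda) : \|w\|_q \le \lambda\}$ is the epigraph of the dual norm, which is why fast epigraphical projections drive the per-sample updates.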

Highlights

- Wasserstein distance-based distributionally robust optimization (DRO) has recently received significant attention in the machine learning community.
- We propose two new epigraphical projection-based incremental algorithms for solving the p-distributionally robust support vector machine (DRSVM) problem (1), which tackle the variables (w, λ) jointly.
- We evaluate the two proposed incremental methods, incremental projected subgradient descent (ISG) and the incremental proximal point algorithm (IPPA), in different settings to corroborate our theoretical results in Section 4 and to better understand their empirical strengths and weaknesses.
- We develop a hybrid algorithm that combines the advantages of both ISG and IPPA to further speed up convergence in practice.
- We also extend the first-order algorithmic framework to tackle the ∞-DRSVM problem.
- We developed two new and highly efficient epigraphical projection-based incremental algorithms to solve the Wasserstein DRSVM problem with p-norm-induced transport cost (p ∈ {1, 2, ∞}) and established their convergence rates.
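As a rough illustration of the ISG idea for the p = 2 case (a sketch, not the authors' implementation; the per-sample loss follows the standard hinge-loss DRSVM reformulation, and the step-size schedule and function names are assumptions), each inner step takes a subgradient step on one sample's loss and then projects (w, λ) onto the epigraph of the ℓ2 norm, i.e., the second-order cone:

```python
import numpy as np

def proj_soc(w, lam):
    """Project (w, lam) onto the second-order cone {(v, t) : ||v||_2 <= t}."""
    nw = np.linalg.norm(w)
    if nw <= lam:
        return w, lam
    if nw <= -lam:
        return np.zeros_like(w), 0.0
    a = 0.5 * (nw + lam)          # standard closed-form cone projection
    return (a / nw) * w, a

def isg_drsvm(X, y, eps=0.1, kappa=1.0, epochs=600, alpha0=0.5, decay=0.99, seed=0):
    """ISG-style sketch for the 2-DRSVM reformulation: each pass cycles
    through the samples, takes a subgradient step on the per-sample loss
        eps*lam + max{1 - y_i<w, x_i>, 1 + y_i<w, x_i> - kappa*lam, 0},
    then projects (w, lam) back onto the epigraph of the l2 norm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, lam = np.zeros(d), 1.0
    alpha = alpha0
    for _ in range(epochs):
        for i in rng.permutation(n):
            m = y[i] * (X[i] @ w)
            t1, t2 = 1.0 - m, 1.0 + m - kappa * lam
            gw, gl = np.zeros(d), eps      # subgradient of the eps*lam term
            if t1 > 0 and t1 >= t2:        # first hinge branch is active
                gw = -y[i] * X[i]
            elif t2 > 0:                   # label-flip branch is active
                gw = y[i] * X[i]
                gl -= kappa
            w, lam = proj_soc(w - alpha * gw, lam - alpha * gl)
        alpha *= decay                     # geometrically diminishing steps
    return w, lam
```

The geometrically diminishing step size mirrors the schedule the paper advocates under the sharpness condition; the other two norms (p = 1, ∞) would swap in the corresponding epigraphical projections.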

Results

- The authors present numerical results to demonstrate the efficiency of the proposed incremental methods.
- The authors evaluate the two proposed incremental methods ISG and IPPA in different settings to corroborate the theoretical results in Section 4 and to better understand their empirical strengths and weaknesses.
- The faster inner solver of [15], a conjugate gradient method combined with an active set strategy, can only handle the ∞ case.
- The implementation details to reproduce all numerical results are given in the Appendix.
- The authors' code is available at https://github.com/gerrili1996/Incremental_DRSVM

Conclusion

- The authors developed two new and highly efficient epigraphical projection-based incremental algorithms to solve the Wasserstein DRSVM problem with p-norm-induced transport cost (p ∈ {1, 2, ∞}) and established their convergence rates.
- A natural future direction is to develop a minibatch version of IPPA and extend the algorithms to the asynchronous decentralized parallel setting.
- It would be interesting to develop some new incremental/stochastic algorithms to tackle more general Wasserstein DRO problems; see, e.g., problem (11) in [19]
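To illustrate the incremental proximal point idea in its simplest form (a toy sketch: a plain per-sample hinge without the (w, λ) coupling or the sub-case analysis of Table 6; `prox_hinge` is a hypothetical helper name), each step solves a small proximal subproblem exactly, and for a hinge composed with a linear map the minimizer has a closed form:

```python
import numpy as np

def prox_hinge(w, x, y, alpha):
    """Exact proximal step for the per-sample hinge f(v) = max(0, 1 - y*<v, x>):
        argmin_v f(v) + ||v - w||^2 / (2*alpha).
    The minimizer moves along g = y*x by a clipped amount (classical closed form)."""
    g = y * x
    step = np.clip((1.0 - g @ w) / (g @ g), 0.0, alpha)
    return w + step * g
```

Cycling this update over the samples gives a basic incremental proximal point iteration; a minibatch variant would replace the single-sample prox with a prox over an averaged minibatch loss, which is exactly the future direction mentioned above.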


- Table 1: Convergence rates of incremental algorithms for p-DRSVM
- Table 2: Wall-clock time comparison on UCI real datasets: 1-DRSVM, c = 0, κ = 1, ε = 0.1
- Table 3: Wall-clock time comparison on UCI real datasets: 2-DRSVM, c = 0, κ = 1, ε = 0.1
- Table 4: Wall-clock time comparison on UCI real datasets: ∞-DRSVM, c = 0, κ = 1, ε = 0.1
- Table 5: Summary of all ingredients in ISG and IPPA
- Table 6: Summary of all sub-cases for the ℓ2 proximal point update (7)
- Table 7: Wall-clock time comparison on UCI real datasets: 1-DRSVM, c = 1, κ = 1, ε = 0.1
- Table 8: Wall-clock time comparison on UCI real datasets: ∞-DRSVM, c = 1, κ = 1, ε = 0.1

Funding

- Caihua Chen is supported in part by the National Natural Science Foundation of China (NSFC) projects 71732003 and 11871269, and in part by the Natural Science Foundation of Jiangsu Province project BK20181259.
- Anthony Man-Cho So is supported in part by the CUHK Research Sustainability of Major RGC Funding Schemes project 3133236.

Broader Impact

- This work does not present any foreseeable societal consequence.

Reference

- Heinz H. Bauschke. Projection Algorithms and Monotone Operators. PhD thesis, Simon Fraser University, 1996.
- Heinz H Bauschke and Jonathan M Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, 1996.
- Dimitri P Bertsekas. Incremental proximal methods for large scale convex optimization. Mathematical Programming, 129(2):163–195, 2011.
- Jose Blanchet, Yang Kang, and Karthyek Murthy. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability, 56(3):830–857, 2019.
- Jose Blanchet, Karthyek Murthy, and Fan Zhang. Optimal transport based distributionally robust optimization: Structural properties and iterative schemes. arXiv preprint arXiv:1810.02403, 2018.
- Jérôme Bolte, Trong Phong Nguyen, Juan Peypouquet, and Bruce W Suter. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471–507, 2017.
- J. Frédéric Bonnans and Alexander Shapiro. Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer–Verlag, New York, 2000.
- James V Burke and Michael C Ferris. Weak sharp minima in mathematical programming. SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993.
- Yu-Hong Dai and Roger Fletcher. New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds. Mathematical Programming, 106(3):403–421, 2006.
- Rui Gao, Xi Chen, and Anton J Kleywegt. Wasserstein distributional robustness and regularization in statistical learning. arXiv preprint arXiv:1712.06050, 2017.
- Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations Research & Management Science in the Age of Analytics, pages 130–166. INFORMS, 2019.
- Changhyeok Lee and Sanjay Mehrotra. A distributionally-robust approach for finding support vector machines. Optimization Online, 2015.
- Jaeho Lee and Maxim Raginsky. Minimax statistical learning with Wasserstein distances. In Advances in Neural Information Processing Systems, pages 2687–2696, 2018.
- Guoyin Li and Ting Kei Pong. Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of Computational Mathematics, 18(5):1199–1232, 2018.
- Jiajin Li, Sen Huang, and Anthony Man-Cho So. A first-order algorithmic framework for Wasserstein distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 3939–3949, 2019.
- Xiao Li, Zhihui Zhu, Anthony Man-Cho So, and Jason D Lee. Incremental methods for weakly convex optimization. arXiv preprint arXiv:1907.11687, 2019.
- Meijiao Liu and Yong-Jin Liu. Fast algorithm for singly linearly constrained quadratic programs with box-like constraints. Computational Optimization and Applications, 66(2):309–326, 2017.
- Fengqiao Luo and Sanjay Mehrotra. Decomposition algorithm for distributionally robust optimization using Wasserstein metric with an application to a class of regression models. European Journal of Operational Research, 278(1):20–35, 2019.
- Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166, 2018.
- Angelia Nedić and Dimitri P Bertsekas. Convergence rate of incremental subgradient algorithms. In Stochastic Optimization: Algorithms and Applications, pages 223–264.
- Angelia Nedić and Dimitri P Bertsekas. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1):109–138, 2001.
- Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, and Peyman Mohajerin Esfahani. Regularization via mass transportation. Journal of Machine Learning Research, 20(103):1–68, 2019.
- Manisha Singla, Debdas Ghosh, and KK Shukla. A survey of robust optimization based machine learning with special reference to support vector machines. International Journal of Machine Learning and Cybernetics, 11(7):1359–1385, 2020.
- Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
- Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.
- Po-Wei Wang, Matt Wytock, and Zico Kolter. Epigraph projections for fast general convex programming. In International Conference on Machine Learning, pages 2868–2877, 2016.
- Wolfram Wiesemann, Daniel Kuhn, and Melvyn Sim. Distributionally robust convex optimization. Operations Research, 62(6):1358–1376, 2014.
- Zirui Zhou and Anthony Man-Cho So. A unified approach to error bounds for structured convex optimization problems. Mathematical Programming, 165(2):689–728, 2017.
