# A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization

NeurIPS 2020


Abstract

Nonconvex sparse models have received significant attention in high-dimensional machine learning. In this paper, we study a new model consisting of a general convex or nonconvex objective and a variety of continuous nonconvex sparsity-inducing constraints. For this constrained model, we propose a novel proximal point algorithm that sol…


Introduction

- Recent years have witnessed a great deal of work on sparse optimization problems arising in machine learning, statistics, and signal processing.
- Due to the discontinuity of the ℓ0-norm, the above problem is intractable in the absence of additional assumptions.
- To bypass this difficulty, a popular approach is to replace the ℓ0-norm by the ℓ1-norm, giving rise to an ℓ1-constrained or ℓ1-regularized problem.
- A substantial amount of literature already exists for understanding the statistical properties of ℓ1 models ([41, 32, 7, 39, 19]) as well as for the development of efficient algorithms when such models are employed ([11, 1, 22, 34, 19]).
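The ℓ1-constrained relaxation mentioned above is attractive partly because Euclidean projection onto the ℓ1 ball is cheap. The following is a minimal sketch of the sort-based projection of Duchi et al. (cited in the references); the function name is illustrative, and this is not code from the paper:

```python
def project_l1_ball(v, radius=1.0):
    # Euclidean projection of v onto {x : sum_i |x_i| <= radius},
    # via the O(d log d) sort-based method of Duchi et al. (2008).
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible
    u = sorted((abs(x) for x in v), reverse=True)  # magnitudes, descending
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        if ui > (cumsum - radius) / i:  # candidate soft-threshold level
            theta = (cumsum - radius) / i
    # soft-threshold every coordinate by the final level theta
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]
```

The returned point always has ℓ1 norm exactly equal to the radius whenever the input lies outside the ball, which is a quick sanity check on an implementation.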

Highlights

- Recent years have witnessed a great deal of work on sparse optimization problems arising in machine learning, statistics, and signal processing
- We present a novel proximal point algorithm (LCPP) for nonconvex optimization with a nonconvex sparsity-inducing constraint
- We develop an efficient procedure for projection onto the subproblem constraint set, thereby adapting projected first-order methods to the level-constrained proximal point (LCPP) method for large-scale optimization, and establish an O(1/ε) (O(1/ε²)) complexity for deterministic (stochastic) optimization
- We perform numerical experiments to demonstrate the efficiency of our proposed algorithm for large-scale sparse learning
- This paper presents a new model for sparse optimization and performs an algorithmic study for the proposed model
- Contributions made in this paper have the potential to inspire new research from statistical, algorithmic, and experimental points of view in the wider sparse optimization area

Methods

- The authors consider the following learning problem: min_x ψ(x) := (1/n) ∑_{i=1}^n L_i(x), s.t. g(x) ≤ η, where L_i(x) denotes the loss on the i-th sample.
- The authors find that the spectral gradient method outperforms the other methods in the logistic regression model and use it in LCPP for the remaining experiments for the sake of simplicity.
- The rest of the section compares the optimization efficiency of LCPP with a state-of-the-art nonlinear programming solver, and compares the proposed sparse constrained models solved by LCPP with standard convex and nonconvex sparse regularized models.

Conclusion

- The authors present a novel proximal point algorithm (LCPP) for nonconvex optimization with a nonconvex sparsity-inducing constraint.
- This paper presents a new model for sparse optimization and performs an algorithmic study for the proposed model.
- A rigorous statistical study of this model is still missing.
- The authors believe this gap stems from the tacit assumption that constrained optimization is more challenging than regularized optimization.
- Contributions made in this paper have the potential to inspire new research from statistical, algorithmic, and experimental points of view in the wider sparse optimization area

- Table 1: Iteration complexities of LCPP for problem (5) when the objective is convex or nonconvex, smooth or nonsmooth, and deterministic or stochastic
- Table 2: Examples of the constraint function g(x) = λ‖x‖₁ − h(x)
- Table 3: Dataset description. R for regression and C for classification. mnist is formulated as a binary problem to classify digit 5 from the other digits. real-sim is randomly partitioned into 70% training data and 30% testing data
- Table 4: Classification error (%) of different methods for sparse logistic regression
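The constraint family g(x) = λ‖x‖₁ − h(x) from Table 2 can be made concrete. The sketch below assumes the MCP form of Zhang (cited in the references), where subtracting a suitable convex h from λ‖x‖₁ recovers the coordinate-wise minimax concave penalty; parameter names are illustrative:

```python
def mcp_penalty(t, lam, theta):
    # minimax concave penalty (MCP): p(t) = lam*|t| - t^2/(2*theta)
    # for |t| <= theta*lam, and the constant theta*lam^2/2 beyond that
    a = abs(t)
    if a <= theta * lam:
        return lam * a - a * a / (2.0 * theta)
    return theta * lam * lam / 2.0

def h_component(t, lam, theta):
    # convex part subtracted from lam*|t|: h_i(t) = lam*|t| - p(t)
    return lam * abs(t) - mcp_penalty(t, lam, theta)

def g(x, lam, theta):
    # g(x) = lam*||x||_1 - h(x) = sum_i p(x_i)
    return sum(mcp_penalty(t, lam, theta) for t in x)
```

A useful check is the identity λ‖x‖₁ − ∑ᵢ h_i(xᵢ) = ∑ᵢ p(xᵢ), which must hold coordinate-wise by construction.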

Related work

- There is growing interest in using convex majorization to solve nonconvex optimization with nonconvex function constraints. Typical frameworks include difference-of-convex (DC) programming ([30]) and majorization-minimization ([28]), to name a few. Given the substantial literature, we emphasize the work most relevant to the current paper. Scutari et al. [26] proposed general approaches to majorize nonconvex constrained problems that include (5) as a special case; however, they require exact solutions of the subproblems, which is prohibitive for large-scale optimization, and prove only asymptotic convergence. Shen et al. [27] proposed a disciplined convex-concave programming (DCCP) framework for a class of DC programs in which (5) is a special case. Their work is empirical and does not provide specific convergence results.
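The convex majorization idea discussed above can be sketched for a constraint of the form g(x) = λ‖x‖₁ − h(x): since h is convex, replacing it by its tangent at the current iterate x_k yields a convex upper bound on g, and hence a convex inner approximation of the feasible region. The helper below is a hypothetical illustration only (h(x) = ∑ xᵢ² in the usage check is chosen purely for the example):

```python
def majorize_g(h, grad_h, lam, x_k):
    # Build the convex majorant
    #   g_k(x) = lam*||x||_1 - h(x_k) - <grad_h(x_k), x - x_k>.
    # Convexity of h gives h(x) >= its tangent, so g_k(x) >= g(x)
    # everywhere, with equality at x = x_k.
    h_k, grad_k = h(x_k), grad_h(x_k)
    def g_bar(x):
        tangent = h_k + sum(gi * (xi - xki)
                            for gi, xi, xki in zip(grad_k, x, x_k))
        return lam * sum(abs(t) for t in x) - tangent
    return g_bar
```

Enforcing g_k(x) ≤ η then gives a convex subproblem whose feasible set lies inside the original one, which is the mechanism the majorization frameworks above exploit.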

Funding

- Boob and Lan gratefully acknowledge the National Science Foundation (NSF) for its support through grant CCF 1909298
- Deng acknowledges funding from National Natural Science Foundation of China (Grant 11831002)

References

- A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
- Dimitri P. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
- Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, pages 813–852, 2016.
- Thomas Blumensath and Mike E Davies. Iterative thresholding for sparse approximations. Journal of Fourier analysis and Applications, 14(5-6):629–654, 2008.
- Digvijay Boob, Qi Deng, and Guanghui Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. arXiv preprint arXiv:1908.02734, 2019.
- P.S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In Proceedings of International Conference on Machine Learning (ICML’98), pages 82–90. Morgan Kaufmann, 1998.
- Emmanuel J Candès, Yaniv Plan, et al. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, 37(5A):2145–2177, 2009.
- Emmanuel J. Candes, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted l1 minimization. arXiv preprint arXiv:0711.1612, 2007.
- Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. The Journal of Machine Learning Research, 17(1):2909–2913, 2016.
- John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the 1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272–279, 2008.
- Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
- Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360, 2001.
- Simon Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.
- Wenjiang J. Fu. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397–416, 1998.
- Saeed Ghadimi and Guanghui Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization i: A generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469–1492, 2012.
- Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Z. Huang, and Jieping Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. International Conference on Machine Learning, 28(2):37–45, 2013.
- Koulik Khamaru and Martin J. Wainwright. Convergence guarantees for a class of non-convex and non-smooth optimization problems. International Conference on Machine Learning, pages 2606–2615, 2018.
- Yannis Kopsinis, Konstantinos Slavakis, and Sergios Theodoridis. Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls. IEEE Transactions on Signal Processing, 59(3):936–952, 2011.
- A. Kyrillidis and V. Cevher. Combinatorial selection and least absolute shrinkage via the clash algorithm. In 2012 IEEE International Symposium on Information Theory Proceedings, pages 2216–2220, 2012.
- Guanghui Lan, Zhize Li, and Yi Zhou. A unified variance-reduced accelerated gradient method for convex optimization. In NeurIPS 2019: Thirty-third Conference on Neural Information Processing Systems, pages 10462–10472, 2019.
- Qihang Lin, Runchao Ma, and Yangyang Xu. Inexact proximal-point penalty methods for non-convex optimization with non-convex constraints. arXiv preprint arXiv:1908.11518, 2019.
- Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- B. D. Rao and K. Kreutz-Delgado. An affine scaling methodology for best basis selection. IEEE Transactions on Signal Processing, 47(1):187–200, January 1999.
- H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing Methods in Statistics, pages 111–135, 1971.
- Gesualdo Scutari, Francisco Facchinei, Lorenzo Lampariello, Stefania Sardellitti, and Peiran Song. Parallel and distributed methods for constrained nonconvex optimization-part ii: Applications in communications and machine learning. IEEE Transactions on Signal Processing, 65(8):1945–1960, 2017.
- Xinyue Shen, Steven Diamond, Yuantao Gu, and Stephen Boyd. Disciplined convex-concave programming. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1009–1014. IEEE, 2016.
- Ying Sun, Prabhu Sing Babu, and Daniel P. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Transactions on Signal Processing, 65(3):794–816, 2017.
- H.A. Le Thi, T. Pham Dinh, H.M. Le, and X.T. Vo. DC approximation approaches for sparse optimization. European Journal of Operational Research, 244(1):26–46, 2015.
- Hoai An Le Thi and Tao Pham Dinh. DC programming and DCA: thirty years of developments. Mathematical Programming, 169(1):5–68, 2018.
- Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
- Martin J Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (lasso). IEEE Transactions on Information Theory, 55(5):2183–2202, 2009.
- Jason Weston, André Elisseeff, Bernd Schölkopf, and Mike Tipping. Use of the zero-norm with linear models and kernel methods. The Journal of Machine Learning Research, 3:1439–1461, 2003.
- Stephen J Wright, Robert D Nowak, and Mário AT Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009.
- Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
- Jun-ya Gotoh, Akiko Takeda, and Katsuya Tono. DC formulations and algorithms for sparse optimization problems. Mathematical Programming, 169(1):141–176, 2018.
- Xiao-Tong Yuan, Ping Li, and Tong Zhang. Gradient hard thresholding pursuit. The Journal of Machine Learning Research, 18(1):6027–6069, 2017.
- Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2):894–942, 2010.
- Cun-Hui Zhang, Jian Huang, et al. The sparsity and bias of the lasso selection in high-dimensional linear regression. The Annals of Statistics, 36(4):1567–1594, 2008.
- Cun-Hui Zhang and Tong Zhang. A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4):576–593, 2012.
- Peng Zhao and Bin Yu. On model selection consistency of lasso. Journal of Machine learning research, 7(Nov):2541–2563, 2006.
- Pan Zhou, Xiaotong Yuan, and Jiashi Feng. Efficient stochastic gradient hard thresholding. In Advances in Neural Information Processing Systems, pages 1984–1993, 2018.
