A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization

NeurIPS 2020 (2020)


Abstract

Nonconvex sparse models have received significant attention in high-dimensional machine learning. In this paper, we study a new model consisting of a general convex or nonconvex objective and a variety of continuous nonconvex sparsity-inducing constraints. For this constrained model, we propose a novel proximal point algorithm that sol...

Introduction
  • Recent years have witnessed a great deal of work on sparse optimization arising in machine learning, statistics, and signal processing.
  • Due to the discontinuity of the ℓ0 norm, the above problem is intractable without further assumptions.
  • To bypass this difficulty, a popular approach is to replace the ℓ0-norm by the ℓ1-norm, giving rise to an ℓ1-constrained or ℓ1-regularized problem.
  • A substantial amount of literature already exists on the statistical properties of ℓ1 models ([41, 32, 7, 39, 19]) as well as on the development of efficient algorithms for such models ([11, 1, 22, 34, 19]); a standard building block, projection onto the ℓ1 ball, is sketched below.
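For concreteness, here is a minimal sketch of the sorting-based Euclidean projection onto the ℓ1 ball in the spirit of Duchi et al. [10], the standard subroutine when ℓ1-constrained problems are solved by projected gradient methods. The function name and interface are illustrative and not taken from the paper.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= z} (Duchi et al. [10], O(d log d))."""
    assert z > 0.0
    v = np.asarray(v, dtype=float)
    u = np.abs(v)
    if u.sum() <= z:                      # v is already feasible
        return v.copy()
    s = np.sort(u)[::-1]                  # magnitudes sorted in decreasing order
    cssv = np.cumsum(s)                   # running sums s_1 + ... + s_j
    j = np.arange(1, s.size + 1)
    rho = np.nonzero(s - (cssv - z) / j > 0)[0][-1]   # largest index with positive gap
    theta = (cssv[rho] - z) / (rho + 1.0)             # soft-thresholding level
    return np.sign(v) * np.maximum(u - theta, 0.0)
```

This soft-thresholding structure is what makes ℓ1-type constraint sets computationally attractive; the paper's contribution includes an analogous efficient projection for its more general subproblem constraint sets.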
Highlights
  • Recent years have witnessed a great deal of work on sparse optimization arising in machine learning, statistics, and signal processing
  • We present a novel proximal point algorithm (LCPP) for nonconvex optimization with a nonconvex sparsity-inducing constraint
  • We develop an efficient procedure for projection onto the subproblem constraint set, thereby adapting projected first-order methods to the level-constrained proximal point (LCPP) method for large-scale optimization, and establish an O(1/ε) (O(1/ε²)) complexity for deterministic (stochastic) optimization; an illustrative outer loop is sketched after this list
  • We perform numerical experiments to demonstrate the efficiency of our proposed algorithm for large scale sparse learning
  • This paper presents a new model for sparse optimization and performs an algorithmic study for the proposed model
  • Contributions made in this paper have the potential to inspire new research from statistical, algorithmic, and experimental points of view in the wider sparse optimization area
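To make the level-constrained proximal point idea in the highlights easier to picture, below is a minimal, hedged sketch of one possible outer loop: the constraint g(x) = λ‖x‖₁ − h(x) is replaced by the convex majorant obtained by linearizing h at the current iterate, and a proximal subproblem over the resulting convex level set is solved. For readability this sketch hands each subproblem to CVXPY (cited in the references) rather than using the paper's efficient projection with projected first-order methods; the function names, the fixed level eta, and the proximal weight mu are our assumptions, not the paper's exact algorithm.

```python
import numpy as np
import cvxpy as cp

def lcpp_sketch(loss_expr, h, grad_h, x0, lam, eta, mu=1.0, iters=20):
    """Illustrative level-constrained proximal point loop (a sketch, not the paper's solver).

    loss_expr : callable mapping a cvxpy Variable to a convex cvxpy expression (the objective)
    h, grad_h : value and gradient of the convex function h in g(x) = lam*||x||_1 - h(x)
    x0        : starting point assumed to satisfy g(x0) <= eta
    """
    x_k = np.asarray(x0, dtype=float)
    for _ in range(iters):
        c = grad_h(x_k)
        x = cp.Variable(x_k.size)
        # Convex majorant of g at x_k: since h is convex, -h(x) <= -h(x_k) - <grad_h(x_k), x - x_k>.
        g_major = lam * cp.norm1(x) - (h(x_k) + c @ (x - x_k))
        # Proximal subproblem over the convex level set {x : g_major(x) <= eta}.
        # It is always feasible because g_major(x_k) = g(x_k) <= eta.
        objective = loss_expr(x) + (mu / 2.0) * cp.sum_squares(x - x_k)
        cp.Problem(cp.Minimize(objective), [g_major <= eta]).solve()
        x_k = x.value
    return x_k
```

In the paper the subproblems are instead solved inexactly by projected first-order methods using the efficient projection procedure mentioned above, which is what yields the stated O(1/ε) and O(1/ε²) complexity bounds.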
Methods
  • The authors consider the following learning problem: min_x ψ(x) := (1/n) Σ_{i=1}^n L_i(x), s.t. g(x) ≤ η, where L_i(x) denotes the loss function (a concrete instantiation is sketched after this list).
  • The authors find that the spectral gradient method outperforms the other methods on the logistic regression model and, for simplicity, use it within LCPP for the remaining experiments.
  • The rest of the section compares the optimization efficiency of LCPP with a state-of-the-art nonlinear programming solver, and compares the proposed sparse constrained models solved by LCPP with standard convex and nonconvex sparse regularized models
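As a concrete, assumed instantiation of this formulation, the sketch below spells out the averaged logistic loss and an MCP-style constraint function written in the DC form g(x) = λ‖x‖₁ − h(x) of Table 2. The MCP split and the parameter names lam and theta follow the standard minimax concave penalty [38] and are our illustration; they may differ in detail from the paper's Table 2 entries.

```python
import numpy as np

def logistic_loss(x, A, y):
    """psi(x) = (1/n) * sum_i log(1 + exp(-y_i * a_i^T x)), with labels y_i in {-1, +1}."""
    margins = y * (A @ x)
    return np.mean(np.logaddexp(0.0, -margins))   # numerically stable log(1 + exp(-m))

def h_mcp(x, lam, theta):
    """Convex part h in the split MCP(x) = lam*||x||_1 - h(x) (standard MCP form, assumed)."""
    a = np.abs(x)
    return np.sum(np.where(a <= theta * lam,
                           x ** 2 / (2.0 * theta),
                           lam * a - theta * lam ** 2 / 2.0))

def g_mcp(x, lam, theta):
    """Sparsity-inducing constraint function g(x) = lam*||x||_1 - h(x); the model imposes g(x) <= eta."""
    return lam * np.abs(x).sum() - h_mcp(x, lam, theta)
```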
Conclusion
  • The authors present a novel proximal point algorithm (LCPP) for nonconvex optimization with a nonconvex sparsity-inducing constraint.
  • This paper presents a new model for sparse optimization and performs an algorithmic study for the proposed model.
  • A rigorous statistical study of this model is still missing
  • The authors believe this gap is due to the tacit assumption that constrained optimization is more challenging than regularized optimization.
  • Contributions made in this paper have the potential to inspire new research from statistical, algorithmic, and experimental points of view in the wider sparse optimization area
Tables
  • Table 1: Iteration complexities of LCPP for problem (5) when the objective is convex or nonconvex, smooth or nonsmooth, and deterministic or stochastic
  • Table 2: Examples of constraint functions g(x) = λ‖x‖₁ − h(x)
  • Table 3: Dataset description. R for regression and C for classification. mnist is formulated as a binary problem to classify digit 5 from the other digits. real-sim is randomly partitioned into 70% training data and 30% testing data
  • Table 4: Classification error (%) of different methods for sparse logistic regression
Related work
  • There is growing interest in using convex majorization to solve nonconvex optimization with nonconvex function constraints. Typical frameworks include difference-of-convex (DC) programming ([30]) and majorization-minimization ([28]), to name a few. Given the substantial literature, we emphasize the work most relevant to this paper. Scutari et al. [26] proposed general approaches to majorize nonconvex constrained problems and include (5) as a special case; they require exact solutions of the subproblems, which is prohibitive for large-scale optimization, and prove only asymptotic convergence. Shen et al. [27] proposed a disciplined convex-concave programming (DCCP) framework for a class of DC programs of which (5) is a special case; their work is empirical and does not provide specific convergence results.
Funding
  • Boob and Lan gratefully acknowledge the National Science Foundation (NSF) for its support through grant CCF 1909298
  • Deng acknowledges funding from National Natural Science Foundation of China (Grant 11831002)
References
  • A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
  • Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
  • Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, pages 813–852, 2016.
  • Thomas Blumensath and Mike E. Davies. Iterative thresholding for sparse approximations. Journal of Fourier Analysis and Applications, 14(5-6):629–654, 2008.
  • Digvijay Boob, Qi Deng, and Guanghui Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. arXiv preprint arXiv:1908.02734, 2019.
  • P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In Proceedings of the International Conference on Machine Learning (ICML'98), pages 82–90. Morgan Kaufmann, 1998.
  • Emmanuel J. Candès, Yaniv Plan, et al. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, 37(5A):2145–2177, 2009.
  • Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. arXiv preprint arXiv:0711.1612, 2007.
  • Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. The Journal of Machine Learning Research, 17(1):2909–2913, 2016.
  • John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.
  • Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
  • Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360, 2001.
  • Simon Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.
  • Wenjiang J. Fu. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397–416, 1998.
  • Saeed Ghadimi and Guanghui Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469–1492, 2012.
  • Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Z. Huang, and Jieping Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. International Conference on Machine Learning, 28(2):37–45, 2013.
  • Koulik Khamaru and Martin J. Wainwright. Convergence guarantees for a class of non-convex and non-smooth optimization problems. International Conference on Machine Learning, pages 2606–2615, 2018.
  • Yannis Kopsinis, Konstantinos Slavakis, and Sergios Theodoridis. Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls. IEEE Transactions on Signal Processing, 59(3):936–952, 2011.
  • A. Kyrillidis and V. Cevher. Combinatorial selection and least absolute shrinkage via the CLASH algorithm. In 2012 IEEE International Symposium on Information Theory Proceedings, pages 2216–2220, 2012.
  • Guanghui Lan, Zhize Li, and Yi Zhou. A unified variance-reduced accelerated gradient method for convex optimization. In NeurIPS 2019: Thirty-third Conference on Neural Information Processing Systems, pages 10462–10472, 2019.
  • Qihang Lin, Runchao Ma, and Yangyang Xu. Inexact proximal-point penalty methods for non-convex optimization with non-convex constraints. arXiv preprint arXiv:1908.11518, 2019.
  • Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • B. D. Rao and K. Kreutz-Delgado. An affine scaling methodology for best basis selection. IEEE Transactions on Signal Processing, 47(1):187–200, January 1999.
  • H. Robbins and D. Siegmund. A convergence theorem for non-negative almost supermartingales and some applications. Optimizing Methods in Statistics, pages 111–135, 1971.
  • Gesualdo Scutari, Francisco Facchinei, Lorenzo Lampariello, Stefania Sardellitti, and Peiran Song. Parallel and distributed methods for constrained nonconvex optimization, Part II: Applications in communications and machine learning. IEEE Transactions on Signal Processing, 65(8):1945–1960, 2017.
  • Xinyue Shen, Steven Diamond, Yuantao Gu, and Stephen Boyd. Disciplined convex-concave programming. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1009–1014. IEEE, 2016.
  • Ying Sun, Prabhu Babu, and Daniel P. Palomar. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Transactions on Signal Processing, 65(3):794–816, 2017.
  • H. A. Le Thi, T. Pham Dinh, H. M. Le, and X. T. Vo. DC approximation approaches for sparse optimization. European Journal of Operational Research, 244(1):26–46, 2015.
  • Hoai An Le Thi and Tao Pham Dinh. DC programming and DCA: thirty years of developments. Mathematical Programming, 169(1):5–68, 2018.
  • Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288, 1996.
  • Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183–2202, 2009.
  • Jason Weston, André Elisseeff, Bernhard Schölkopf, and Mike Tipping. Use of the zero-norm with linear models and kernel methods. The Journal of Machine Learning Research, 3:1439–1461, 2003.
  • Stephen J. Wright, Robert D. Nowak, and Mário A. T. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009.
  • Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057–2075, 2014.
  • Jun-ya Gotoh, Akiko Takeda, and Katsuya Tono. DC formulations and algorithms for sparse optimization problems. Mathematical Programming, 169(1):141–176, 2018.
  • Xiao-Tong Yuan, Ping Li, and Tong Zhang. Gradient hard thresholding pursuit. The Journal of Machine Learning Research, 18(1):6027–6069, 2017.
  • Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2):894–942, 2010.
  • Cun-Hui Zhang, Jian Huang, et al. The sparsity and bias of the lasso selection in high-dimensional linear regression. The Annals of Statistics, 36(4):1567–1594, 2008.
  • Cun-Hui Zhang and Tong Zhang. A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4):576–593, 2012.
  • Peng Zhao and Bin Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7(Nov):2541–2563, 2006.
  • Pan Zhou, Xiaotong Yuan, and Jiashi Feng. Efficient stochastic gradient hard thresholding. In Advances in Neural Information Processing Systems, pages 1984–1993, 2018.
Authors
Digvijay Boob
Qi Deng
Yilin Wang