Can Stochastic Zeroth-Order Frank-Wolfe Method Converge Faster for Non-Convex Problems?

ICML, pp. 3377–3386, 2020

Abstract

The Frank-Wolfe algorithm is an efficient method for optimizing non-convex constrained problems. However, most existing methods focus on the first-order case. In real-world applications, the gradient is not always available. To address the lack of gradient information in many applications, we propose two new stochastic zeroth-order Frank-Wolfe ...

Introduction
  • The authors consider the following constrained finite-sum minimization problem:

    $\min_{x \in \Omega} \frac{1}{n} \sum_{i=1}^{n} f_i(x)$,  (1)

    where $\Omega \subset \mathbb{R}^d$ denotes a closed convex feasible set, each component function $f_i$ is smooth and non-convex, and $n$ represents the number of component functions.
  • A representative example is the robust low-rank matrix completion problem, which is defined as follows:

    $\min_{\|X\|_* \le R} \frac{1}{|\mathcal{O}|} \sum_{(i,j) \in \mathcal{O}} \sigma^2 \Big(1 - \exp\big(-\tfrac{(X_{ij} - Y_{ij})^2}{\sigma^2}\big)\Big)$,

    where $\mathcal{O}$ denotes the set of observed elements, $\sigma$ is a hyperparameter, and $\|X\|_* \le R$ stands for the low-rank constraint.
  • Compared with the unconstrained finite-sum minimization problem, optimizing Eq. (1) has to deal with the constraint, which introduces new challenges.
  • A straightforward method for optimizing the large-scale problem in Eq. (1) is the projected gradient descent method, which first takes a step along the negative gradient direction and then performs a projection back onto $\Omega$ to satisfy the constraint.
  • The Frank-Wolfe method has also been widely used for optimizing Eq. (1); both update rules are sketched below.
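To make the contrast concrete, the sketch below compares one projected gradient step with one Frank-Wolfe step over an $\ell_1$-ball feasible set (the constraint used in the MCCR regression example later on this page). It is a minimal illustration under that assumed constraint, not the authors' algorithms; the function names, the step size eta, and the step weight gamma are illustrative.

```python
import numpy as np

def project_l1_ball(v, r):
    """Euclidean projection onto {x : ||x||_1 <= r} via the standard sorting-based routine."""
    if np.abs(v).sum() <= r:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (cssv - r))[0][-1]
    theta = (cssv[rho] - r) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def pgd_step(x, grad, r, eta):
    """Projected gradient descent: move along the negative gradient, then project back."""
    return project_l1_ball(x - eta * grad, r)

def fw_step(x, grad, r, gamma):
    """Frank-Wolfe: replace the projection with a linear minimization oracle (LMO).
    Over the l1 ball, argmin_{||s||_1 <= r} <grad, s> is a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(x)
    s[i] = -r * np.sign(grad[i])
    return (1.0 - gamma) * x + gamma * s  # convex combination of feasible points stays feasible
```

The point made above shows up directly in the code: the projection requires a non-trivial sorting-based routine (and becomes far more expensive for constraints such as the nuclear-norm ball), whereas the Frank-Wolfe linear minimization oracle is a one-line closed-form computation and the convex combination keeps every iterate feasible.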
Highlights
  • In this paper, we consider the following constrained finite-sum minimization problem:

    $\min_{x \in \Omega} \frac{1}{n} \sum_{i=1}^{n} f_i(x)$,  (1)

    where $\Omega \subset \mathbb{R}^d$ denotes a closed convex feasible set, each component function $f_i$ is smooth and non-convex, and $n$ represents the number of component functions.
  • The component function is a non-convex function that is less sensitive to large residuals than the least-squares loss.
  • A straightforward method for optimizing the large-scale problem in Eq. (1) is the projected gradient descent method, which first takes a step along the negative gradient direction and then performs a projection to satisfy the constraint.
  • Unlike the projected gradient descent method, the Frank-Wolfe method (Frank & Wolfe, 1956) is more efficient when dealing with the constraint.
  • We propose a new faster conditional gradient sliding (FCGS) method in Algorithm 4; a rough sketch of the underlying gradient-sliding idea is given below.
  • We focus on the non-convex maximum correntropy criterion induced regression (MCCR) model (Feng et al., 2015), defined in Eq. (27) in the Methods section.
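Algorithm 4 itself is not reproduced on this page. As a rough illustration of the conditional gradient sliding idea that FCGS builds on (Lan & Zhou, 2016; Qu et al., 2017), the sketch below replaces the projection in a proximal-gradient-style step with an inexact Frank-Wolfe solve of the quadratic subproblem. The function names, the tolerance eps, and the simplified outer step are assumptions for illustration, not the paper's Algorithm 4.

```python
import numpy as np

def cnd_g(grad, x0, lmo, beta, eps, max_iter=100):
    """Approximately solve min_{u in Omega} <grad, u> + (beta/2)||u - x0||^2 with
    Frank-Wolfe, stopping once the Frank-Wolfe gap drops below eps."""
    u = x0.copy()
    for _ in range(max_iter):
        g = grad + beta * (u - x0)          # gradient of the quadratic subproblem
        v = lmo(g)                          # linear minimization oracle over Omega
        gap = np.dot(g, u - v)              # Frank-Wolfe gap: accuracy certificate
        if gap <= eps:
            break
        step = min(1.0, gap / (beta * np.dot(u - v, u - v) + 1e-12))  # exact line search
        u = u + step * (v - u)
    return u

def cgs_step(x, grad, lmo, beta, eps):
    """One simplified gradient-sliding step: the projection of a proximal gradient
    update is replaced by the inexact Frank-Wolfe subsolver above, so the constraint
    is only ever accessed through the cheap LMO."""
    return cnd_g(grad, x, lmo, beta, eps)
```

In conditional gradient sliding the outer loop is an accelerated scheme and eps is tightened over the iterations; see the full text for how FCGS refines this template.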
Methods
  • The authors focus on the non-convex maximum correntropy criterion induced regression (MCCR) (Feng et al., 2015) model as follows:

    $\min_{\|x\|_1 \le s} \frac{\sigma^2}{n} \sum_{i=1}^{n} \Big(1 - \exp\big(-\tfrac{(a_i^\top x - b_i)^2}{\sigma^2}\big)\Big)$,  (27)

    where $\sigma$ and $s$ are hyper-parameters and $(a_i, b_i)$ denotes the $i$-th training sample.
  • In the experiments for the zeroth-order methods, the authors treat the loss function as a black-box function, which means that only function values are available; a gradient estimator built purely from such function values is sketched below.
  • In the experiments for the first-order methods, both function values and gradients are available.
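The sketch below writes out the MCCR objective of Eq. (27) together with a coordinate-wise finite-difference gradient estimator of the kind zeroth-order methods rely on. The smoothing parameter mu, the data names A and b, and the two-point coordinate scheme are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

def mccr_loss(x, A, b, sigma):
    """MCCR objective from Eq. (27): (sigma^2 / n) * sum_i (1 - exp(-(a_i^T x - b_i)^2 / sigma^2))."""
    r = A @ x - b
    return sigma ** 2 * np.mean(1.0 - np.exp(-(r ** 2) / sigma ** 2))

def coord_zo_grad(f, x, mu=1e-4):
    """Coordinate-wise zeroth-order gradient estimate: only function values are used,
    at a cost of 2*d queries for a d-dimensional point."""
    d = x.size
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = mu
        g[j] = (f(x + e) - f(x - e)) / (2.0 * mu)
    return g

# Usage (A, b, sigma are placeholders for the regression data and hyper-parameter):
# g_hat = coord_zo_grad(lambda z: mccr_loss(z, A, b, sigma), x)
```

Each such estimate costs 2d function queries for a d-dimensional variable, which is why the zeroth-order rates in Table 1 are measured in function queries rather than gradient evaluations.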
Results
  • Zeroth-Order Methods: the convergence results of the zeroth-order methods are reported in Figures 1(a) and 1(b).
  • The proposed methods outperform the baseline method significantly.
  • FZFW converges faster than ZSCG; the reason is that FZFW utilizes a variance-reduced gradient estimator while ZSCG does not (the variance-reduction recursion is sketched below).
  • The proposed FZCSG in turn outperforms FZFW; the reason is that FZCSG incorporates the acceleration technique.
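FZFW's exact estimator is not spelled out on this page. As a rough illustration of the variance-reduction idea it relies on, the recursion below follows the SPIDER/SARAH template (Nguyen et al., 2017; Fang et al., 2018) on top of a zeroth-order estimator such as coord_zo_grad above. The refresh period q, the mini-batch size b, and the function names are assumptions for illustration.

```python
import numpy as np

def vr_zo_gradient(v_prev, x, x_prev, zo_grad, n, q, t, b):
    """SPIDER/SARAH-style variance-reduced estimator (sketch).
    zo_grad(idx, x): zeroth-order gradient estimate of the components indexed by idx at x.
    Every q iterations the estimate is refreshed on all n components; in between it is
    corrected recursively on a shared mini-batch, which shrinks its variance as the
    iterates x and x_prev get close."""
    if t % q == 0:
        return zo_grad(np.arange(n), x)               # periodic full (large-batch) refresh
    idx = np.random.choice(n, size=b, replace=False)  # fresh mini-batch, reused at both points
    return v_prev + zo_grad(idx, x) - zo_grad(idx, x_prev)
```

The comparison above attributes FZFW's speedup over ZSCG precisely to such a recursive estimator, with FZCSG's additional gain coming from the acceleration technique.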
Conclusion
  • The authors improved the convergence rate of the stochastic zeroth-order Frank-Wolfe method.
  • The authors proposed two new stochastic zeroth-order Frank-Wolfe algorithms.
  • Both of them significantly improve the function query complexity over existing methods.
  • The authors also improved the accelerated stochastic zeroth-order Frank-Wolfe method to achieve a better IFO complexity.
  • Experimental results confirm the effectiveness of the proposed methods.
Tables
  • Table 1: Convergence rate of different zeroth-order algorithms
  • Table 2: Convergence rate of different first-order conditional gradient sliding algorithms
Funding
  • This work was partially supported by U.S. NSF grants IIS 1836945, IIS 1836938, IIS 1845666, IIS 1852606, IIS 1838627, and IIS 1837956.
References
  • Balasubramanian, K. and Ghadimi, S. Zeroth-order (non)convex stochastic optimization via conditional gradient and gradient updates. In Advances in Neural Information Processing Systems, pp. 3455–3464, 2018.
  • Clarkson, K. L. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.
  • Duchi, J. C., Jordan, M. I., Wainwright, M. J., and Wibisono, A. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.
  • Dvurechensky, P., Gasnikov, A., and Gorbunov, E. An accelerated method for derivative-free smooth stochastic convex optimization. arXiv preprint arXiv:1802.09022, 2018.
  • Fang, C., Li, C. J., Lin, Z., and Zhang, T. Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. In Advances in Neural Information Processing Systems, pp. 689–699, 2018.
  • Feng, Y., Huang, X., Shi, L., Yang, Y., and Suykens, J. A. Learning with the maximum correntropy criterion induced losses for regression. Journal of Machine Learning Research, 16:993–1034, 2015.
  • Frank, M. and Wolfe, P. An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2): 95–110, 1956.
  • Gao, X., Jiang, B., and Zhang, S. On the information-adaptive variants of the ADMM: an iteration complexity perspective. Journal of Scientific Computing, 76(1):327–363, 2018.
  • Ghadimi, S. and Lan, G. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  • Hajinezhad, D., Hong, M., and Garcia, A. Zeroth order nonconvex multi-agent optimization over networks. arXiv preprint arXiv:1710.09997, 2017.
  • Hassani, H., Karbasi, A., Mokhtari, A., and Shen, Z. Stochastic conditional gradient++. arXiv preprint arXiv:1902.06992, 2019.
  • Hazan, E. and Luo, H. Variance-reduced and projection-free stochastic optimization. In International Conference on Machine Learning, pp. 1263–1271, 2016.
  • Ji, K., Wang, Z., Zhou, Y., and Liang, Y. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. arXiv preprint arXiv:1910.12166, 2019.
  • Lacoste-Julien, S. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv preprint arXiv:1607.00345, 2016.
  • Lacoste-Julien, S. and Jaggi, M. On the global linear convergence of Frank-Wolfe optimization variants. In Advances in Neural Information Processing Systems, pp. 496–504, 2015.
  • Lan, G. and Zhou, Y. Conditional gradient sliding for convex optimization. SIAM Journal on Optimization, 26(2):1379–1409, 2016.
  • Lei, L., Ju, C., Chen, J., and Jordan, M. I. Non-convex finite-sum optimization via scsg methods. In Advances in Neural Information Processing Systems, pp. 2348–2358, 2017.
  • Lian, X., Zhang, H., Hsieh, C.-J., Huang, Y., and Liu, J. A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order. In Advances in Neural Information Processing Systems, pp. 3054–3062, 2016.
  • Liu, S., Kailkhura, B., Chen, P.-Y., Ting, P., Chang, S., and Amini, L. Zeroth-order stochastic variance reduction for nonconvex optimization. In Advances in Neural Information Processing Systems, pp. 3727–3737, 2018.
  • Mokhtari, A., Hassani, H., and Karbasi, A. Stochastic conditional gradient methods: From convex minimization to submodular maximization. arXiv preprint arXiv:1804.09554, 2018.
  • Nesterov, Y. and Spokoiny, V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
  • Nguyen, L. M., Liu, J., Scheinberg, K., and Takac, M. Sarah: A novel method for machine learning problems using stochastic recursive gradient. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 2613–2621. JMLR.org, 2017.
  • Qu, C., Li, Y., and Xu, H. Non-convex conditional gradient sliding. arXiv preprint arXiv:1708.04783, 2017.
  • Reddi, S. J., Sra, S., Poczos, B., and Smola, A. Stochastic Frank-Wolfe methods for nonconvex optimization. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251. IEEE, 2016.
  • Sahu, A. K., Zaheer, M., and Kar, S. Towards gradient free and projection free stochastic optimization. arXiv preprint arXiv:1810.03233, 2018.
  • Shamir, O. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. Journal of Machine Learning Research, 18(52):1–11, 2017.
  • Shen, Z., Fang, C., Zhao, P., Huang, J., and Qian, H. Complexities in projection-free stochastic non-convex minimization. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2868–2876, 2019.
  • Wang, Y., Du, S., Balakrishnan, S., and Singh, A. Stochastic zeroth-order optimization in high dimensions. arXiv preprint arXiv:1710.10551, 2017.
  • Wang, Z., Ji, K., Zhou, Y., Liang, Y., and Tarokh, V. Spiderboost: A class of faster variance-reduced algorithms for nonconvex optimization. arXiv preprint arXiv:1810.10690, 2018.
  • Yurtsever, A., Sra, S., and Cevher, V. Conditional gradient methods via stochastic path-integrated differential estimator. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
  • Zhang, M., Shen, Z., Mokhtari, A., Hassani, H., and Karbasi, A. One sample stochastic Frank-Wolfe. arXiv preprint arXiv:1910.04322, 2019.