
# Can Stochastic Zeroth-Order Frank-Wolfe Method Converge Faster for Non-Convex Problems?

ICML, pp.3377-3386, (2020)


Abstract

The Frank-Wolfe algorithm is an efficient method for optimizing non-convex constrained problems. However, most existing methods focus on the first-order case. In real-world applications, the gradient is not always available. To address the lack of gradients in many applications, we propose two new stochastic zeroth-order Frank-Wolfe …

Introduction
• The authors consider the following constrained finite-sum minimization problem:

$$\min_{x \in \Omega} \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad (1)$$

where $\Omega \subset \mathbb{R}^d$ denotes a closed convex feasible set, each component function $f_i$ is smooth and non-convex, and $n$ represents the number of component functions.
• A representative example is the robust low-rank matrix completion problem, which minimizes a correntropy-induced robust loss over the observed entries subject to a nuclear-norm constraint, where $O$ denotes the set of observed elements, $\sigma$ is a hyper-parameter of the robust loss, and $\|X\|_* \le R$ stands for the low-rank constraint.
• Compared with the unconstrained finite-sum minimization problem, optimizing Eq. (1) has to deal with the constraint, which introduces new challenges.
• A straightforward method to optimize the large-scale Eq. (1) is projected gradient descent, which first takes a step along the negative gradient direction and then performs a projection to satisfy the constraint.
• The Frank-Wolfe method has been widely used for optimizing Eq. (1)
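As a concrete illustration (not the paper's algorithm), a minimal classical Frank-Wolfe iteration for an $\ell_1$-ball constraint can be sketched as follows; the $\ell_1$-ball oracle, the step size $\gamma_t = 2/(t+2)$, and the toy quadratic objective used in the usage note are all illustrative choices:

```python
def lmo_l1(grad, radius):
    """Linear minimization oracle over the l1 ball {v : ||v||_1 <= radius}:
    the minimizer of <grad, v> is a signed vertex on the coordinate with
    the largest absolute gradient entry."""
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    v[i] = -radius if grad[i] > 0 else radius
    return v


def frank_wolfe_step(x, grad, radius, gamma):
    """One projection-free Frank-Wolfe update: x <- (1 - gamma) x + gamma v.
    Since the update is a convex combination of feasible points, the new
    iterate stays feasible without any projection."""
    v = lmo_l1(grad, radius)
    return [(1.0 - gamma) * xi + gamma * vi for xi, vi in zip(x, v)]
```

For example, minimizing $f(x) = (x_0-3)^2 + (x_1+1)^2$ over $\|x\|_1 \le 1$ with $\gamma_t = 2/(t+2)$ drives the iterate toward the constrained optimum $(1, 0)$ while every iterate remains feasible — which is precisely the appeal over projected gradient descent when projection onto $\Omega$ is expensive.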
Highlights
• In this paper, we consider the following constrained finite-sum minimization problem:

$$\min_{x \in \Omega} \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad (1)$$

where $\Omega \subset \mathbb{R}^d$ denotes a closed convex feasible set, each component function $f_i$ is smooth and non-convex, and $n$ represents the number of component functions
• The component function is a non-convex function that is less sensitive to large residuals than the least-squares loss
• A straightforward method to optimize the large-scale Eq. (1) is projected gradient descent, which first takes a step along the negative gradient direction and then performs a projection to satisfy the constraint
• Unlike projected gradient descent, the Frank-Wolfe method (Frank & Wolfe, 1956) handles the constraint more efficiently by replacing the projection step with a linear minimization oracle
• We propose a new faster conditional gradient sliding (FCGS) method in Algorithm 4
• We focus on the non-convex maximum correntropy criterion induced regression (MCCR) (Feng et al., 2015) model as follows:

$$\min_{\|x\|_1 \le s} \frac{\sigma^2}{n} \sum_{i=1}^{n} \left\{1 - \exp\left(-\frac{(y_i - a_i^\top x)^2}{\sigma^2}\right)\right\}$$
Methods
• The authors focus on the non-convex maximum correntropy criterion induced regression (MCCR) (Feng et al., 2015) model as follows:

$$\min_{\|x\|_1 \le s} \frac{\sigma^2}{n} \sum_{i=1}^{n} \left\{1 - \exp\left(-\frac{(y_i - a_i^\top x)^2}{\sigma^2}\right)\right\} \qquad (27)$$

• where $\sigma$ and $s$ are hyper-parameters.
• In the experiments on zeroth-order methods, the authors treat the loss function as a black-box function, meaning that only function values are available.
• In the experiments on first-order methods, both function values and gradients are available.
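The black-box setting above can be made concrete with a standard coordinate-wise finite-difference gradient estimator. This is a generic sketch of the zeroth-order idea, not the paper's exact estimator; the correntropy-induced loss below follows the standard form $\sigma^2(1 - e^{-r^2/\sigma^2})$ per residual, and the smoothing parameter `mu` is an illustrative choice:

```python
import math


def mccr_loss(x, A, y, sigma):
    """Correntropy-induced robust regression loss: each residual r
    contributes sigma^2 * (1 - exp(-r^2 / sigma^2)), which saturates for
    large residuals instead of growing quadratically."""
    total = 0.0
    for a_i, y_i in zip(A, y):
        r = y_i - sum(aj * xj for aj, xj in zip(a_i, x))
        total += sigma ** 2 * (1.0 - math.exp(-r ** 2 / sigma ** 2))
    return total / len(y)


def coord_grad_estimate(f, x, mu=1e-6):
    """Zeroth-order gradient estimate via central finite differences:
        g_j ~ (f(x + mu * e_j) - f(x - mu * e_j)) / (2 * mu).
    Only function values are queried: 2d queries per gradient estimate."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += mu
        xm[j] -= mu
        g.append((f(xp) - f(xm)) / (2.0 * mu))
    return g
```

Since each estimate costs $2d$ function queries, reducing the overall function query complexity is the central concern for zeroth-order methods.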
Results
• Zeroth-Order Methods: the convergence results of the zeroth-order methods are reported in Figures 1(a) and 1(b).
• The proposed methods outperform the baseline method significantly.
• FZFW converges faster than ZSCG because FZFW uses a variance-reduced gradient estimator while ZSCG does not.
• The proposed FZCGS outperforms FZFW; the reason is that FZCGS incorporates the acceleration technique
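The variance-reduction device behind such estimators can be sketched with a generic SPIDER-style recursion (the paper's FZFW/FZCGS details are not reproduced here); `sample_grad` and the quadratic test components in the usage note are illustrative:

```python
def spider_estimator(x_new, x_old, v_prev, sample_grad, batch):
    """SPIDER-style recursive variance-reduced estimator:
        v_t = (1/|S|) * sum_{i in S} (g_i(x_t) - g_i(x_{t-1})) + v_{t-1}.
    When consecutive iterates are close, the two mini-batch terms nearly
    cancel, so v_t tracks the full gradient using only small batches."""
    d = len(x_new)
    diff = [0.0] * d
    for i in batch:
        gn = sample_grad(i, x_new)
        go = sample_grad(i, x_old)
        for j in range(d):
            diff[j] += (gn[j] - go[j]) / len(batch)
    return [diff[j] + v_prev[j] for j in range(d)]
```

For quadratic components the per-component gradients share a constant Hessian, so the recursion is exact even with a single sampled component; in general it only keeps the estimator's variance small between periodic full-gradient refreshes.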
Conclusion
• The authors improved the convergence rate of the stochastic zeroth-order Frank-Wolfe method.
• They proposed two algorithms for zeroth-order Frank-Wolfe optimization, both of which significantly improve the function query complexity over existing methods.
• They also improved the accelerated stochastic Frank-Wolfe (conditional gradient sliding) method to a better IFO complexity.
• Experimental results confirm the effectiveness of the proposed methods
Tables
• Table1: Convergence rate of different zeroth-order algorithms
• Table2: Convergence rate of different first-order conditional gradient sliding algorithms
Funding
• This work was partially supported by U.S. NSF IIS 1836945, IIS 1836938, IIS 1845666, IIS 1852606, IIS 1838627, IIS 1837956