An Alternative View: When Does SGD Escape Local Minima?

ICML 2018, pp. 2698–2707.


Abstract:

Stochastic gradient descent (SGD) is widely used in machine learning. Although commonly viewed as a fast but not accurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks. In order to understand this phenomenon, we take an alternative view that SGD is working on the convolved (thus smoothed) version of the loss function.
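
The abstract's central idea, that gradient noise makes SGD behave as if it were descending a neighborhood-averaged (convolved) loss, can be illustrated with a small numerical sketch. The snippet below is a toy under stated assumptions, not the paper's construction: it uses a hypothetical 1-D loss with many sharp local minima and approximates the smoothed-loss gradient by evaluating the true gradient at a randomly perturbed point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D loss (an assumption for illustration): a convex bowl plus
# high-frequency wiggles that create many bad local minima.
def loss(x):
    return 0.5 * x ** 2 + 0.3 * np.cos(20 * x)

def grad(x):
    return x - 6.0 * np.sin(20 * x)

def gd(x0, lr=0.01, steps=3000):
    # Plain gradient descent: follows the exact gradient and stalls in a nearby wiggle.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def noisy_sgd(x0, lr=0.01, steps=3000, sigma=0.3):
    # Gradient evaluated at a randomly perturbed point: in expectation this is the
    # gradient of the loss convolved with the noise distribution, i.e. a smoothed
    # loss whose spurious wiggles are averaged away.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x + sigma * rng.standard_normal())
    return x

x0 = 3.0
x_gd, x_sgd = gd(x0), noisy_sgd(x0)
print(f"GD  ends at x = {x_gd:+.3f}, loss = {loss(x_gd):.3f}")   # typically trapped far from 0
print(f"SGD ends at x = {x_sgd:+.3f}, loss = {loss(x_sgd):.3f}")  # typically near the global basin at 0
```

With sigma comparable to the period of the wiggles, the expected perturbed gradient is close to the gradient of the bowl alone, so the noise washes out the spurious minima while leaving the large-scale landscape intact; this mirrors the abstract's smoothing argument only qualitatively.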
