An Alternative View: When Does SGD Escape Local Minima?
ICML, pp. 2698–2707, 2018.
Stochastic gradient descent (SGD) is widely used in machine learning. Although it is commonly viewed as a fast but less accurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks. To understand this phenomenon, we take an alternative view: SGD works on a convolved (thus smoothed) version of the loss function.
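The convolution view above can be illustrated with a small sketch (my own toy example, not from the paper): averaging the gradient at Gaussian perturbations of the iterate is, in expectation, descent on the noise-smoothed loss, and on a loss with a sharp local minimum this lets the iterate escape where plain GD stays trapped.

```python
import numpy as np

# Toy 1-D loss: a sharp local minimum near x ~ 0.96 and the global
# minimum near x ~ -1.03 (this function is an illustrative assumption).
def grad(x):
    return 4 * x**3 - 4 * x + 0.3  # gradient of x^4 - 2x^2 + 0.3x

def gd(x0, lr=0.05, steps=300):
    """Plain gradient descent: stays in whichever basin it starts in."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def perturbed_gd(x0, sigma=0.5, lr=0.05, steps=300, batch=256, seed=0):
    """Descent on the convolved loss: each step averages the gradient at
    Gaussian perturbations of x, which in expectation equals the gradient
    of the smoothed loss E_xi[f(x + xi)], xi ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        xi = rng.normal(0.0, sigma, size=batch)
        x -= lr * grad(x + xi).mean()
    return x

x_gd = gd(1.0)             # trapped near the sharp local minimum
x_smooth = perturbed_gd(1.0)  # escapes to the global basin (x < 0)
```

With sigma = 0.5 the smoothed gradient, 4x³ − x + 0.3, has a single stable zero near x ≈ −0.61, so the perturbed iterate leaves the sharp basin that traps plain GD.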