Minimum norm solutions do not always generalize well for over-parameterized problems

arXiv preprint arXiv:1811.07055, 2018.


Abstract:

Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires careful hyper-parameter tuning to achieve its best performance. This has led to the development of adaptive methods that claim to tune hyper-parameters automatically. Recently, researchers have studied…
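
The "minimum norm solution" in the title can be made concrete in the simplest over-parameterized setting: under-determined linear least squares, where gradient descent initialized at zero is known to converge to the minimum Euclidean-norm interpolating solution. The sketch below illustrates that behavior; the dimensions, learning rate, and iteration count are arbitrary illustrative choices, not the paper's experimental setup.

```python
# Illustrative sketch (not from the paper): in an over-parameterized
# least-squares problem (more parameters than samples), gradient descent
# started at zero converges to the minimum-norm interpolating solution.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # n samples, d parameters (d > n)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum Euclidean-norm solution via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# Plain gradient descent on the mean squared loss, initialized at zero.
w = np.zeros(d)
lr = 1e-2
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y) / n

print(np.linalg.norm(w - w_min_norm))  # ~0: GD found the min-norm solution
```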

