Minimum norm solutions do not always generalize well for over-parameterized problems
arXiv preprint arXiv:1811.07055, 2018.
Stochastic gradient descent (SGD) is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires careful hyper-parameter tuning to achieve its best performance. This has led to the development of adaptive methods that claim to tune hyper-parameters automatically. Recently, researchers have studied...
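The minimum-norm solutions in the title can be sketched concretely in the linear case: for an under-determined (over-parameterized) least-squares problem, the pseudoinverse yields the solution of smallest Euclidean norm, which is also the solution gradient descent reaches from zero initialization. This is an illustrative sketch, not code from the paper:

```python
import numpy as np

# Under-determined linear system A x = b: 3 equations, 10 unknowns,
# so infinitely many exact solutions exist.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))
b = rng.standard_normal(3)

# The pseudoinverse picks the minimum Euclidean-norm solution.
x_min = np.linalg.pinv(A) @ b

# Any other solution differs by a null-space component of A
# and therefore has a strictly larger norm.
null_dir = np.linalg.svd(A)[2][-1]   # last right-singular vector spans part of null(A)
x_other = x_min + 2.0 * null_dir

assert np.allclose(A @ x_min, b)
assert np.allclose(A @ x_other, b)
assert np.linalg.norm(x_min) < np.linalg.norm(x_other)
```

Both `x_min` and `x_other` fit the data exactly; the paper's point is that preferring the smaller-norm one (`x_min`) is not always the better-generalizing choice.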