Minimum weight norm models do not always generalize well for over-parameterized problems

Vatsal Shah
Anastasios Kyrillidis

Abstract:

Stochastic gradient descent (SGD) is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires careful hyper-parameter tuning to achieve its best performance. This has led to the development of adaptive methods that claim automatic hyper-parameter optimization. Recently, researchers have studied both ...
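The title's claim concerns minimum-norm interpolating solutions. As a minimal illustration (a sketch, not the paper's code or experiments), the NumPy snippet below demonstrates the textbook fact that, for an over-parameterized least-squares problem, plain gradient descent initialized at zero converges to the minimum l2-norm interpolator given by the pseudo-inverse. The paper's thesis is that such minimum-norm solutions need not generalize best; all variable names and constants here are illustrative.

import numpy as np

# Illustrative over-parameterized least squares: fewer samples (n) than
# parameters (d), so infinitely many interpolating solutions exist.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Minimum l2-norm interpolator via the Moore-Penrose pseudo-inverse.
w_min_norm = np.linalg.pinv(X) @ y

# Plain gradient descent on 0.5 * ||X w - y||^2, initialized at zero.
# Zero initialization keeps the iterates in the row space of X, so GD
# converges to the minimum-norm solution above.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size below 2/L for stability
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

print(np.linalg.norm(X @ w - y))        # ~0: perfect interpolation
print(np.linalg.norm(w - w_min_norm))   # ~0: GD found the min-norm solution

On a random instance like this, both printed norms are near machine precision, which illustrates the setup the title questions: reaching the minimum-norm interpolator says nothing by itself about test error.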
