Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NeurIPS 2019), pp. 1141-1150, 2019.


Abstract:

Deep neural networks have achieved dramatic success, driven largely by the optimization method of stochastic gradient descent (SGD). However, it is still not clear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence of a training strategy th…
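The two hyper-parameters the abstract highlights, batch size and learning rate, enter plain minibatch SGD as sketched below. This is a hypothetical illustration (not the paper's code): a NumPy least-squares example where `batch_size` and `learning_rate` are exposed explicitly, so their ratio can be varied as the paper's training strategy suggests.

```python
import numpy as np

def sgd_linear_regression(X, y, batch_size, learning_rate, epochs=50, seed=0):
    """Minibatch SGD on least-squares; batch_size and learning_rate are the
    two hyper-parameters whose interplay the paper studies."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # gradient of 0.5 * ||X_b w - y_b||^2 / |b|
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= learning_rate * grad  # SGD update
    return w

# Synthetic regression problem with a known ground-truth weight vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)

w = sgd_linear_regression(X, y, batch_size=32, learning_rate=0.1)
```

Rerunning with, e.g., `batch_size=256, learning_rate=0.1` versus `batch_size=32, learning_rate=0.1` changes the batch-size-to-learning-rate ratio while keeping everything else fixed, which is the kind of controlled comparison the abstract alludes to.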

