When Does Preconditioning Help or Hurt Generalization?
International Conference on Learning Representations (ICLR), 2020.
Abstract:
While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization remains controversial. For instance, it has been pointed out that gradient descent (GD), in contrast to second-order optimizers, converges to solutions with small Euclidean norm in many overparameterized models […]
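The implicit-bias claim in the abstract can be illustrated with a minimal sketch. Assuming an overparameterized linear least-squares problem (more parameters than samples, a hypothetical setup not from the paper), gradient descent initialized at zero stays in the row space of the data and therefore converges to the minimum Euclidean norm interpolating solution, i.e. the pseudoinverse solution:

```python
import numpy as np

# Hypothetical illustration: overparameterized least squares,
# fewer samples (n) than parameters (d).
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                   # zero initialization keeps w in the row space of X
lr = 0.01                         # step size below 2 / lambda_max(X X^T)
for _ in range(20000):
    grad = X.T @ (X @ w - y)      # gradient of 0.5 * ||X w - y||^2
    w -= lr * grad

# Minimum Euclidean norm interpolating solution, X^+ y.
w_min_norm = np.linalg.pinv(X) @ y
print(np.linalg.norm(w - w_min_norm))  # should be near zero
```

A preconditioned update such as NGD generally breaks this property: rescaling the gradient steers the iterates toward a different interpolating solution, which is the tension the paper examines.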