When Does Preconditioning Help or Hurt Generalization?
International Conference on Learning Representations, 2020.
Characterized the population risk of preconditioned least squares regression in the overparameterized regime and determined the optimal preconditioner for generalization.
While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization remains controversial. For instance, it has been pointed out that gradient descent (GD), in contrast to second-order optimizers, converges to solutions with small Euclidean norm in many overparameterized model...
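The implicit-bias claim in the abstract can be checked numerically on a toy problem. The sketch below is not the paper's code; it assumes a random Gaussian design, zero initialization, and a hypothetical diagonal preconditioner P chosen only for illustration. Under those assumptions, plain GD on an overparameterized least squares problem should land on the minimum Euclidean norm interpolant, while preconditioned GD should land on the interpolant P X^T (X P X^T)^{-1} y, which minimizes the P^{-1}-weighted norm instead.

```python
import numpy as np


def preconditioned_gd(X, y, P, steps=20000):
    """Run theta <- theta - lr * P @ grad on 0.5 * ||X theta - y||^2 / n, from theta = 0."""
    n, d = X.shape
    # Conservative step size (an illustrative choice): keeps lr * lambda_max(X P X^T / n) <= 1.
    lr = n / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(P, 2))
    theta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        theta = theta - lr * (P @ grad)
    return theta


rng = np.random.default_rng(0)
n, d = 5, 20  # overparameterized: more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain GD corresponds to P = I; compare with the minimum-norm interpolant.
theta_gd = preconditioned_gd(X, y, np.eye(d))
theta_min_norm = np.linalg.pinv(X) @ y

# Hypothetical positive-definite diagonal preconditioner (illustration only).
P = np.diag(np.linspace(0.5, 2.0, d))
theta_pgd = preconditioned_gd(X, y, P)
theta_closed = P @ X.T @ np.linalg.solve(X @ P @ X.T, y)

print("GD matches min-norm solution:", np.allclose(theta_gd, theta_min_norm, atol=1e-6))
print("PGD matches closed form:", np.allclose(theta_pgd, theta_closed, atol=1e-6))
print("Euclidean norms (GD, PGD):", np.linalg.norm(theta_gd), np.linalg.norm(theta_pgd))
```

Both runs fit the training data exactly, so any difference in test performance comes only from which interpolant the optimizer selects; that selection is what the choice of preconditioner controls in this toy setting.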