# When Does Preconditioning Help or Hurt Generalization?

Shun-ichi Amari
Xuechen Li
Atsushi Nitanda
Denny Wu
Ji Xu

International Conference on Learning Representations, 2020.

Characterized the population risk of preconditioned least squares regression in the overparameterized regime and determined the optimal preconditioner for generalization.

Abstract:

While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization remains controversial. For instance, it has been pointed out that gradient descent (GD), in contrast to second-order optimizers, converges to solutions with small Euclidean norm in many overparameterized model...
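
The abstract's remark about GD and small Euclidean norm can be illustrated with a minimal NumPy sketch. It is not taken from the paper; it assumes an overparameterized least-squares problem (fewer samples than features), zero initialization, and a fixed positive-definite diagonal preconditioner `P` chosen arbitrarily. The helper name `run_preconditioned_gd`, the problem sizes, and the step count are hypothetical. Under these assumptions, plain GD reaches the minimum Euclidean norm interpolant, while the preconditioned iteration reaches a different interpolant, namely the one minimizing the `P^{-1}`-weighted norm.

```python
import numpy as np


def run_preconditioned_gd(X, y, P, steps=20000):
    """Iterate theta <- theta - lr * P @ grad from theta = 0
    for the loss 0.5 / n * ||X @ theta - y||^2."""
    n, d = X.shape
    # P @ X.T @ X / n is similar to a symmetric PSD matrix, so its
    # eigenvalues are real; the largest one gives a stable step size.
    largest = np.linalg.eigvals(P @ X.T @ X / n).real.max()
    lr = 1.0 / largest
    theta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        theta = theta - lr * (P @ grad)
    return theta


rng = np.random.default_rng(0)
n, d = 20, 50  # overparameterized: more features than samples (assumed sizes)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain GD (P = identity) versus the minimum Euclidean norm interpolant.
theta_gd = run_preconditioned_gd(X, y, np.eye(d))
theta_min_norm = np.linalg.pinv(X) @ y

# A fixed, arbitrary diagonal preconditioner (an assumption for illustration).
P = np.diag(rng.uniform(0.5, 2.0, size=d))
theta_pgd = run_preconditioned_gd(X, y, P)
# Closed form of the interpolant that minimizes theta^T P^{-1} theta.
theta_pgd_closed = P @ X.T @ np.linalg.solve(X @ P @ X.T, y)

print("GD matches min-norm solution:", np.allclose(theta_gd, theta_min_norm, atol=1e-6))
print("Preconditioned GD matches closed form:", np.allclose(theta_pgd, theta_pgd_closed, atol=1e-6))
print("Euclidean norms (GD, preconditioned):",
      np.linalg.norm(theta_gd), np.linalg.norm(theta_pgd))
```

Both solutions fit the training data exactly; they differ only in which interpolant the optimizer selects, which is the sense in which a preconditioner changes the implicit bias. Whether that change helps or hurts the population risk is the question the paper studies, and this sketch does not address it.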

