Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
CoRR (2023)
Abstract
This study investigates how weight decay affects the update behavior of
individual neurons in deep neural networks through a combination of applied
analysis and experimentation. Weight decay can cause the expected magnitude and
angular updates of a neuron's weight vector to converge to a steady state we
call rotational equilibrium. These states can be highly homogeneous,
effectively balancing the average rotation – a proxy for the effective
learning rate – across different layers and neurons. Our work analyzes these
dynamics across optimizers like Adam, Lion, and SGD with momentum, offering a
new, simple perspective on training that elucidates the efficacy of widely used
but poorly understood methods in deep learning. We demonstrate how balanced
rotation plays a key role in the effectiveness of normalization methods such as
Weight Standardization, and in the advantage of AdamW over Adam with L2
regularization.
Finally, we show that explicitly controlling the rotation provides the benefits
of weight decay while substantially reducing the need for learning rate warmup.
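
To make the central quantity concrete, the sketch below tracks the per-step angular update of each neuron's weight vector (one row of a linear layer's weight matrix) during AdamW training. This is a minimal illustration, not the paper's experimental setup: the model, random data, learning rate, and weight decay value are all hypothetical placeholders. If rotational equilibrium is reached, one would expect the mean angle to stabilize over steps and to show low spread across neurons.

```python
# Minimal sketch: measure per-neuron angular updates under AdamW.
# All hyperparameters and the toy regression task are illustrative
# placeholders, not the configuration used in the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(64, 32, bias=False)  # each of the 32 rows is one neuron's weight vector
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

def row_angles(w_before, w_after):
    """Angle in radians between each neuron's weight vector before and after a step."""
    cos = F.cosine_similarity(w_before, w_after, dim=1).clamp(-1.0, 1.0)
    return torch.acos(cos)

for step in range(1001):
    x = torch.randn(128, 64)
    y = torch.randn(128, 32)
    w_before = model.weight.detach().clone()

    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    angles = row_angles(w_before, model.weight.detach())
    if step % 200 == 0:
        # In rotational equilibrium the mean angle should settle to a steady
        # value, with similar angles across neurons (small std).
        print(f"step {step:4d}: mean angle {angles.mean():.4f} rad, std {angles.std():.4f}")
```

The same measurement applied across layers would expose the balancing effect the abstract describes: with weight decay, average rotation rates tend toward a common steady state rather than varying freely between layers and neurons.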