Width of Minima Reached by Stochastic Gradient Descent Is Influenced by Learning Rate to Batch Size Ratio

Artificial Neural Networks and Machine Learning - ICANN 2018, Part III (2018)

Abstract
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
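As a rough illustration of the abstract's claim, the sketch below (not the authors' code; the synthetic data and all names are assumptions) trains the same small network with several (learning rate, batch size) pairs that share the same learning-rate-to-batch-size ratio. Under the paper's argument, runs with equal lr / B have a comparable SGD noise scale and should therefore reach minima of similar width and similar generalization.

```python
# Minimal sketch, assuming PyTorch and a synthetic classification task.
# It only demonstrates holding the ratio lr / batch_size fixed across runs;
# it does not reproduce the paper's experiments.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2048, 20)                     # synthetic inputs (assumption)
y = (X[:, :2].sum(dim=1) > 0).long()          # synthetic binary labels

def train(lr, batch_size, epochs=5):
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(X[idx]), y[idx]).backward()
            opt.step()
    return model

# Each pair keeps lr / batch_size fixed at 0.1 / 32 = 3.125e-3,
# so the stochastic-gradient noise scale should match across runs.
for lr, bs in [(0.1, 32), (0.2, 64), (0.4, 128)]:
    train(lr, bs)
    print(f"trained with lr={lr}, batch_size={bs}, ratio={lr / bs:.4g}")
```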