Width of Minima Reached by Stochastic Gradient Descent Is Influenced by Learning Rate to Batch Size Ratio
Artificial Neural Networks and Machine Learning - ICANN 2018, Part III (2018)
Abstract
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
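As a minimal sketch of why the ratio, rather than the learning rate or batch size alone, governs the dynamics, one can appeal to the standard stochastic-differential-equation view of SGD (an assumption here; the abstract does not spell out the derivation). With learning rate $\eta$, batch size $B$, loss $L$, and a minibatch gradient $g(\theta)$ satisfying $\mathbb{E}[g(\theta)] = \nabla L(\theta)$ and $\mathrm{Cov}(g(\theta)) \approx C(\theta)/B$, one SGD step

$$\theta_{k+1} = \theta_k - \eta\, g(\theta_k)$$

matches, to first order, the stochastic differential equation

$$d\theta = -\nabla L(\theta)\, dt + \sqrt{\frac{\eta}{B}}\, \sqrt{C(\theta)}\, dW_t,$$

whose noise term depends on $\eta$ and $B$ only through the ratio $\eta/B$. Larger noise makes narrow minima unstable under this view, which is consistent with the abstract's claim that the ratio controls the width of the minima SGD ultimately finds.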