Width of Minima Reached by Stochastic Gradient Descent Is Influenced by Learning Rate to Batch Size Ratio

Artificial Neural Networks and Machine Learning - ICANN 2018, Part III (2018)

Abstract
We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
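As a rough illustration of the abstract's claim, the sketch below (not the authors' code; the synthetic data and all names are assumptions) trains the same small network with several (learning rate, batch size) pairs that share the same learning-rate-to-batch-size ratio. Under the paper's argument, runs with equal lr / B have a comparable SGD noise scale and should therefore reach minima of similar width and similar generalization.

```python
# Minimal sketch, assuming PyTorch and a synthetic classification task.
# It only demonstrates holding the ratio lr / batch_size fixed across runs;
# it does not reproduce the paper's experiments.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(2048, 20)                     # synthetic inputs (assumption)
y = (X[:, :2].sum(dim=1) > 0).long()          # synthetic binary labels

def train(lr, batch_size, epochs=5):
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(X[idx]), y[idx]).backward()
            opt.step()
    return model

# Each pair keeps lr / batch_size fixed at 0.1 / 32 = 3.125e-3,
# so the stochastic-gradient noise scale should match across runs.
for lr, bs in [(0.1, 32), (0.2, 64), (0.4, 128)]:
    train(lr, bs)
    print(f"trained with lr={lr}, batch_size={bs}, ratio={lr / bs:.4g}")
```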