Momentum Batch Normalization for Deep Learning with Small Batch Size

European Conference on Computer Vision (2020)

Abstract
Normalization layers play an important role in deep network training. As one of the most popular normalization techniques, batch normalization (BN) has proven effective at accelerating model training and improving model generalization. The success of BN has been explained from different views, such as reducing internal covariate shift, allowing the use of large learning rates, and smoothing the optimization landscape. To gain a deeper understanding of BN, in this work we prove that BN introduces a certain level of noise into the sample mean and variance during training, and that the noise level depends only on the batch size. This noise generation mechanism regularizes the training process, and we present an explicit regularizer formulation of BN. Since the regularization strength of BN is determined by the batch size, a small batch size may cause under-fitting, resulting in a less effective model. To reduce the dependency of BN on batch size, we propose a momentum BN (MBN) scheme that averages the mean and variance of the current mini-batch with the historical means and variances. With a dynamic momentum parameter, we can automatically control the noise level during training. As a result, MBN works very well even when the batch size is very small (e.g., 2), which is hard to achieve with traditional BN.
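The averaging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `MomentumBatchNorm`, the fixed `momentum` value, and the initial statistics are assumptions made for illustration. The abstract mentions a dynamic momentum parameter but does not give its schedule, so a constant is used here, and only the forward pass is shown (no learnable scale or shift, no gradients).

```python
import numpy as np


class MomentumBatchNorm:
    """Hypothetical forward-only sketch of normalizing with momentum-averaged statistics."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        # ASSUMPTION: fixed momentum; the paper describes a dynamic one.
        self.momentum = momentum
        self.eps = eps
        # ASSUMPTION: historical statistics start at mean 0 and variance 1.
        self.mean = np.zeros(num_features)
        self.var = np.ones(num_features)

    def forward(self, x):
        # x has shape (batch_size, num_features).
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)

        # Blend the current mini-batch statistics with the historical ones.
        m = self.momentum
        self.mean = m * self.mean + (1.0 - m) * batch_mean
        self.var = m * self.var + (1.0 - m) * batch_var

        # Normalize with the averaged statistics rather than the raw batch ones.
        return (x - self.mean) / np.sqrt(self.var + self.eps)


# Usage with a very small batch (batch size 2, 3 features).
rng = np.random.default_rng(0)
layer = MomentumBatchNorm(num_features=3, momentum=0.9)
for _ in range(5):
    out = layer.forward(rng.normal(loc=1.0, scale=2.0, size=(2, 3)))
```

Setting `momentum=0.0` in this sketch reduces it to normalizing with the current mini-batch statistics alone, which is how standard BN behaves at training time; larger values make the statistics depend less on any single small batch.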
Keywords
small batch size, deep learning, momentum