Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
CVPR 2019, pp. 2682–2690. arXiv:1801.05134.
We investigate the “variance shift” phenomenon that arises when Dropout layers are combined with Batch Normalization in modern convolutional networks.
This paper first answers the question "why do Dropout and Batch Normalization (BN), two of the most powerful techniques, often lead to worse performance when combined?" from both theoretical and statistical perspectives. Theoretically, we find that Dropout shifts the variance of a specific neural unit when we transfer the s…
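The variance shift the abstract describes can be illustrated numerically: with inverted Dropout (keep probability p, kept units scaled by 1/p), a zero-mean unit's variance at train time is inflated by a factor of 1/p, while at test time Dropout is a no-op and the variance is unchanged. A minimal sketch (not the paper's code; the NumPy setup below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # keep probability of the Dropout layer
x = rng.standard_normal(1_000_000)  # a unit's activations, Var ≈ 1

# Train mode: inverted dropout keeps each unit with prob. p, scales by 1/p.
mask = rng.random(x.shape) < p
train_out = x * mask / p

# Test mode: dropout is the identity, so the variance stays at 1.
test_out = x

print(train_out.var())  # close to 1/p = 2.0: the "variance shift"
print(test_out.var())   # close to 1.0
```

A downstream BN layer calibrated on the (larger) train-time variance therefore mismatches the test-time statistics, which is the disharmony the paper analyzes.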