Batch normalization provably avoids ranks collapse for randomly initialised deep networks

NeurIPS 2020.

Highlight:
Increased depth typically leads to a drastic slowdown of learning with gradient-based methods, which is commonly attributed to unstable gradient norms in deep networks.

Abstract:

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products...
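The phenomenon the abstract describes can be observed numerically. Below is a minimal NumPy sketch, not the paper's experimental setup: a batch of inputs is pushed through a deep stack of random linear layers, and the numerical rank of the batch representation is compared with and without a per-layer batch normalization step. The width, depth, batch size, and rank tolerance are illustrative choices of this sketch, as is the norm rescaling in the vanilla path (added only to keep float magnitudes stable).

```python
import numpy as np

rng = np.random.default_rng(0)

def numerical_rank(h, tol=1e-6):
    """Count singular values above tol relative to the largest one."""
    s = np.linalg.svd(h, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def batch_norm(h, eps=1e-8):
    """Normalize each feature (row) to zero mean / unit variance over the batch."""
    h = h - h.mean(axis=1, keepdims=True)
    return h / (h.std(axis=1, keepdims=True) + eps)

def deep_random_net(x, depth, use_bn):
    """Forward a batch through `depth` random Gaussian linear layers."""
    width = x.shape[0]
    h = x
    for _ in range(depth):
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = w @ h
        if use_bn:
            h = batch_norm(h)
        else:
            h = h / np.linalg.norm(h)  # rescale only; avoids under/overflow
    return h

width, batch, depth = 32, 64, 1000  # illustrative sizes, not the paper's
x = rng.normal(size=(width, batch))
vanilla_rank = numerical_rank(deep_random_net(x, depth, use_bn=False))
bn_rank = numerical_rank(deep_random_net(x, depth, use_bn=True))
print("vanilla rank:", vanilla_rank)
print("with BN rank:", bn_rank)
```

In the vanilla network the product of random matrices drives the smaller singular values down exponentially relative to the largest, so the batch representation collapses toward rank one; with batch normalization the per-row rescaling re-amplifies small directions and the numerical rank stays markedly higher, consistent with the paper's claim.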
