A scaling calculus for the design and initialization of ReLU networks

user-5f8411ab4c775e9685ff56d3 (2022)

Abstract
We propose a system for calculating a "scaling constant" for the layers and weights of neural networks. We relate this scaling constant to two important quantities that bear on the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights share the same scaling constant, will be easier to train. This scaling calculus has a number of consequences, among them that the geometric mean of the fan-in and fan-out, rather than the fan-in, the fan-out, or their arithmetic mean, should be used to set the variance of the weights at initialization. Our system allows for the offline design and engineering of ReLU (Rectified Linear Unit) neural networks, potentially replacing blind experimentation. We verify the effectiveness of our approach on a set of benchmark problems.
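A minimal sketch of the geometric-mean initialization rule described in the abstract, in NumPy. The abstract only states that the variance of the weights should be governed by the geometric mean of fan-in and fan-out; the gain constant of 2.0 (mirroring He initialization for ReLU) and the function name `geometric_mean_init` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def geometric_mean_init(fan_in, fan_out, gain=2.0, rng=None):
    """Draw a (fan_out, fan_in) weight matrix with
    Var[w] = gain / sqrt(fan_in * fan_out).

    The geometric-mean scaling follows the abstract's claim; the `gain`
    value of 2.0 (as in He initialization for ReLU) is an assumption.
    """
    rng = rng or np.random.default_rng()
    std = np.sqrt(gain / np.sqrt(fan_in * fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

# Example: a hidden layer with 256 inputs and 1024 outputs.
W = geometric_mean_init(256, 1024)
print(W.std())  # ~ sqrt(2 / sqrt(256 * 1024)) ≈ 0.0625
```

Compare with fan-in-only (He) scaling, which for this layer would give a standard deviation of sqrt(2/256) ≈ 0.088; the geometric-mean rule interpolates between the fan-in and fan-out choices.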
Keywords
ReLU, Neural network initialization, Preconditioning, Artificial neural networks