Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Conference on Learning Theory (2018)
Abstract
We study the dynamics of gradient descent on objective functions of the form f(∏_{i=1}^{k} w_i) (with respect to scalar parameters w_1,…,w_k), which arise in the context of training depth-k linear neural networks. We prove that for standard random initializations, and under mild assumptions on f, the number of iterations required for convergence scales exponentially with the depth k. We also show empirically that this phenomenon can occur in higher dimensions, where each w_i is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where k is large.
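A minimal sketch of the setup described above: plain gradient descent on f(∏_{i=1}^{k} w_i) for scalar weights, with the specific choices of a quadratic target f(x) = (x − 1)^2, a fixed step size, a fixed tolerance, and i.i.d. standard Gaussian initialization. These specifics are illustrative assumptions, not taken from the paper; depending on the seed and depth, a run may fail to reach the tolerance within the iteration budget, which is itself consistent with the paper's claim that convergence time grows rapidly with k.

```python
import numpy as np

def iterations_to_converge(k, lr=0.01, tol=1e-3, max_iters=10_000_000, seed=0):
    """Run gradient descent on f(prod_i w_i) with f(x) = (x - 1)^2 and
    return the first iteration at which f drops below tol (None if never)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0, size=k)  # i.i.d. standard Gaussian init (assumption)
    for t in range(max_iters):
        p = np.prod(w)
        if (p - 1.0) ** 2 < tol:
            return t
        # d/dw_i f(prod_j w_j) = f'(p) * prod_{j != i} w_j, with f'(p) = 2 (p - 1)
        leave_one_out = np.array([np.prod(np.delete(w, i)) for i in range(k)])
        w = w - lr * 2.0 * (p - 1.0) * leave_one_out
    return None  # did not reach the tolerance within the budget

for k in (1, 2, 4, 6, 8):
    print(f"depth k = {k}: iterations = {iterations_to_converge(k)}")
```

With these (assumed) settings, the printed iteration counts tend to grow sharply with k, mirroring the exponential dependence on depth established in the paper for the one-dimensional case.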