Learning Over-parametrized Two-layer ReLU Neural Networks beyond NTK
Conference on Learning Theory (COLT 2020)
Abstract
We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star}\in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an \emph{over-parameterized} two-layer neural network with ReLU activation, trained by gradient descent from \emph{random initialization}, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomially many samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a number of samples polynomial in $d$, has population loss at least $\Omega(1/d)$.
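To make the setting concrete, the following is a minimal sketch (not the authors' code) of the data model and training procedure the abstract describes: Gaussian inputs labeled by $f^{\star}(x) = a^{\top}|W^{\star}x|$, and an over-parameterized two-layer ReLU network trained by gradient descent from random initialization. The specific hyperparameters (dimension `d`, student width `m`, learning rate, step count) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 10, 200, 5000     # input dim, student width (m >> d), sample size
lr, steps = 0.05, 2000      # assumed hyperparameters, for illustration only

# Ground truth: orthonormal W* (QR of a Gaussian matrix), nonnegative a.
W_star, _ = np.linalg.qr(rng.standard_normal((d, d)))
a_star = np.abs(rng.standard_normal(d))

X = rng.standard_normal((n, d))        # x ~ N(0, I_d)
y = np.abs(X @ W_star.T) @ a_star      # f*(x) = a^T |W* x|

# Over-parameterized student f(x) = sum_j b_j ReLU(w_j . x), random init.
W = rng.standard_normal((m, d)) / np.sqrt(d)
b = rng.standard_normal(m) / np.sqrt(m)

for t in range(steps):
    pre = X @ W.T                      # (n, m) pre-activations
    h = np.maximum(pre, 0.0)           # ReLU
    err = h @ b - y                    # residual of squared loss
    # Gradients of (1/2n) ||f(X) - y||^2 w.r.t. b and W.
    grad_b = h.T @ err / n
    grad_W = ((err[:, None] * (pre > 0)) * b[None, :]).T @ X / n
    b -= lr * grad_b
    W -= lr * grad_W

print("final train MSE:", np.mean((np.maximum(X @ W.T, 0.0) @ b - y) ** 2))
```

Note that, unlike an NTK-style analysis, this sketch trains both layers, so the hidden weights `W` move away from their random initialization; that movement is the regime the paper's lower bound against kernel methods is meant to separate.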
Keywords
neural networks, learning, over-parametrized, two-layer