A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

Abstract:

It was conjectured that no neural network of any structure, with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix, and the target is one of the features, i.e., very sparse. We ess...
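The learning problem described above can be sketched concretely. The following is a minimal construction, assuming the standard Sylvester recursion for the Hadamard matrix; the particular feature index `k` is a hypothetical choice for illustration, not taken from the paper.

```python
import numpy as np

def hadamard(d):
    # Sylvester construction: valid when d is a power of 2.
    # Each step doubles the matrix: H -> [[H, H], [H, -H]].
    H = np.array([[1]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d = 16
X = hadamard(d)   # instances: the d rows of the d x d Hadamard matrix
k = 3             # hypothetical index of the target feature
y = X[:, k]       # target: a single (very sparse) feature of the input

# Sanity checks: entries are +/-1 and the rows are mutually orthogonal.
assert np.all(np.abs(X) == 1)
assert np.allclose(X @ X.T, d * np.eye(d))
```

Because the rows are mutually orthogonal, every one of the $d$ features is uncorrelated with the others over the instance set, which is what makes the single relevant feature hard to pick out.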
