TL;DR:
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.

On the expressive power of deep neural networks.

International Conference on Machine Learning (ICML), 2017

Cited: 561 | Views: 224

Abstract

We study the expressive power of deep neural networks before and after training. Considering neural nets after random initialization, we show that three natural measures of expressivity all display an exponential dependence on the depth of the network. We prove, theoretically and experimentally, that all of these measures are in fact related ...

Introduction
  • Deep neural networks have proved astoundingly effective at a wide range of empirical tasks, from image classification (Krizhevsky et al., 2012) to playing Go (Silver et al., 2016), and even modeling human learning (Piech et al., 2015).

    Despite these successes, understanding of how and why these architectures achieve their empirical successes is still lacking.
  • Trajectory length serves as a unifying notion in the measures of expressivity, and it leads to insights into the behavior of trained networks.
  • The authors find that the exponential growth in trajectory length as a function of depth implies that small adjustments in parameters lower in the network induce larger changes than comparable adjustments higher in the network.
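
The trajectory-length measurement underlying these observations is easy to reproduce. Below is a minimal sketch, assuming a randomly initialized fully connected hard-tanh network and a circular one-dimensional input trajectory; the depth, width, and weight variances sigma_w and sigma_b are illustrative choices, not the paper's exact experimental setup.

    import numpy as np

    def trajectory_length(points):
        # Sum of Euclidean distances between consecutive points on the trajectory.
        return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

    def layer_images(trajectory, depth=8, width=100, sigma_w=2.0, sigma_b=1.0, seed=0):
        # Push every point of a 1-D input trajectory through a random fully
        # connected hard-tanh network and record its image at each hidden layer.
        rng = np.random.default_rng(seed)
        h, images = trajectory, []
        for _ in range(depth):
            W = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], width))
            b = rng.normal(0.0, sigma_b, size=width)
            h = np.clip(h @ W + b, -1.0, 1.0)  # hard tanh nonlinearity
            images.append(h)
        return images

    # Input trajectory: a densely sampled circle in a 2-D input space.
    t = np.linspace(0.0, 2.0 * np.pi, 1000)
    x = np.stack([np.cos(t), np.sin(t)], axis=1)

    print("input trajectory length:", trajectory_length(x))
    for d, z in enumerate(layer_images(x), start=1):
        print(f"layer {d}: trajectory length = {trajectory_length(z):.1f}")

With a sufficiently large weight variance, the printed lengths grow roughly geometrically with depth, matching the exponential depth dependence described above.
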
Highlights
  • Our contributions (measures of expressivity and their applications): in this paper, we address this set of challenges by defining and analyzing an interrelated set of measures of expressivity for neural networks; our framework applies to a wide range of standard architectures, independent of specific weight choices.
  • We explore the effects of regularization methods on trajectory length as the network trains, and propose a less computationally intensive method of regularization, trajectory regularization, that offers the same performance as batch normalization (a sketch of this idea follows at the end of this list).
  • Exponential trajectories: we find an exponential depth dependence displayed by these measures, through a unifying analysis in which we study how the network transforms its input by measuring trajectory length.
  • Characterizing the expressiveness of neural networks, and understanding how expressiveness varies with parameters of the architecture, has been a challenging problem due to the difficulty in identifying meaningful notions of expressivity and in linking their analysis to implications for these networks in practice.
  • In this paper we have presented an interrelated set of expressivity measures, with tight exponential bounds on their growth in the depth of the network, unified through the notion of trajectory length.
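
Trajectory regularization is only described at a high level in this summary, so the following is a hedged sketch of one plausible reading rather than the paper's exact procedure: a penalty proportional to the length of the image of a fixed probe trajectory is added to the training loss in place of batch normalization. The probe trajectory, the penalty weight lam, and the architecture are assumptions made for illustration.

    import math
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(2, 100), nn.Tanh(),
                          nn.Linear(100, 100), nn.Tanh(),
                          nn.Linear(100, 10))

    # Fixed probe trajectory: a densely sampled circle in the input space.
    t = torch.linspace(0.0, 2.0 * math.pi, 500)
    probe = torch.stack([torch.cos(t), torch.sin(t)], dim=1)

    def trajectory_penalty(net, pts):
        # Length of the image of the probe trajectory under the current network
        # (here taken through the whole network, an assumption of this sketch).
        z = net(pts)
        return torch.norm(z[1:] - z[:-1], dim=1).sum()

    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    lam = 1e-3  # assumed penalty weight

    def training_step(x, y):
        # One step of training with the assumed trajectory-length penalty.
        opt.zero_grad()
        loss = loss_fn(model(x), y) + lam * trajectory_penalty(model, probe)
        loss.backward()
        opt.step()
        return loss.item()

The design intent, per the highlight above, is that directly controlling trajectory length can play a role similar to batch normalization at lower computational cost.
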
Results
  • Montufar et al. (2014) provide a construction, i.e. a specific set of weights W0, that results in an exponential increase in the number of linear regions with the depth of the architecture.
  • The authors instead consider ‘global’ activation patterns over the entire input space, and prove that for any fully connected network, with any number of hidden layers, one can upper bound the number of linear regions it can achieve over all possible weight settings W.
  • (Tight) upper bound for the number of activation patterns: let A(n, k) denote a fully connected network with n hidden layers of width k and inputs in R^m. Then the number of activation patterns A(F_{A(n,k)}(R^m; W)) is upper bounded by O(k^(mn)) for ReLU activations and O((2k)^(mn)) for hard tanh (an empirical counting sketch follows at the end of this list).
  • Bound on growth of trajectory length: let F_A(x, W) be a ReLU or hard tanh random neural network and x(t) a one-dimensional trajectory such that x(t + δ) has a non-trivial perpendicular component to x(t) for all t, δ (i.e., not a line); then the expected length of the image of the trajectory grows exponentially with depth.
  • In Figures 4 and 12, the authors show the growth of an input trajectory for ReLU networks on CIFAR-10 and MNIST.
  • The CIFAR-10 network is convolutional, but the authors observe that these layers result in rates of trajectory length increase similar to those of fully connected layers.
  • The authors explore the insights gained from applying the measure of expressivity, trajectory length, to understanding network performance.
  • Note that even with a smaller weight initialization, weight norms increase during training, as shown in Figure 9, pushing typically initialized networks into the exponential growth regime.
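
The activation-pattern bound stated above can be probed empirically by counting how many distinct ReLU on/off patterns a random network realizes over a dense grid of inputs; the grid gives only a lower estimate of the true count. The architecture sizes, grid resolution, and weight scale below are illustrative assumptions.

    import numpy as np

    def count_activation_patterns(depth=3, width=8, m=2, grid=200, sigma_w=2.0, seed=0):
        # Count distinct ReLU activation patterns of a random fully connected
        # network A(depth, width) over a dense grid in an m-dimensional input cube.
        rng = np.random.default_rng(seed)
        axes = [np.linspace(-1.0, 1.0, grid)] * m
        x = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, m)

        h = x
        pattern_bits = []
        for _ in range(depth):
            W = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], width))
            b = rng.normal(0.0, 1.0, size=width)
            pre = h @ W + b
            pattern_bits.append(pre > 0)          # which units fire at each grid point
            h = np.maximum(pre, 0.0)              # ReLU
        patterns = np.concatenate(pattern_bits, axis=1)
        return len({row.tobytes() for row in patterns})

    for n in (1, 2, 3):
        print(f"{n} hidden layers: {count_activation_patterns(depth=n)} distinct patterns")

The number of observed patterns should grow with depth n while remaining consistent with the O(k^(mn)) bound for ReLU activations quoted above.
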
Conclusion
  • The authors have presented an interrelated set of expressivity measures, shown tight exponential bounds on the growth of these measures with the depth of the network, and offered a unifying view of the analysis through the notion of trajectory length.
  • The authors' analysis of trajectories provides insights into the performance of trained networks as well, suggesting that networks in practice may be more sensitive to small perturbations in weights at lower layers (a simple empirical check is sketched after this list).
  • There is a natural connection between adversarial examples (Goodfellow et al., 2014) and trajectory length: adversarial perturbations are only a small distance away in input space but result in a large change in classification.
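
The sensitivity claim above suggests a simple check: perturb the weights of one layer at a time by noise of fixed norm and compare how much the network's output changes. A minimal sketch on a random ReLU network follows; the widths, noise scale, and random inputs are illustrative assumptions (the paper's observation concerns trained networks).

    import numpy as np

    rng = np.random.default_rng(0)
    widths = [20, 100, 100, 100, 100, 10]            # input, hidden layers, output
    Ws = [rng.normal(0.0, 2.0 / np.sqrt(widths[i]), size=(widths[i], widths[i + 1]))
          for i in range(len(widths) - 1)]

    def forward(x, weights):
        h = x
        for W in weights[:-1]:
            h = np.maximum(h @ W, 0.0)               # ReLU hidden layers
        return h @ weights[-1]                       # linear output layer

    x = rng.normal(size=(256, widths[0]))            # a batch of random inputs
    base = forward(x, Ws)

    for layer in range(len(Ws)):
        # Add noise of fixed Frobenius norm to one layer's weights at a time.
        noise = rng.normal(size=Ws[layer].shape)
        noise *= 0.1 / np.linalg.norm(noise)
        perturbed = [W.copy() for W in Ws]
        perturbed[layer] += noise
        change = np.linalg.norm(forward(x, perturbed) - base)
        print(f"perturbing layer {layer}: output change = {change:.3f}")

If the conclusion above holds, perturbations to the earliest layers should produce the largest output changes, since they are amplified by every subsequent layer.
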
References
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587): 484–489, 2016.
  • Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein. Deep knowledge tracing. In Advances in Neural Information Processing Systems, pages 505–513, 2015.
  • Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
  • George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
  • Wolfgang Maass, Georg Schnitger, and Eduardo D Sontag. A comparison of the computational power of sigmoid and Boolean threshold circuits. Springer, 1994.
  • Peter L Bartlett, Vitaly Maiorov, and Ron Meir. Almost linear vc-dimension bounds for piecewise polynomial networks. Neural computation, 10(8):2159–2173, 1998.
  • Razvan Pascanu, Guido Montufar, and Yoshua Bengio. On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098, 2013.
  • Guido F Montufar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Advances in neural information processing systems, pages 2924–2932, 2014.
  • Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks. arXiv preprint arXiv:1512.03965, 2015.
  • Matus Telgarsky. Representation benefits of deep feedforward networks. arXiv preprint arXiv:1509.08101, 2015.
  • James Martens, Arkadev Chattopadhya, Toni Pitassi, and Richard Zemel. On the representational efficiency of restricted boltzmann machines. In Advances in Neural Information Processing Systems, pages 2877–2885, 2013.
  • Monica Bianchini and Franco Scarselli. On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Transactions on Neural Networks and Learning Systems, 25(8):1553–1565, 2014.