Intriguing properties of neural networks

International Conference on Learning Representations, 2014.

Abstract:

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.

Introduction
  • Deep neural networks are powerful learning models that achieve excellent performance on visual and speech recognition problems [9, 8].
  • The first property is concerned with the semantic meaning of individual units.
  • The authors show in Section 3 that random projections of φ(x) are semantically indistinguishable from the coordinates of φ(x) (a minimal sketch of this comparison follows the list below).
  • This calls into question the conjecture that neural networks disentangle variation factors across coordinates.
  • It seems that it is the entire space of activations, rather than the individual units, that contains the bulk of the semantic information.
  • The vector representations are stable up to a rotation of the space, so the individual units of the vector representations are unlikely to contain semantic information
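A minimal sketch of how the unit-versus-random-direction comparison could be set up: inspect the images that maximize a single coordinate of φ(x) and the images that maximize a random direction in the same activation space. The `activations` array stands in for φ(x) (here random data so the snippet runs), and `top_activating_images` is a hypothetical helper, not code from the paper.

```python
import numpy as np

# Stand-in for phi(x): last hidden-layer activations of a trained network
# over a held-out image set (shape: [num_images, num_units]). Random data
# is used here only so the sketch runs end to end.
activations = np.random.randn(1000, 256)

def top_activating_images(acts, direction, k=8):
    """Indices of the k images whose activations project most strongly onto
    `direction` (either a natural-basis vector e_i or a random unit vector)."""
    scores = acts @ direction
    return np.argsort(scores)[::-1][:k]

unit_index = 42
basis_direction = np.eye(activations.shape[1])[unit_index]   # single unit e_i
random_direction = np.random.randn(activations.shape[1])     # random direction v
random_direction /= np.linalg.norm(random_direction)

print(top_activating_images(activations, basis_direction))   # "unit-level" images
print(top_activating_images(activations, random_direction))  # "random-projection" images
# The paper reports that both sets of images look equally semantically coherent,
# suggesting the semantic information lives in the whole activation space.
```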
Highlights
  • Deep neural networks are powerful learning models that achieve excellent performance on visual and speech recognition problems [9, 8]
  • The first property is concerned with the semantic meaning of individual units
  • Our preliminary experiments have yielded positive evidence on MNIST to support this hypothesis as well: we have successfully trained a two-layer 100-100-10 non-convolutional neural network with a test error below 1.2% by keeping a pool of adversarial examples, a random subset of which is continuously replaced by newly generated adversarial examples, and mixing this pool into the original training set throughout training (sketched in the example after this list).
  • We demonstrated that deep neural networks have counter-intuitive properties both with respect to the semantic meaning of individual units and with respect to their discontinuities
  • If the network can generalize well, how can it be confused by these adversarial negatives, which are indistinguishable from the regular examples? A possible explanation is that the set of adversarial negatives is of extremely low probability and is never observed in the test set, yet it is dense, and so it is found near virtually every test case.
  • We do not yet have a deep understanding of how often adversarial negatives appear, and this issue should be addressed in future research.
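Below is a minimal sketch of the pooled adversarial-training scheme described in the highlight above: keep a pool of adversarial examples, continuously replace a random subset with freshly generated ones, and mix the pool into every training batch. The `generate_adversarial(model, x, y)` routine and the model's `train_on_batch` method are assumed interfaces for illustration, not the authors' code.

```python
import numpy as np

def adversarial_training_loop(model, x_train, y_train,
                              generate_adversarial,   # assumed: (model, x, y) -> x_adv
                              pool_size=1000, refresh_fraction=0.1,
                              batch_size=128, steps=10000):
    """Sketch of the pooled adversarial-training scheme on MNIST."""
    # Seed the pool with adversarial versions of random training points.
    idx = np.random.choice(len(x_train), pool_size, replace=False)
    pool_x = generate_adversarial(model, x_train[idx], y_train[idx])
    pool_y = y_train[idx].copy()

    for step in range(steps):
        # Continuously replace a random subset of the pool with fresh examples.
        n_new = int(refresh_fraction * pool_size)
        replace = np.random.choice(pool_size, n_new, replace=False)
        src = np.random.choice(len(x_train), n_new, replace=False)
        pool_x[replace] = generate_adversarial(model, x_train[src], y_train[src])
        pool_y[replace] = y_train[src]

        # Mix clean and adversarial examples into every batch.
        clean = np.random.choice(len(x_train), batch_size // 2, replace=False)
        adv = np.random.choice(pool_size, batch_size // 2, replace=False)
        batch_x = np.concatenate([x_train[clean], pool_x[adv]])
        batch_y = np.concatenate([y_train[clean], pool_y[adv]])
        model.train_on_batch(batch_x, batch_y)   # assumed training API
```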
Results
  • The authors’ “minimum distortion” function D has several intriguing properties, which they support with informal evidence and quantitative experiments (a sketch of the underlying optimization follows this list).
  • The above observations suggest that adversarial examples are somewhat universal and not just the result of overfitting to a particular model or to the specific selection of the training set.
  • They suggest that back-feeding adversarial examples to training might improve generalization of the resulting models.
  • The authors' preliminary experiments have yielded positive evidence on MNIST to support this hypothesis as well: they have successfully trained a two-layer 100-100-10 non-convolutional neural network with a test error below 1.2% by keeping a pool of adversarial examples, a random subset of which is continuously replaced by newly generated adversarial examples, and mixing this pool into the original training set throughout training.
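The paper formulates the distortion search as a box-constrained optimization (minimize c·|r| + loss(x + r, l) subject to x + r ∈ [0, 1], with a line search over c). The sketch below approximates this with SciPy's L-BFGS-B, using a squared-norm penalty for smoothness and a fixed c; `loss_and_grad` is an assumed callable returning the network's loss and its gradient with respect to the input, and the toy loss exists only so the example runs.

```python
import numpy as np
from scipy.optimize import minimize

def adversarial_perturbation(x, target, loss_and_grad, c=0.1):
    """Box-constrained L-BFGS sketch: minimize c*||r||^2 + loss(x + r, target)
    subject to x + r staying inside [0, 1]."""
    def objective(r):
        loss, grad = loss_and_grad(x + r, target)   # assumed callable
        value = c * np.dot(r, r) + loss
        gradient = 2 * c * r + grad
        return value, gradient

    bounds = [(-xi, 1.0 - xi) for xi in x]          # keeps x + r inside [0, 1]
    result = minimize(objective, np.zeros_like(x), jac=True,
                      method="L-BFGS-B", bounds=bounds)
    return result.x                                  # the distortion r

# Toy stand-in loss that pulls the input toward an arbitrary point, just to make
# the sketch runnable; a real use would plug in the classifier's cross-entropy
# loss and its gradient w.r.t. the input pixels.
def toy_loss_and_grad(x, target_point):
    diff = x - target_point
    return 0.5 * np.dot(diff, diff), diff

x = np.random.rand(784)                              # e.g. a flattened MNIST image
r = adversarial_perturbation(x, np.random.rand(784), toy_loss_and_grad)
print(np.linalg.norm(r))                             # size of the distortion
```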
Conclusion
  • The authors demonstrated that deep neural networks have counter-intuitive properties both with respect to the semantic meaning of individual units and with respect to their discontinuities.
  • The existence of the adversarial negatives appears to be in contradiction with the network’s ability to achieve high generalization performance.
  • If the network can generalize well, how can it be confused by these adversarial negatives, which are indistinguishable from the regular examples?
  • The authors do not yet have a deep understanding of how often adversarial negatives appear, and this issue should be addressed in future research.
Tables
  • Table1: Tests of the generalization of adversarial instances on MNIST
  • Table2: Cross-model generalization of adversarial examples. The columns show the error induced by the distorted examples when fed to each model. The last column shows the average distortion w.r.t. the original training set (a sketch of this cross-model evaluation follows the table list).
  • Table3: Models trained to study cross-training-set generalization of the generated adversarial examples. The errors presented in the table correspond to the original, undistorted data, to provide a baseline.
  • Table4: Cross-training-set generalization error rate for the set of adversarial examples generated for different models. The error induced by a random distortion to the same examples is displayed in the last row
  • Table5: Frame bounds of each rectified layer of the network from [9]
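As a rough illustration of how the cross-model numbers in Tables 2 and 4 could be assembled, the sketch below generates adversarial examples against each source model and measures the error they induce on every model. The model names, the `_Stub` classifier, and the noise-based distortion are all stand-ins so the snippet runs; they are not the paper's actual models or attack.

```python
import numpy as np

def cross_model_error_matrix(models, make_adversarial, x, y):
    """Error induced on every model by adversarial examples generated against
    each source model (the structure of the cross-model generalization tables).
    `make_adversarial(model, x, y)` is an assumed distortion generator."""
    matrix = {}
    for src, src_model in models.items():
        x_adv = make_adversarial(src_model, x, y)
        matrix[src] = {dst: float(np.mean(models[dst].predict(x_adv) != y))
                       for dst in models}
    return matrix   # matrix[source][target] = induced error rate

# Tiny stand-in demo: "models" that threshold the pixel mean, and a distortion
# that just adds clipped Gaussian noise.
class _Stub:
    def __init__(self, threshold): self.threshold = threshold
    def predict(self, x): return (x.mean(axis=1) > self.threshold).astype(int)

x = np.random.rand(200, 784)
y = (x.mean(axis=1) > 0.5).astype(int)
models = {"FC100-100-10": _Stub(0.5), "FC123-456-10": _Stub(0.49)}
noisy = lambda model, x, y: np.clip(x + 0.05 * np.random.randn(*x.shape), 0.0, 1.0)
print(cross_model_error_matrix(models, noisy, x, y))
```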
References
  • David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. The Journal of Machine Learning Research, 99:1803–1831, 2010.
  • Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR 2009), pages 248–255. IEEE, 2009.
  • Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-layer features of a deep network. Technical Report 1341, University of Montreal, June 2009. Also presented at the ICML 2009 Workshop on Learning Feature Hierarchies, Montreal, Canada.
  • Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. In Computer Vision and Pattern Recognition (CVPR 2008), pages 1–8. IEEE, 2008.
  • Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013.
  • Ian Goodfellow, Quoc Le, Andrew Saxe, Honglak Lee, and Andrew Y. Ng. Measuring invariances in deep networks. In Advances in Neural Information Processing Systems 22, pages 646–654, 2009.
  • Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
  • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. Building high-level features using large scale unsupervised learning. arXiv preprint arXiv:1112.6209, 2011.
  • Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits, 1998.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  • Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901, 2013.