How transferable are features in deep neural networks?

Advances in Neural Information Processing Systems 27 (NIPS 2014): 3320–3328

Cited by 6262 | Views 574 | EI
Abstract

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. …

Introduction
  • Modern deep neural networks exhibit a curious phenomenon: when trained on images, they all tend to learn first-layer features that resemble either Gabor filters or color blobs.
  • The appearance of these filters is so common that obtaining anything else on a natural image dataset causes suspicion of poorly chosen hyperparameters or a software bug.
Highlights
  • Modern deep neural networks exhibit a curious phenomenon: when trained on images, they all tend to learn first-layer features that resemble either Gabor filters or color blobs. The appearance of these filters is so common that obtaining anything else on a natural image dataset causes suspicion of poorly chosen hyperparameters or a software bug.
  • We have demonstrated a method for quantifying the transferability of features from each layer of a neural network, which reveals their generality or specificity.
  • We showed how transferability is negatively affected by two distinct issues: optimization difficulties related to splitting networks in the middle of fragilely co-adapted layers and the specialization of higher-layer features to the original task at the expense of performance on the target task.
  • We quantified how the transferability gap grows as the distance between tasks increases when transferring higher layers, but found that even features transferred from distant tasks are better than random weights.
  • We found that initializing with transferred features can improve generalization performance even after substantial fine-tuning on a new task, which could be a generally useful technique for improving deep neural network performance (a minimal sketch of this recipe follows this list).
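A minimal sketch of the transfer-and-fine-tune recipe described above (the AnB+ setting) is shown below, assuming a PyTorch-style setup; the stand-in network, the split index n, and the commented-out dataloader are illustrative placeholders rather than the paper's Caffe implementation (which is linked under Methods below).

```python
# Sketch of "initialize with transferred features, then fine-tune everything"
# (the AnB+ condition). PyTorch is an assumption; the paper used Caffe.
import torch
import torch.nn as nn

def build_convnet(num_classes: int) -> nn.Sequential:
    # Tiny stand-in for the AlexNet-like 8-layer network used in the paper.
    return nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(128, num_classes),
    )

net_a = build_convnet(num_classes=500)  # assume this was already trained on task A
net_b = build_convnet(num_classes=500)  # to be trained on task B

# Copy the first n modules' weights from A into B; n here is illustrative.
n = 4
with torch.no_grad():
    for i in range(n):
        for p_b, p_a in zip(net_b[i].parameters(), net_a[i].parameters()):
            p_b.copy_(p_a)

# Fine-tune *all* layers of B on task B (the "+" in AnB+): every parameter
# stays trainable, so the transferred weights act only as an initialization.
optimizer = torch.optim.SGD(net_b.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# for images, labels in task_b_loader:  # hypothetical dataloader for task B
#     optimizer.zero_grad()
#     criterion(net_b(images), labels).backward()
#     optimizer.step()
```

The key point from the highlights is that nothing is frozen after the copy: the transferred features serve purely as a starting point for further training on the new task.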
Methods
  • Since Krizhevsky et al. (2012) won the ImageNet 2012 competition, there has been much interest in and work on tweaking the hyperparameters of large convolutional models.
  • In this study the authors aim not to maximize absolute performance, but rather to study transfer results on a well-known architecture.
  • Further details of the training setup are given in the supplementary material, and code and parameter files to reproduce these experiments are available at http://yosinski.com/transfer
Results
  • The authors performed three sets of experiments.
  • The main experiment has random A/B splits and is discussed in Section 4.1.
  • Section 4.2 presents an experiment with the man-made/natural split.
  • Section 4.3 describes an experiment with random weights.
  • Section 4.1 (Similar datasets: random A/B splits): the results of all A/B transfer-learning experiments on randomly split datasets are shown in Figure 2.
  • In each of the numbered interpretations of these results, the authors compare the performance to the base case.
  • Among those interpretations: fine-tuning recovers fragilely co-adapted interactions (point 3), and transfer plus fine-tuning improves generalization (point 5); a minimal sketch of the frozen-versus-fine-tuned setup behind these points follows this list.
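As referenced above, the experiments contrast transferring the first n layers and freezing them (the BnB and AnB networks) with transferring them and fine-tuning everything (BnB+ and AnB+). The sketch below reuses the PyTorch-style stand-in network from the Highlights section; the function name and split index are hypothetical, not the paper's code.

```python
# Sketch of the frozen ("AnB"/"BnB") versus fine-tuned ("AnB+"/"BnB+") control.
# PyTorch is an assumption; `build_convnet` is the stand-in defined earlier.
import torch
import torch.nn as nn

def transfer_first_n(src: nn.Sequential, dst: nn.Sequential, n: int,
                     freeze: bool) -> nn.Sequential:
    """Copy weights of the first n modules from src into dst.

    freeze=True  -> BnB/AnB: the transferred layers stay locked during training.
    freeze=False -> BnB+/AnB+: the transferred layers are fine-tuned as well.
    """
    with torch.no_grad():
        for i in range(n):
            for p_dst, p_src in zip(dst[i].parameters(), src[i].parameters()):
                p_dst.copy_(p_src)
    for i in range(n):
        for p in dst[i].parameters():
            p.requires_grad = not freeze
    return dst

# Only trainable parameters are handed to the optimizer, so the frozen
# condition learns just the layers above the split on the new task.
# net_b = transfer_first_n(net_a, net_b, n=3, freeze=True)
# optimizer = torch.optim.SGD(
#     (p for p in net_b.parameters() if p.requires_grad), lr=0.01, momentum=0.9)
```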
Conclusion
  • The authors have demonstrated a method for quantifying the transferability of features from each layer of a neural network, which reveals their generality or specificity.
  • The authors showed how transferability is negatively affected by two distinct issues: optimization difficulties related to splitting networks in the middle of fragilely co-adapted layers and the specialization of higher layer features to the original task at the expense of performance on the target task.
  • The authors found that initializing with transferred features can improve generalization performance even after substantial fine-tuning on a new task, which could be a generally useful technique for improving deep neural network performance.
Tables
  • Table 1: Performance boost of AnB+ over controls, averaged over different ranges of layers (the sketch below illustrates what this averaging computes).
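For readers wondering what "averaged over different ranges of layers" means in the caption above: it is the mean accuracy difference between the AnB+ networks and a control, taken over the split layers n in a given range. The sketch below uses placeholder numbers, not the paper's measured values.

```python
# Mean boost of AnB+ over a control, averaged over a range of split layers n.
# The accuracy values below are placeholders, not the paper's measurements.
def mean_boost(acc_anb_plus, acc_control, layers):
    return sum(acc_anb_plus[n] - acc_control[n] for n in layers) / len(layers)

acc_anb_plus = {n: 0.640 + 0.002 * n for n in range(1, 8)}  # placeholder
acc_control  = {n: 0.625 for n in range(1, 8)}              # placeholder
print(mean_boost(acc_anb_plus, acc_control, layers=range(1, 8)))  # layers 1-7
print(mean_boost(acc_anb_plus, acc_control, layers=range(5, 8)))  # layers 5-7
```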
Funding
  • The authors would like to thank Kyunghyun Cho and Thomas Fuchs for helpful discussions, Joost Huizinga, Anh Nguyen, and Roby Velez for editing, as well as funding from the NASA Space Technology Research Fellowship (JY), DARPA project W911NF-12-1-0449, NSERC, Ubisoft, and CIFAR (YB is a CIFAR Fellow).
References
  • Bengio, Y. (2011). Deep learning of representations for unsupervised and transfer learning. In JMLR W&CP: Proc. Unsupervised and Transfer Learning.
  • Bengio, Y., Bastien, F., Bergeron, A., Boulanger-Lewandowski, N., Breuel, T., Chherawala, Y., Cisse, M., Cote, M., Erhan, D., Eustache, J., Glorot, X., Muller, X., Pannetier Lebeuf, S., Pascanu, R., Rifai, S., Savard, F., and Sicard, G. (2011). Deep learners benefit more from out-of-distribution examples. In JMLR W&CP: Proc. AISTATS 2011.
  • Caruana, R. (1995). Learning many related tasks at the same time with backpropagation. In Advances in Neural Information Processing Systems 7, pages 657–664, Cambridge, MA. MIT Press.
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR 2009.
  • Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. (2013). DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531.
  • Fei-Fei, L., Fergus, R., and Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In Conference on Computer Vision and Pattern Recognition Workshop (CVPR 2004), page 178.
  • Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524.
  • Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In Proc. International Conference on Computer Vision (ICCV 2009), pages 2146–2153. IEEE.
  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
  • Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012).
  • Le, Q. V., Karpenko, A., Ngiam, J., and Ng, A. Y. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In Advances in Neural Information Processing Systems 24, pages 1017–1025.
  • Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), Montreal, Canada.
  • Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2014). OverFeat: Integrated recognition, localization and detection using convolutional networks. In International Conference on Learning Representations (ICLR 2014). CBLS.
  • Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901.