Learning Deep Features for Scene Recognition using Places Database.

Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014: 487-495

Cited by: 2923 | Views: 217
Abstract

Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, per...

Introduction
  • Understanding the world in a single glance is one of the most accomplished feats of the human brain: it takes only a few tens of milliseconds to recognize the category of an object or environment, emphasizing an important role of feedforward processing in visual recognition.
  • Besides the exposure to a dense and rich variety of natural images, one important property of the primate brain is its hierarchical organization in layers of increasing processing complexity, an architecture that has inspired Convolutional Neural Networks (CNNs) [2, 14].
  • These architectures together with recent large databases (e.g., ImageNet [3]) have obtained astonishing performance on object classification tasks [12, 5, 20].
  • The authors show that one reason for this discrepancy, namely that ImageNet-trained CNN features do not bring the same gains to scene classification as they do to object classification, is that the higher-level features learned by object-centric versus scene-centric CNNs are different: iconic images of objects do not contain the richness and diversity of visual information that pictures of scenes and environments provide for learning to recognize them.
Highlights
  • The baseline performance reached by ImageNet-trained CNNs on scene classification tasks is only within the range of performance based on hand-designed features and sophisticated classifiers [24, 21, 4].
  • We measured the relative densities and diversities of SUN, ImageNet and Places using Amazon Mechanical Turk (AMT). Both measures used the same experimental interface: workers were shown different pairs of images and had to select the pair containing the most similar images (a sketch of this comparison protocol appears after this list).
  • We report the results of a linear SVM trained on ImageNet-CNN features, using 5,000 images per category from Places 205 and 50 images per category from SUN 205, respectively.
  • We introduce a new benchmark with millions of labeled images, the Places database, designed to represent places and scenes found in the real world.
  • We demonstrate that object-centric and scene-centric neural networks differ in their internal representations, by introducing a simple visualization of the receptive fields of CNN units.
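The density/diversity comparison above reduces to counting which dataset's image pairs workers judge to be more similar. Below is a minimal sketch of one such estimator; it is an illustration rather than the paper's exact formula, and the trial field name (chosen_as_most_similar) is hypothetical: the relative diversity of dataset A with respect to dataset B is taken as the fraction of trials in which the pair drawn from B was picked as the more similar one (i.e., A's pairs look less alike, so A is more diverse).

```python
from collections import Counter

def relative_diversity(trials, dataset_a, dataset_b):
    """Estimate the relative diversity of dataset_a w.r.t. dataset_b
    from pairwise AMT judgments.

    Each trial shows one image pair from each dataset; the worker picks
    the pair whose two images look most similar. A dataset is more
    diverse when its pairs are chosen as "most similar" less often.
    `chosen_as_most_similar` is a hypothetical field name.
    """
    counts = Counter(t["chosen_as_most_similar"] for t in trials)
    total = counts[dataset_a] + counts[dataset_b]
    if total == 0:
        raise ValueError("no trials for this dataset pair")
    # Fraction of trials in which dataset_b's pair looked more similar,
    # i.e. dataset_a's pair looked less alike.
    return counts[dataset_b] / total

# Toy usage with made-up judgments:
trials = [
    {"chosen_as_most_similar": "SUN"},
    {"chosen_as_most_similar": "Places"},
    {"chosen_as_most_similar": "SUN"},
]
print(relative_diversity(trials, dataset_a="Places", dataset_b="SUN"))  # ~0.67
```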
Results
  • The authors measured the relative densities and diversities of SUN, ImageNet and Places using AMT.
  • Both measures used the same experimental interface: workers were presented with different pairs of images and they had to select the pair that contained the most similar images.
  • The authors report the results of a linear SVM trained on ImageNet-CNN features, using 5,000 images per category from Places 205 and 50 images per category from SUN 205, respectively (a minimal sketch of this evaluation follows this list).
  • The top-5 error rate on the test set of Places 205 is 18.9%, while the top-5 error rate on the test set of SUN 205 is 8.1%.
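Below is a minimal sketch of this linear-SVM evaluation, assuming the fc7 activations have already been extracted from the pretrained network and saved to disk (the .npy file names are hypothetical). It uses scikit-learn's LinearSVC, which wraps the LIBLINEAR solver cited in the references.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Hypothetical feature dumps: each row is a 4096-d fc7 activation vector.
X_train = np.load("places205_train_fc7.npy")      # shape (n_train, 4096)
y_train = np.load("places205_train_labels.npy")   # shape (n_train,)
X_test = np.load("places205_test_fc7.npy")
y_test = np.load("places205_test_labels.npy")

# Linear SVM with fixed hyperparameters, so that ImageNet-CNN and
# Places-CNN features are compared with the exact same classifier.
clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)

print("classification accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Keeping the SVM hyperparameters identical across feature types is what makes the comparisons in Tables 1-3 a comparison of features rather than of classifiers.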
Conclusion
  • Deep convolutional neural networks are designed to benefit and learn from massive amounts of data.
  • The authors introduce a new benchmark with millions of labeled images, the Places database, designed to represent places and scenes found in the real world.
  • The authors demonstrate that object-centric and scene-centric neural networks differ in their internal representations, by introducing a simple visualization of the receptive fields of CNN units (an occlusion-based sketch of this idea appears after this list).
  • The authors achieve state-of-the-art performance using these deep features on all the current scene benchmarks.
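The receptive-field visualization is described only at a high level here; one simple way to realize such a visualization is occlusion: slide a small patch over the image and record how much a chosen unit's activation drops. The sketch below follows that assumption; unit_activation is a placeholder for a forward pass through the trained network that returns the scalar response of the unit being studied.

```python
import numpy as np

def occlusion_map(image, unit_activation, patch=11, stride=3):
    """Estimate a unit's empirical receptive field by occlusion.

    image           : H x W x 3 float array.
    unit_activation : callable mapping an image to the scalar activation
                      of the unit under study (placeholder for a CNN
                      forward pass).
    Returns a 2-D map of activation drops; large values mark the image
    regions the unit relies on most.
    """
    h, w = image.shape[:2]
    base = unit_activation(image)
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    drop = np.zeros((len(rows), len(cols)))
    for i, y in enumerate(rows):
        for j, x in enumerate(cols):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0  # zero-out occluder
            drop[i, j] = base - unit_activation(occluded)
    return drop
```

Averaging such maps over the images that most strongly activate a unit gives a picture of what that unit responds to, which is the kind of evidence behind the claim that scene-centric and object-centric networks learn different internal representations.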
Tables
  • Table 1: Classification accuracy on the test set of Places 205 and the test set of SUN 205.
  • Table 2: Classification accuracy/precision on scene-centric and object-centric databases for the Places-CNN feature and the ImageNet-CNN feature. The classifier in all experiments is a linear SVM with the same parameters for the two features.
  • Table 3: Classification accuracy/precision on various databases for the Hybrid-CNN feature. Numbers in bold indicate results that outperform the ImageNet-CNN feature or the Places-CNN feature.
Funding
  • This work is supported by the National Science Foundation under Grant No. 1016862 to A.O., ONR MURI N000141010933 to A.T., as well as the MIT Big Data Initiative at CSAIL, Google and Xerox Awards and a hardware donation from NVIDIA Corporation to A.O. and A.T., Intel and Google awards to J.X., and grant TIN2012-38187-C03-02 to A.L.
  • This work is also supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory, contract FA8650-12-C-7211, to A.T.
References
  • [1] P. Agrawal, R. Girshick, and J. Malik. Analyzing the performance of multilayer neural networks for object recognition. In Proc. ECCV, 2014.
  • [2] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009.
  • [3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009.
  • [4] C. Doersch, A. Gupta, and A. A. Efros. Mid-level visual element discovery as discriminative mode seeking. In Advances in Neural Information Processing Systems, 2013.
  • [5] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. 2014.
  • [6] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. 2008.
  • [7] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007.
  • [8] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. 2007.
  • [9] C. Heip, P. Herman, and K. Soetaert. Indices of diversity and evenness. Oceanis, 1998.
  • [10] Y. Jia. Caffe: An open source convolutional architecture for fast feature embedding. http://caffe.berkeleyvision.org/, 2013.
  • [11] T. Konkle, T. F. Brady, G. A. Alvarez, and A. Oliva. Scene memory is more detailed than you think: The role of categories in visual long-term memory. Psychological Science, 2010.
  • [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
  • [13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
  • [14] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.
  • [15] L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. In Proc. ICCV, 2007.
  • [16] A. Oliva. Scene perception (chapter 51). The New Visual Neurosciences, 2013.
  • [17] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int'l Journal of Computer Vision, 2001.
  • [18] G. Patterson and J. Hays. SUN attribute database: Discovering, annotating, and recognizing scene attributes. In Proc. CVPR, 2012.
  • [19] A. Quattoni and A. Torralba. Recognizing indoor scenes. In Proc. CVPR, 2009.
  • [20] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. arXiv preprint arXiv:1403.6382, 2014.
  • [21] J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. Int'l Journal of Computer Vision, 2013.
  • [22] E. H. Simpson. Measurement of diversity. Nature, 1949.
  • [23] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In Proc. CVPR, 2011.
  • [24] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In Proc. CVPR, 2010.
  • [25] B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. Guibas, and L. Fei-Fei. Human action recognition by learning bases of action attributes and parts. In Proc. ICCV, 2011.