Learning Deep Features for Scene Recognition using Places Database.
Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014: 487-495
- Understanding the world in a single glance is one of the most accomplished feats of the human brain: it takes only a few tens of milliseconds to recognize the category of an object or environment, emphasizing an important role of feedforward processing in visual recognition.
- Besides the exposure to a dense and rich variety of natural images, one important property of the primate brain is its hierarchical organization in layers of increasing processing complexity, an architecture that has inspired Convolutional Neural Networks or CNNs [2, 14]
- These architectures, together with recent large databases (e.g., ImageNet), have obtained astonishing performance on object classification tasks [12, 5, 20].
- The authors show that one of the reasons for this discrepancy is that the higher-level features learned by object-centric versus scene-centric CNNs are different: iconic images of objects do not contain the richness and diversity of visual information that pictures of scenes and environments provide for learning to recognize them
- The baseline performance reached by these networks on scene classification tasks is within the range of performance based on hand-designed features and sophisticated classifiers [24, 21, 4]
- We measured the relative densities and diversities between SUN, ImageNet and Places using Amazon Mechanical Turk. Both measures used the same experimental interface: workers were presented with different pairs of images and they had to select the pair that contained the most similar images
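The diversity measure builds on Simpson's index of diversity [Simpson 1949; Heip et al. 1998]: the probability that two items drawn at random belong to different categories. A minimal sketch of computing it from category labels (the function name and toy data are illustrative, not from the paper, which estimates the index from AMT pairwise judgments rather than label counts):

```python
from collections import Counter

def simpson_diversity(labels):
    """Simpson's diversity index: probability that two items drawn
    without replacement belong to *different* categories."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    # Probability that both draws fall in the same category
    same = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return 1.0 - same

# A more even distribution over categories yields higher diversity
print(simpson_diversity(["a"] * 8 + ["b"] * 2))  # skewed: lower index
print(simpson_diversity(["a"] * 5 + ["b"] * 5))  # even: higher index
```

Evenness across categories, not just the number of categories, is what drives the index up; this is the sense in which Places is argued to be denser and more diverse than prior scene datasets.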
- We show the results of a linear SVM trained on ImageNet-CNN features, using 5,000 images per category in Places 205 and 50 images per category in SUN 205 respectively
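The evaluation recipe above is a linear classifier on fixed deep features. A minimal sketch with scikit-learn's `LinearSVC` (which wraps the LIBLINEAR solver the paper cites [Fan et al. 2008]); the random vectors stand in for real CNN activations and the class counts are illustrative, not the paper's data:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for deep features (e.g., fc7 activations) from a CNN;
# illustrative synthetic data, not the actual Places/SUN features.
n_per_class, dim = 50, 128
X = np.vstack([rng.normal(loc=c, size=(n_per_class, dim)) for c in range(3)])
y = np.repeat(np.arange(3), n_per_class)

# Same linear SVM with the same parameters for every feature set,
# so differences in accuracy reflect the features, not the classifier.
clf = LinearSVC(C=1.0)
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
```

Holding the classifier fixed is the point of the protocol: it isolates the quality of scene-centric versus object-centric features.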
- We introduce a new benchmark with millions of labeled images, the Places database, designed to represent places and scenes found in the real world
- We demonstrate that object-centric and scene-centric neural networks differ in their internal representations, by introducing a simple visualization of the receptive fields of Convolutional Neural Networks units
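One common way to estimate a unit's receptive field, in the spirit of the visualization introduced here, is occlusion sensitivity: slide an occluder over the image and record how much the unit's response changes. The sketch below uses a toy stand-in "unit" (a quadrant mean) purely for illustration; the actual method probes activations of real CNN units:

```python
import numpy as np

def occlusion_map(image, unit_response, patch=4):
    """Heatmap of how much occluding each patch changes the unit's response."""
    h, w = image.shape
    base = unit_response(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # zero-valued occluder
            heat[i // patch, j // patch] = abs(base - unit_response(occluded))
    return heat

# Toy "unit" responding to the mean of the image's top-left quadrant
unit = lambda im: im[:8, :8].mean()
img = np.ones((16, 16))
heat = occlusion_map(img, unit)
# Only patches inside the top-left quadrant change the response
```

Regions where occlusion changes the response delimit the unit's effective receptive field; comparing such maps is how the object-centric vs. scene-centric difference in internal representations is made visible.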
- The top-5 error rate on the Places 205 test set is 18.9%, while the top-5 error rate on the SUN 205 test set is 8.1%
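Top-5 error counts a prediction as correct when the ground-truth label appears among the five highest-scoring classes. A minimal NumPy sketch (scores and labels are synthetic, chosen only to exercise the metric):

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (n_samples, n_classes) array; labels: (n_samples,) int array."""
    # Indices of the 5 highest-scoring classes for each sample
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 205))           # 205 classes, as in Places 205
labels = scores.argsort(axis=1)[:, -3]       # true label is 3rd best: inside top-5
print(top5_error(scores, labels))            # 0.0 by construction
```

Using the 3rd-best class as the label guarantees a top-5 hit for every sample, so the error is zero here by construction.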
- Deep convolutional neural networks are designed to benefit and learn from massive amounts of data.
- The authors provide the state-of-the-art performance using the deep features on all the current scene benchmarks
- Table 1: Classification accuracy on the test sets of Places 205 and SUN 205
- Table 2: Classification accuracy/precision on scene-centric and object-centric databases for the Places-CNN and ImageNet-CNN features. The classifier in all experiments is a linear SVM with the same parameters for both features
- Table 3: Classification accuracy/precision on various databases for the Hybrid-CNN feature. Numbers in bold indicate results that outperform the ImageNet-CNN or Places-CNN feature
- This work is supported by the National Science Foundation under Grant No. 1016862 to A.O., ONR MURI N000141010933 to A.T., the MIT Big Data Initiative at CSAIL, Google and Xerox Awards and a hardware donation from NVIDIA Corporation to A.O. and A.T., Intel and Google awards to J.X., and grant TIN2012-38187-C03-02 to A.L.
- This work is also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory, contract FA8650-12-C-7211 to A.T.
- P. Agrawal, R. Girshick, and J. Malik. Analyzing the performance of multilayer neural networks for object recognition. In Proc. ECCV. 2014.
- Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009.
- C. Doersch, A. Gupta, and A. A. Efros. Mid-level visual element discovery as discriminative mode seeking. In Advances in Neural Information Processing Systems, 2013.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In Proc. ICML, 2014.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 2008.
- L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007.
- G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. 2007.
- C. Heip, P. Herman, and K. Soetaert. Indices of diversity and evenness. Oceanis, 1998.
- Y. Jia. Caffe: An open source convolutional architecture for fast feature embedding. http://caffe.berkeleyvision.org/, 2013.
- T. Konkle, T. F. Brady, G. A. Alvarez, and A. Oliva. Scene memory is more detailed than you think: The role of categories in visual long-term memory. Psychological Science, 2010.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
- S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1989.
- L.-J. Li and L. Fei-Fei. What, where and who? classifying events by scene and object recognition. In Proc. ICCV, 2007.
- A. Oliva. Scene perception (chapter 51). The New Visual Neurosciences, 2013.
- A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int’l Journal of Computer Vision, 2001.
- G. Patterson and J. Hays. Sun attribute database: Discovering, annotating, and recognizing scene attributes. In Proc. CVPR, 2012.
- A. Quattoni and A. Torralba. Recognizing indoor scenes. In Proc. CVPR, 2009.
- A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. arXiv preprint arXiv:1403.6382, 2014.
- J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the fisher vector: Theory and practice. Int’l Journal of Computer Vision, 2013.
- E. H. Simpson. Measurement of diversity. Nature, 1949.
- A. Torralba and A. A. Efros. Unbiased look at dataset bias. In Proc. CVPR, 2011.
- J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Proc. CVPR, 2010.
- B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. Guibas, and L. Fei-Fei. Human action recognition by learning bases of action attributes and parts. In Proc. ICCV, 2011.