
Learning Deep Features for Discriminative Localization

CVPR, 2016

Cited by 4484 | Views 372 | EI

Abstract

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it…

Introduction
  • Recent work by Zhou et al [34] has shown that the convolutional units of various layers of convolutional neural networks (CNNs) behave as object detectors even though no supervision on the location of the objects is provided.
  • However, this remarkable localization ability of the convolutional layers is lost when fully-connected layers are used for classification.
  • This tweak makes it possible to identify the discriminative image regions in a single forward pass.
Highlights
  • Recent work by Zhou et al [34] has shown that the convolutional units of various layers of convolutional neural networks (CNNs) behave as object detectors even though no supervision on the location of the objects is provided
  • We find that in most cases there is a small performance drop of 1 − 2% when removing the additional layers from the various networks
  • In this work we propose a general technique called Class Activation Mapping (CAM) for convolutional neural networks with global average pooling (a minimal sketch follows this list)
  • Class activation maps allow us to visualize the predicted class scores on any given image, highlighting the discriminative object parts detected by the convolutional neural network
  • We evaluate our approach on weakly supervised object localization on the ILSVRC benchmark, demonstrating that our global average pooling convolutional neural network can perform accurate object localization
  • Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite being trained only to solve a classification task
  • We demonstrate that the class activation maps localization technique generalizes to other visual recognition tasks, i.e., our technique produces generic localizable deep features that can aid other researchers in understanding the basis of discrimination used by convolutional neural networks for their tasks
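As a concrete illustration of the CAM idea summarized above, here is a minimal NumPy sketch (not the authors' released code; the array names and shapes are assumptions for illustration). A class activation map is simply the class-specific weighted sum of the last convolutional feature maps, where the weights are those learned on top of the global-average-pooled features.

```python
import numpy as np

def class_activation_map(conv_maps, class_weights):
    """Compute a class activation map (CAM).

    conv_maps:     (C, H, W) activations of the last convolutional layer.
    class_weights: (C,) weights of the output unit for the target class,
                   i.e. the weights applied to the global-average-pooled features.
    Returns an (H, W) map; high values mark class-discriminative regions.
    """
    cam = np.tensordot(class_weights, conv_maps, axes=([0], [0]))  # weighted sum over channels
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for heatmap overlay
    return cam

# Typical use: upsample the (H, W) map to the input image resolution
# (e.g. bilinear interpolation) and overlay it on the image as a heatmap.
```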
Methods
  • Fine-grained classification methods compared: GoogLeNet-GAP on full image, GoogLeNet-GAP on crop, GoogLeNet-GAP on BBox, Alignments [7], DPD [32], DeCAF+DPD [3], PANDA R-CNN [31].
  • Given a set of images containing a common concept, the authors want to identify which regions the network recognizes as being important and if this corresponds to the input pattern.
  • The authors follow a similar approach as before: they train a linear SVM on fc7 features from AlexNet, ave pool features from GoogLeNet, and gap features from GoogLeNet-GAP (see the sketch after this list).
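The following is a hedged sketch of the linear-SVM-on-deep-features protocol mentioned above, using scikit-learn's LinearSVC (built on the LIBLINEAR solver cited as [5]). The feature matrices are assumed to have already been extracted from the networks (fc7 of AlexNet, ave pool of GoogLeNet, or gap of GoogLeNet-GAP), one row per image.

```python
from sklearn.svm import LinearSVC

def linear_svm_accuracy(train_feats, train_labels, test_feats, test_labels, C=1.0):
    """Train a linear SVM on pre-extracted deep features and report test accuracy.

    train_feats / test_feats: (num_images, feature_dim) arrays of CNN features.
    train_labels / test_labels: integer class labels, one per image.
    """
    clf = LinearSVC(C=C)
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)
```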
Results
  • The authors first report results on object classification to demonstrate that the approach does not significantly hurt classification performance.
  • The authors add two convolutional layers just before GAP, resulting in the AlexNet*-GAP network (a structural sketch follows this list).
  • Note that it is important for the networks to perform well on classification in order to achieve high localization performance, since localization involves accurately identifying both the object category and the bounding box location
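To make the architectural change concrete, below is a minimal PyTorch sketch of a GAP classification head in the spirit of AlexNet*-GAP: the fully-connected layers are dropped, a couple of extra convolutional layers are appended to the convolutional trunk, their output is globally average-pooled, and a single linear layer produces class scores. The channel counts, kernel sizes, and trunk here are assumptions for illustration, not the authors' exact configuration.

```python
import torch.nn as nn

class GAPHead(nn.Module):
    """Conv trunk -> extra conv layers -> global average pooling -> linear classifier."""

    def __init__(self, trunk, in_channels, mid_channels, num_classes):
        super().__init__()
        self.trunk = trunk  # e.g. the convolutional feature extractor of a pretrained AlexNet
        self.extra = nn.Sequential(  # the extra conv layers added just before GAP
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(mid_channels, num_classes)

    def forward(self, x):
        maps = self.extra(self.trunk(x))   # (N, C, H, W) feature maps used for CAMs
        pooled = maps.mean(dim=(2, 3))     # global average pooling -> (N, C)
        return self.classifier(pooled)     # class scores

# After training, the rows of classifier.weight are the per-class weights that
# can be combined with `maps` to form class activation maps.
```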
Conclusion
  • In this work the authors propose a general technique called Class Activation Mapping (CAM) for CNNs with global average pooling.
  • This enables classification-trained CNNs to learn to perform object localization, without using any bounding box annotations.
  • The authors demonstrate that the CAM localization technique generalizes to other visual recognition tasks, i.e., the technique produces generic localizable deep features that can aid other researchers in understanding the basis of discrimination used by CNNs for their tasks.
Tables
  • Table1: Classification error on the ILSVRC validation set
  • Table2: Localization error on the ILSVRC validation set. Backprop refers to using [23] for localization instead of CAM
  • Table3: Localization error on the ILSVRC test set for various weakly- and fully- supervised methods
  • Table4: Fine-grained classification performance on CUB200 dataset. GoogLeNet-GAP can successfully localize important image crops, boosting classification performance
  • Table5: Classification accuracy on representative scene and object datasets for different deep features
Related Work
  • Convolutional Neural Networks (CNNs) have led to impressive performance on a variety of visual recognition tasks [10, 35, 8]. Recent work has shown that despite being trained on image-level labels, CNNs have the remarkable ability to localize objects [1, 16, 2, 15, 18]. In this work, we show that, using an appropriate architecture, we can generalize this ability beyond just localizing objects, to start identifying exactly which regions of an image are being used for discrimination. Here, we discuss the two lines of work most related to this paper: weakly-supervised object localization and visualizing the internal representation of CNNs.

    Weakly-supervised object localization: There have been a number of recent works exploring weakly-supervised object localization using CNNs [1, 16, 2, 15]. Bergamo et al [1] propose a technique for self-taught object localization involving masking out image regions to identify the regions causing the maximal activations in order to localize objects. Cinbis et al [2] and Pinheiro et al [18] combine multiple-instance learning with CNN features to localize objects. Oquab et al [15] propose a method for transferring mid-level image representations and show that some object localization can be achieved by evaluating the output of CNNs on multiple overlapping patches; however, they do not actually evaluate the localization ability. Moreover, while these approaches yield promising results, they are not trained end-to-end and require multiple forward passes of a network to localize objects, making them difficult to scale to real-world datasets. Our approach is trained end-to-end and can localize objects in a single forward pass.
Funding
  • This work was supported by NSF grant IIS-1524817, and by a Google faculty research award to A.T.
References
  • A. Bergamo, L. Bazzani, D. Anguelov, and L. Torresani. Self-taught object localization with deep networks. arXiv preprint arXiv:1409.3964, 2014.
  • R. G. Cinbis, J. Verbeek, and C. Schmid. Weakly supervised object localization with multi-fold multiple instance learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015.
  • J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. International Conference on Machine Learning, 2014.
  • A. Dosovitskiy and T. Brox. Inverting convolutional networks with convolutional networks. arXiv preprint arXiv:1506.02753, 2015.
  • R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 2008.
  • L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007.
  • E. Gavves, B. Fernando, C. G. Snoek, A. W. Smeulders, and T. Tuytelaars. Local alignments for fine-grained categorization. Int'l Journal of Computer Vision, 2014.
  • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc. CVPR, 2014.
  • G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. 2007.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
  • S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. Proc. CVPR, 2006.
  • L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. Proc. ICCV, 2007.
  • M. Lin, Q. Chen, and S. Yan. Network in network. International Conference on Learning Representations, 2014.
  • A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. Proc. CVPR, 2015.
  • M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. Proc. CVPR, 2014.
  • M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. Proc. CVPR, 2015.
  • G. Patterson and J. Hays. SUN attribute database: Discovering, annotating, and recognizing scene attributes. Proc. CVPR, 2012.
  • P. O. Pinheiro and R. Collobert. From image-level to pixel-level labeling with convolutional networks. 2015.
  • A. Quattoni and A. Torralba. Recognizing indoor scenes. Proc. CVPR, 2009.
  • A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. arXiv preprint arXiv:1403.6382, 2014.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. Int'l Journal of Computer Vision, 2015.
  • P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, 2013.
  • K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. International Conference on Learning Representations Workshop, 2014.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations, 2015.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
  • K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. Proc. ICCV, 2011.
  • P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical report, California Institute of Technology, 2010.
  • J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. Proc. CVPR, 2010.
  • B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. Guibas, and L. Fei-Fei. Human action recognition by learning bases of action attributes and parts. Proc. ICCV, 2011.
  • M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. Proc. ECCV, 2014.
  • N. Zhang, J. Donahue, R. Girshick, and T. Darrell. Part-based R-CNNs for fine-grained category detection. Proc. ECCV, 2014.
  • N. Zhang, R. Farrell, F. Iandola, and T. Darrell. Deformable part descriptors for fine-grained recognition and attribute prediction. Proc. ICCV, 2013.
  • B. Zhou, V. Jagadeesh, and R. Piramuthu. ConceptLearner: Discovering visual concepts from weakly labeled image collections. Proc. CVPR, 2015.
  • B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. International Conference on Learning Representations, 2015.
  • B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems, 2014.
  • B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.