Restoring Negative Information in Few-Shot Object Detection

NIPS 2020 (2020)

Abstract

Few-shot learning has recently emerged as a new challenge in the deep learning field: unlike conventional methods that train deep neural networks (DNNs) with a large amount of labeled data, it asks for the generalization of DNNs on new classes with few annotated samples. Recent advances in few-shot learning mainly focus on image classification...
Introduction
  • There has been a transformative revolution in computer vision cultivated by the adoption of deep learning [2].
  • Humans, even children, can recognize a multitude of objects in images when told only once or a few times, despite the fact that the images of objects may vary in viewpoint, size and scale.
  • This ability is still a challenge for machine perception.
  • Recent advances in few-shot learning mainly focus on image classification and recognition tasks [3,4,5,6,7,8,9]
Highlights
  • In the past decade, there has been a transformative revolution in computer vision cultivated by the adoption of deep learning [2]
  • Recent advances in few-shot learning mainly focus on image classification and recognition tasks [3,4,5,6,7,8,9]
  • In particular, in the 1-shot scenario where the support for each class is very limited, our method provides an efficient way to mine useful negative information within the support image, and we improve the representative-based metric learning approach (RepMet) by up to 11.6%; the margin of improvement gets smaller with 10-shot as the support set becomes more diverse
  • We propose to restore negative information in few-shot object detection: we show that hard negatives are essential for metric learning in few-shot object detection
  • We build our work on top of a state-of-the-art pipeline, RepMet, introducing several new modules such as negative and positive representatives, the NP-embedding, and triplet losses based on the NP-embedding (a minimal sketch of such a triplet loss is given at the end of this list)
  • Results show that our method significantly improves the SOTA by restoring negative information into the pipeline
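  • A minimal PyTorch-style sketch of such an NP-embedding triplet loss is given below; it is an illustration under assumptions rather than the authors' released code: pairing each proposal embedding with the closest positive and the closest negative representative of its own class, and the margin value, are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def np_triplet_loss(emb, pos_reps, neg_reps, labels, margin=0.5):
        # emb:      (B, e)    proposal embeddings
        # pos_reps: (N, K, e) positive representatives per class
        # neg_reps: (N, K, e) negative (hard-background) representatives per class
        # labels:   (B,)      ground-truth class index of each proposal
        # Pull each embedding towards the closest positive representative of its
        # class and push it away from the closest negative representative of the
        # same class; this exact pairing and the margin value are assumptions.
        pos = pos_reps[labels]                                              # (B, K, e)
        neg = neg_reps[labels]                                              # (B, K, e)
        d_pos = ((emb[:, None, :] - pos) ** 2).sum(-1).min(dim=1).values    # (B,)
        d_neg = ((emb[:, None, :] - neg) ** 2).sum(-1).min(dim=1).values    # (B,)
        return F.relu(d_pos - d_neg + margin).mean()

    # Example usage with random tensors: 20 classes, 5 representatives, 256-d embeddings.
    emb = torch.randn(8, 256)
    labels = torch.randint(0, 20, (8,))
    loss = np_triplet_loss(emb, torch.randn(20, 5, 256), torch.randn(20, 5, 256), labels)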
Methods
  • RepMet. Some core modules of RepMet [1] are illustrated in Figure 2 with a light green background.
  • It learns positive class representatives {R^p_ij | 1 ≤ i ≤ N, 1 ≤ j ≤ K} as the weights of an FC layer of size N · K · e, where i and j denote the i-th class and the j-th representative.
  • Each proposal embedding is compared against these representatives via its distances to them; the distances are optimized with 1) a cross-entropy loss to predict the correct class label, and 2) an embedding loss that enforces a margin between the distance of E_p to the closest representative of the correct class and to the closest representative of a wrong class (a sketch of this head is given at the end of this list).
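  • A minimal PyTorch-style sketch of this representative-based head is given below; it is an illustration under assumptions rather than the authors' code: using the negative distances as class logits and the value of the margin alpha are assumptions of this sketch.

    import torch
    import torch.nn.functional as F

    def repmet_losses(emb, reps, labels, alpha=0.5):
        # emb:    (B, e)    proposal embeddings E_p
        # reps:   (N, K, e) positive class representatives R^p_ij, i.e. the weights
        #                   of the FC layer of size N * K * e, reshaped per class
        # labels: (B,)      ground-truth class index of each proposal
        # alpha:            margin of the embedding loss (value assumed here)
        B, e = emb.shape
        N, K, _ = reps.shape
        # Squared Euclidean distance of every embedding to every representative.
        d = ((emb[:, None, :] - reps.reshape(N * K, e)[None, :, :]) ** 2).sum(-1)  # (B, N*K)
        d_min = d.view(B, N, K).min(dim=2).values                  # (B, N): closest rep per class
        # 1) cross-entropy loss on class scores (negative distances used as logits).
        ce_loss = F.cross_entropy(-d_min, labels)
        # 2) embedding loss: margin between the distance of E_p to the closest
        #    representative of the correct class and to the closest wrong-class one.
        d_correct = d_min.gather(1, labels[:, None]).squeeze(1)    # (B,)
        d_wrong = d_min.masked_fill(F.one_hot(labels, N).bool(), float("inf")).min(dim=1).values
        emb_loss = F.relu(d_correct - d_wrong + alpha).mean()
        return ce_loss + emb_loss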
Results
  • Results on ImageNet-LOC. Comparison with RepMet and other baselines: the authors follow the same setup as RepMet to report NP-RepMet with 1-shot, 5-shot and 10-shot results in Table 1-Left.
  • There are several baselines worth comparing to NP-RepMet: for instance, one can train a standard object detector on base classes using the same FPN-DCN backbone and fine-tune its classifier head on the novel classes (a sketch of this fine-tuning baseline is given at the end of this list).
  • This is denoted as ‘baseline-FT’ in [1] and Table 1; the reported results are 35.0, 51.0 and 59.7 in 1-, 5- and 10-shot, respectively.
  • Results in Table 4 show that NP-RepMet delivers good detection performance on the base classes as well
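  • A minimal PyTorch-style sketch of this ‘baseline-FT’ procedure is given below; it is an illustration only: the attribute name cls_head and the optimizer settings are hypothetical assumptions, not part of [1].

    import torch

    def make_baseline_ft(detector, num_novel_classes, lr=1e-3):
        # Freeze every parameter learned on the base classes.
        for p in detector.parameters():
            p.requires_grad = False
        # Replace the classification head and train only it on the novel classes;
        # 'cls_head' is a hypothetical attribute name for the classifier layer.
        in_features = detector.cls_head.in_features
        detector.cls_head = torch.nn.Linear(in_features, num_novel_classes + 1)  # +1 for background
        optimizer = torch.optim.SGD(detector.cls_head.parameters(), lr=lr, momentum=0.9)
        return detector, optimizer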
Conclusion
  • Within the regime of few-shot learning, few-shot object detection has not been extensively explored.
  • The authors propose to restore negative information in few-shot object detection: they show that hard negatives are essential for metric learning in few-shot object detection.
  • They build their work on top of a state-of-the-art pipeline, RepMet, introducing several new modules such as negative and positive representatives, the NP-embedding, and triplet losses based on the NP-embedding.
  • A new inference scheme is introduced given the learnt negative representatives in the pipeline.
  • Results show that the method significantly improves the SOTA by restoring negative information into the pipeline
Summary
  • Objectives: Building upon the above observation, the purpose of this study is to restore negative information properly in few-shot object detection.
Tables
  • Table 1: Results on ImageNet-LOC. Left: comparison with RepMet and baseline-FT in 1, 5 and 10-shot detection. Right: ablation study of NP-embedding (top) and NP-inference (bottom) in 1-shot detection
  • Table 2: Results on ImageNet-LOC 1-shot setting. Left: ablation of negative proposal selection at inference. Right: parameter variations of β (top) and IoU (bottom) for hard negatives
  • Table 3: Performance on PASCAL VOC 2007 novel classes
Related work
  • Few-shot learning. Few-shot learning is not a new problem: its target is to recognize previously unseen classes with very few labeled samples [17, 18, 19, 20, 21, 22]. The recent resurgence of interest in few-shot learning has come through so-called meta-learning [23, 24, 25, 20, 4], where meta-training and meta-testing are performed in a similar manner; representative works in image classification include the matching network [4] and the prototypical network [3]. Apart from meta-learning, some other approaches make use of sample synthesis and augmentation in few-shot learning [26, 5, 27, 28].

    Few-shot object detection. In contrast to classification, few-shot object detection has not been extensively explored. Karlinsky et al. [1] introduce an end-to-end representative-based metric learning approach (RepMet) for few-shot detection; Kang et al. [10] present a new model that uses a meta feature learner and a re-weighting module to quickly adjust the contributions of the basic features to the detection of new classes. Fan et al. [13] extend the matching network by learning on image pairs based on the Faster R-CNN framework, which is equipped with multi-scale and shaped attentions. Some other works modelling meta-knowledge based on Faster R-CNN can be found in [12, 11]. These approaches fall within the meta-learning regime. There also exist many other works that tackle the problem from the domain transfer/adaptation perspective [29, 30]. For instance, Chen et al. [29] propose a low-shot transfer detector (LSTD) that leverages rich source-domain knowledge to construct a target-domain detector with few training examples. Transfer learning in [29, 30] requires training on both source (base) and target (new) classes. Meta-learning can instead be more efficient in the sense that its prediction on new classes is achieved directly via network inference. In this paper, we focus on meta-learning.
Funding
  • Acknowledgments and Disclosure of Funding. This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61828602; the grant from the Institute for Guo Qiang of Tsinghua University and the Beijing Academy of Artificial Intelligence (BAAI); the Science and Technology Major Project of Guangzhou (202007030006); and the open project of Zhejiang Laboratory.
Reference
  • Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, and Alex M Bronstein. Repmet: Representative-based metric learning for classification and few-shot object detection. In CVPR, 2019.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS, 2016.
  • Bharath Hariharan and Ross Girshick. Low-shot visual recognition by shrinking and hallucinating features. In ICCV, 2017.
  • Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, and Andrei Bursuc. Dense classification and implanting for few-shot learning. In CVPR, 2019.
  • Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, and Liwei Wang. Few-shot learning with global class representations. In ICCV, 2019.
  • Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, and Jinhui Tang. Few-shot image recognition with knowledge transfer. In ICCV, 2019.
  • Pavel Tokmakov, Yu-Xiong Wang, and Martial Hebert. Learning compositional representations for few-shot recognition. In ICCV, 2019.
  • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-shot object detection via feature reweighting. In ICCV, 2019.
  • Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, and Liang Lin. Meta r-cnn: Towards general solver for instance-level low-shot learning. In ICCV, 2019.
  • Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Meta-learning to detect rare objects. In ICCV, 2019.
  • Qi Fan, Wei Zhuo, and Yu-Wing Tai. Few-shot object detection with attention-rpn and multirelation detector. In CVPR, 2020.
  • Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • Ross Girshick. Fast r-cnn. In ICCV, 2015.
  • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, and Stephen Lin. Point-set anchors for object detection, instance segmentation and pose estimation. ECCV, 2020.
  • Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
  • Brenden M Lake, Russ R Salakhutdinov, and Josh Tenenbaum. One-shot learning by inverting a compositional causal process. In NIPS, 2013.
  • Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In ICML, 2016.
  • Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916, 2014.
  • Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
  • Luca Bertinetto, João F Henriques, Jack Valmadre, Philip Torr, and Andrea Vedaldi. Learning feed-forward one-shot learners. In NIPS, 2016.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
  • Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML workshop, 2015.
  • Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, and Leonid Sigal. Semantic feature augmentation in few-shot learning. arXiv preprint arXiv:1804.05298, 2018.
  • Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar, Rogerio Feris, Raja Giryes, and Alex Bronstein. Delta-encoder: an effective sample synthesis method for few-shot object recognition. In NIPS, 2018.
  • Yu-Xiong Wang, Ross Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. In CVPR, 2018.
  • Hao Chen, Yali Wang, Guoyou Wang, and Yu Qiao. Lstd: A low-shot transfer detector for object detection. In AAAI, 2018.
  • Tao Wang, Xiaopeng Zhang, Li Yuan, and Jiashi Feng. Few-shot adaptive faster r-cnn. In CVPR, 2019.
  • Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In CVPR, 2016.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015.
  • Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.
  • Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes challenge 2007 (voc2007) results. 2007.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. In ICCV, 2017.
  • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
  • Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S Davis. Soft-nms–improving object detection with one line of code. In ICCV, 2017.
  • Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
Author
Yukuan Yang
Fangyun Wei
Miaojing Shi