Learning to Segment the Tail

Hu Xinting
Jiang Yi
Chen Jingyuan

CVPR, pp. 14042-14051, 2020.

DOI: 10.1109/CVPR42600.2020.01406

Abstract:

Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data. We propose a "divide & conquer" strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one. This derives a novel learning paradigm: class-incremental few-shot learning.

Introduction
  • The long-tail distribution inherently exists in the visual world, where a few head classes occupy most of the instances [48, 1, 37, 44].
  • This is inevitable when modeling large-scale datasets, because class frequencies in nature follow Zipf’s law [31].
  • It is prohibitively expensive to counter this natural skew and collect a balanced, sample-rich large-scale dataset suited to training a robust visual recognition system with the prevailing models [13, 9, 34, 4].
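
To make this skew concrete, here is a minimal numerical illustration (ours, not the paper's) of how an idealized Zipfian distribution concentrates mass in the head, using the 1,230-class LVIS vocabulary size:

```python
import numpy as np

# Idealized Zipf's law with exponent s = 1: the k-th most frequent class
# appears with probability proportional to 1/k. 1,230 is the LVIS class count.
num_classes = 1230
ranks = np.arange(1, num_classes + 1)
freq = 1.0 / ranks
freq /= freq.sum()  # normalize into a probability distribution

head_share = freq[:100].sum()  # mass held by the 100 most frequent classes
print(f"Top 100 of {num_classes} classes hold {head_share:.1%} of all instances")
# Under this idealized model, roughly two thirds of all instances belong
# to fewer than one tenth of the classes.
```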
Highlights
  • The long-tail distribution inherently exists in our visual world, where a few head classes occupy most of the instances [48, 1, 37, 44]
  • We focus on the severe class imbalance and few-shot learning in the field of instance segmentation. We develop a novel learning paradigm for LVIS: class-incremental few-shot learning. The proposed Learning to Segment the Tail (LST) method for this paradigm outperforms baseline methods, especially over the tail classes, where the model can adapt to unseen classes instantly without training
  • Our method evaluated at the last phase, i.e., the whole dataset, outperforms the baselines in the tail classes (AP(0,10) and AP[10,100)) by a large margin
  • We addressed the problem of large-scale long-tailed instance segmentation by formulating a novel paradigm: class-incremental few-shot learning, where any large dataset can be divided into groups and incrementally learned from the head to the tail
  • We develop the Learning to Segment the Tail (LST) method, equipped with a novel instance-level balanced replay technique and a meta-weight generator for few-shot class adaptation (a minimal sketch of such a generator follows this list)
  • Experimental results on the LVIS dataset [10] demonstrated that Learning to Segment the Tail could gain a significant improvement for the tail classes and achieve an overall boost for the whole 1,230 classes
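
The summary names a meta-weight generator but does not spell out its mechanics. The sketch below is only a generic construction in the imprinted-weights and dynamic few-shot line (Qi et al. and Gidaris & Komodakis, both in the references), not the authors' module; every name in it is hypothetical.

```python
import torch
import torch.nn.functional as F

# Generic sketch of a weight generator for new few-shot classes: turn a few
# support embeddings into a classifier weight usable without gradient steps.
# This is an illustrative construction, not the paper's released code.
def generate_class_weight(support_features: torch.Tensor) -> torch.Tensor:
    """Map support embeddings of shape [k, d] to one unit-norm weight [d]."""
    prototype = F.normalize(support_features, dim=1).mean(dim=0)
    return F.normalize(prototype, dim=0)  # unit norm, for cosine logits

def cosine_logits(features, weights, scale=10.0):
    """Cosine-similarity classifier: features [n, d] against weights [c, d]."""
    f = F.normalize(features, dim=1)
    w = F.normalize(weights, dim=1)
    return scale * f @ w.t()
```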
Methods
  • The authors conducted experiments on LVIS [10] using the standard metrics for instance segmentation.
  • AP was calculated across IoU threshold from 0.5 to 0.95 over all categories.
  • AP50 means using an IoU threshold 0.5.
  • To better display the results from the head to the tail, AP(0,1], AP(0,5), AP(0,10), AP[10,100), AP[100,1000), and AP[1000,∞) were evaluated on the sets of categories containing exactly 1, fewer than 5, fewer than 10, 10 to 100, 100 to 1,000, and at least 1,000 training object instances, respectively.
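
For clarity, the bin definitions above can be restated as a small helper. This is an illustrative reconstruction from the text, not the official LVIS evaluation code.

```python
# Group category ids by training-instance count, matching the bins above.
# Note the first three bins are nested, while the last three are disjoint.
def bin_categories(instance_counts: dict) -> dict:
    bins = {"(0,1]": [], "(0,5)": [], "(0,10)": [],
            "[10,100)": [], "[100,1000)": [], "[1000,inf)": []}
    for cat, n in instance_counts.items():
        if n == 1:
            bins["(0,1]"].append(cat)
        if n < 5:
            bins["(0,5)"].append(cat)
        if n < 10:
            bins["(0,10)"].append(cat)
        elif n < 100:
            bins["[10,100)"].append(cat)
        elif n < 1000:
            bins["[100,1000)"].append(cat)
        else:
            bins["[1000,inf)"].append(cat)
    return bins
```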
Results
  • As shown in Table 1, the method evaluated at the last phase, i.e., the whole dataset, outperforms the baselines in the tail classes (AP(0,10) and AP[10,100)) by a large margin.
  • The overall AP for both object detection and instance segmentation improves.
  • As shown in Figure 5, the authors randomly sampled 60 classes from the tail classes, whose number of training instances is smaller than 100, and reported the results with and without the class-incremental LST.
  • The authors observe that the approach obtains remarkable improvements in most tail categories.
Conclusion
  • The authors addressed the problem of large-scale long-tailed instance segmentation by formulating a novel paradigm: class-incremental few-shot learning, where any large dataset can be divided into groups and incrementally learned from the head to the tail.
  • This paradigm introduces two new challenges that grow over time: 1) while countering catastrophic forgetting, the old classes become more and more imbalanced, and 2) the new classes become more and more few-shot.
  • LST offers a novel and practical solution for learning from large-scale long-tailed data: a single downside, head-class forgetting, is traded off against the two challenges of large vocabulary and few-shot learning.
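
Schematically, the paradigm amounts to a phase loop over class groups sorted from head to tail. The sketch below is reconstructed from this summary alone; `balanced_replay`, `subset`, and `fit` are hypothetical placeholders rather than the authors' released API.

```python
# Schematic phase loop for class-incremental few-shot learning: sort classes
# by training-instance count, split into groups, learn head first, tail last.
def head_to_tail_phases(instance_counts, num_phases):
    """Split class ids into `num_phases` groups, most frequent (head) first."""
    classes = sorted(instance_counts, key=instance_counts.get, reverse=True)
    size = -(-len(classes) // num_phases)  # ceiling division
    return [classes[i * size:(i + 1) * size] for i in range(num_phases)]

def train_incrementally(model, dataset, phases):
    seen = []
    for new_classes in phases:
        replay = balanced_replay(dataset, seen)  # exemplars of old classes
        novel = dataset.subset(new_classes)      # few-shot data in late phases
        model.fit(replay + novel)                # one incremental phase
        seen += new_classes
    return model
```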
Tables
  • Table 1: Results of our LST compared with other methods on the LVIS val set. All experiments are based on ResNet-50-FPN Mask R-CNN
  • Table 2: Results of our LST and the baseline implemented on ResNeXt-101-32x8d-FPN Mask R-CNN
  • Table 3: Ablation study on the size of the base-class set b and the number of incremental phases
Related work
  • Instance segmentation. Our instance segmentation backbone is based on the popular region-based frameworks [22, 13, 5, 25], in particular Mask R-CNN [13] and its semi-supervised extension MaskX R-CNN [17], which can transfer a mask predictor from box annotations alone. However, they cannot scale up to a large-scale long-tailed dataset such as LVIS [10], which is the focus of our work.
  • Imbalanced classification. Re-sampling and re-weighting are the two major efforts to tackle class imbalance. The former aims to re-balance the training samples across classes [16, 3, 11, 6], while the latter focuses on assigning different weights to adjust the loss function [18, 40, 47, 7]. Some works on generalized few-shot learning [46, 21] also deal with an extremely imbalanced dataset, extending the test label space of few-shot learning to both base and novel rare classes. We propose a novel re-sampling strategy: different from previous works that re-sample at the image level, we address the imbalance of the dataset at the instance level (see the sketch below).
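
To illustrate the image-level versus instance-level distinction, here is one plausible reading of instance-level balanced replay over a toy annotation map. It is a sketch under our own assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch of instance-level balanced replay, assuming a toy map
# `annotations: image_id -> list of category ids`. All names are hypothetical.
def instance_balanced_exemplars(annotations, old_classes, per_class):
    """Select replay images so each old class retains roughly `per_class`
    training instances (counted per object, not per image)."""
    images_by_class = defaultdict(list)
    for img, cats in annotations.items():
        for c in set(cats):
            images_by_class[c].append(img)

    exemplars, kept = set(), defaultdict(int)
    for c in old_classes:
        random.shuffle(images_by_class[c])
        for img in images_by_class[c]:
            if kept[c] >= per_class:
                break
            exemplars.add(img)
            kept[c] += annotations[img].count(c)  # count instances, not images
    return exemplars
```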
Funding
  • This work was supported by Alibaba-NTU JRI, and partly supported by the Major Scientific Research Project of Zhejiang Lab (No. 2019DB0ZX01)
References
  • Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. IJCV, 2008.
  • Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Cordelia Schmid, and Karteek Alahari. End-to-End Incremental Learning. In ECCV, 2018.
  • Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 2002.
  • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. Hybrid Task Cascade for Instance Segmentation. In CVPR, 2019.
  • Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, and Hartwig Adam. MaskLab: Instance Segmentation by Refining Object Detection With Semantic and Direction Features. In CVPR, 2018.
  • Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-Balanced Loss Based on Effective Number of Samples. In CVPR, 2019.
  • Qi Dong, Shaogang Gong, and Xiatian Zhu. Class Rectification Hard Mining for Imbalanced Deep Learning. In ICCV, 2017.
  • Spyros Gidaris and Nikos Komodakis. Dynamic Few-Shot Visual Learning Without Forgetting. In CVPR, 2018.
  • Ross Girshick. Fast R-CNN. In ICCV, 2015.
  • Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A Dataset for Large Vocabulary Instance Segmentation. In CVPR, 2019.
  • Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In IEEE International Joint Conference on Neural Networks, 2008.
  • Haibo He and Edwardo A. Garcia. Learning from Imbalanced Data. IEEE Transactions on Knowledge & Data Engineering, 2008.
  • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
  • Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the Knowledge in a Neural Network. In NeurIPS, 2014.
  • Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a Unified Classifier Incrementally via Rebalancing. In CVPR, 2019.
  • Ronghang Hu, Piotr Dollar, Kaiming He, Trevor Darrell, and Ross Girshick. Learning to Segment Every Thing. In CVPR, 2018.
  • Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning Deep Representation for Imbalanced Classification. In CVPR, 2016.
  • Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, and Junjie Yan. Equalization Loss for Long-Tailed Object Recognition. arXiv:2003.05176, 2020.
  • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-Shot Object Detection via Feature Reweighting. In ICCV, 2019.
  • Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, and Liwei Wang. Few-Shot Learning With Global Class Representations. In ICCV, 2019.
  • Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. Fully Convolutional Instance-Aware Semantic Segmentation. In CVPR, 2017.
  • Zhizhong Li and Derek Hoiem. Learning Without Forgetting. In ECCV, 2016.
  • Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
  • Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path Aggregation Network for Instance Segmentation. In CVPR, 2018.
  • Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, and Qianru Sun. Mnemonics Training: Multi-Class Incremental Learning Without Forgetting. In CVPR, 2020.
  • Laurens van der Maaten and Geoffrey Hinton. Visualizing Data Using t-SNE. JMLR, 2008.
  • Michael McCloskey and Neal J. Cohen. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. Psychology of Learning and Motivation, 24:109–165, 1989.
  • Mengye Ren, Renjie Liao, Ethan Fetaya, and Richard S. Zemel. Incremental Few-Shot Learning With Attention Attractor Networks. In NeurIPS, 2019.
  • Kemal Oksuz, Baris Can Cam, Sinan Kalkan, and Emre Akbas. Imbalance Problems in Object Detection: A Review. arXiv:1909.00169, 2019.
  • David M. W. Powers. Applications and Explanations of Zipf's Law. Association for Computational Linguistics, pages 151–160, 1998.
  • Hang Qi, Matthew Brown, and David G. Lowe. Low-Shot Learning With Imprinted Weights. In CVPR, 2018.
  • Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental Classifier and Representation Learning. In CVPR, 2017.
  • Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS, 2015.
  • Li Shen, Zhouchen Lin, and Qingming Huang. Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks. In ECCV, 2016.
  • Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. Incremental Learning of Object Detectors Without Catastrophic Forgetting. In ICCV, 2017.
  • Merrielle Spain and Pietro Perona. Measuring and Predicting Importance of Objects in Our Visual World. Technical Report CNS-TR-2007-002, 2007.
  • Gan Sun, Yang Cong, and Xiaowei Xu. Active Lifelong Learning With "Watchdog". In AAAI, 2018.
  • Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-Transfer Learning for Few-Shot Learning. In CVPR, 2019.
  • Kai Ming Ting. A Comparative Study of Cost-Sensitive Boosting Algorithms. In ICML, 2000.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. In NeurIPS, 2016.
  • Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Learning to Model the Tail. In NeurIPS, 2017.
  • Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Meta-Learning to Detect Rare Objects. In ICCV, 2019.
  • Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN Database: Large-Scale Scene Recognition from Abbey to Zoo. In CVPR, 2010.
  • Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated Residual Transformations for Deep Neural Networks. In CVPR, 2017.
  • Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Learning Classifier Synthesis for Generalized Few-Shot Learning. arXiv:1906.02944, 2019.
  • Zhi-Hua Zhou and Xu-Ying Liu. On Multi-Class Cost-Sensitive Learning. In AAAI, 2006.
  • George Kingsley Zipf. The Psycho-Biology of Language: An Introduction to Dynamic Philology. Routledge, 2013.