Learning to Segment the Tail
CVPR, pp. 14042-14051, 2020.
Abstract:
Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data. We propose a "divide & conquer" strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one. This derives a novel learning paradigm: class-incremental few-shot learning.
Introduction
- The long-tail distribution inherently exists in the visual world, where a few head classes occupy most of the instances [48, 1, 37, 44].
- This is inevitable when modeling large-scale datasets, because the observational probability of classes in nature follows Zipf’s law [31].
- It is prohibitively expensive to counter nature and collect a balanced, sample-rich, large-scale dataset that caters to training a robust visual recognition system with the prevailing models [13, 9, 34, 4].
Highlights
- The long-tail distribution inherently exists in our visual world, where a few head classes occupy most of the instances [48, 1, 37, 44]
- We highlight the practical value of our work by focusing on the severe class imbalance and few-shot learning in the field of instance segmentation. We develop a novel learning paradigm for LVIS: class-incremental few-shot learning. The proposed Learning to Segment the Tail (LST) method for this paradigm outperforms baseline methods, especially over the tail classes, where the model can adapt to unseen classes instantly without training
- Our method evaluated at the last phase, i.e., the whole dataset, outperforms the baselines in the tail classes (AP(0,10) and AP[10,100)) by a large margin
- We addressed the problem of large-scale long-tailed instance segmentation by formulating a novel paradigm: class-incremental few-shot learning, where any large dataset can be divided into groups and incrementally learned from the head to the tail (a sketch of this phase division follows this list)
- We develop the Learning to Segment the Tail (LST) method, equipped with a novel instance-level balanced replay technique and a meta-weight generator for few-shot class adaptation
- Experimental results on the LVIS dataset [10] demonstrated that Learning to Segment the Tail could gain a significant improvement for the tail classes and achieve an overall boost for the whole 1,230 classes
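The phase division behind this class-incremental paradigm can be illustrated with a short sketch: sort categories from head to tail by training-instance count, keep a data-rich base group, and split the rest into groups that are learned one incremental phase at a time. This is a minimal sketch under assumed inputs (a hypothetical instance_counts mapping and hand-picked group sizes), not the paper's exact grouping code.

```python
from typing import Dict, List


def split_into_phases(instance_counts: Dict[int, int],
                      base_size: int,
                      group_size: int) -> List[List[int]]:
    """Sort categories from head to tail and split them into a base group
    plus several incremental groups, learned one phase at a time."""
    # Head classes (most training instances) come first, tail classes last.
    ordered = sorted(instance_counts, key=instance_counts.get, reverse=True)
    phases = [ordered[:base_size]]  # phase 0: data-rich base classes
    for start in range(base_size, len(ordered), group_size):
        # Later phases contain increasingly few-shot (tail) classes.
        phases.append(ordered[start:start + group_size])
    return phases


# Toy usage: 10 categories whose instance counts follow a long tail.
counts = {c: max(1, 1000 >> c) for c in range(10)}
print(split_into_phases(counts, base_size=4, group_size=2))
# [[0, 1, 2, 3], [4, 5], [6, 7], [8, 9]]
```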
Methods
- The authors conducted experiments on LVIS [10] using the standard metrics for instance segmentation.
- AP was calculated across IoU thresholds from 0.5 to 0.95 over all categories.
- AP50 denotes AP at an IoU threshold of 0.5.
- To better display the results from the head to the tail, AP(0,1], AP(0,5), AP(0,10), AP[10,100), AP[100,1000), and AP[1000,−) were evaluated for the sets of categories containing exactly 1, fewer than 5, fewer than 10, 10 to 100, 100 to 1,000, and at least 1,000 training object instances, respectively (see the sketch after this list).
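For illustration, the sketch below aggregates per-category AP values into such instance-count bins. The per_category_ap and instance_counts inputs are assumptions for this example; this is not the official LVIS evaluation code.

```python
from statistics import mean
from typing import Callable, Dict


def binned_ap(per_category_ap: Dict[int, float],
              instance_counts: Dict[int, int]) -> Dict[str, float]:
    """Average per-category AP inside bins defined by the number of
    training instances, mirroring AP(0,10), AP[10,100), and so on."""
    bins: Dict[str, Callable[[int], bool]] = {
        "AP(0,1]": lambda n: n == 1,
        "AP(0,5)": lambda n: n < 5,
        "AP(0,10)": lambda n: n < 10,
        "AP[10,100)": lambda n: 10 <= n < 100,
        "AP[100,1000)": lambda n: 100 <= n < 1000,
        "AP[1000,-)": lambda n: n >= 1000,
    }
    result = {}
    for name, in_bin in bins.items():
        aps = [ap for cat, ap in per_category_ap.items()
               if in_bin(instance_counts[cat])]
        result[name] = mean(aps) if aps else float("nan")
    return result
```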
Results
- As shown in Table 1, the method evaluated at the last phase, i.e., the whole dataset, outperforms the baselines in the tail classes (AP(0,10) and AP[10,100)) by a large margin.
- The overall AP for both object detection and instance segmentation improves.
- As shown in Figure 5, the authors randomly sampled 60 classes from the tail classes, each with fewer than 100 instances in the training set, and reported results with and without the class-incremental LST.
- The authors observe that the approach obtains remarkable improvement in most tail categories.
Conclusion
- The authors addressed the problem of large-scale long-tailed instance segmentation by formulating a novel paradigm: class-incremental few-shot learning, where any large dataset can be divided into groups and incrementally learned from the head to the tail
- This paradigm introduces two challenges that grow over time: 1) when countering catastrophic forgetting, the old classes become increasingly imbalanced; 2) the new classes become increasingly few-shot.
- LST offers a novel and practical solution for learning from large-scale long-tailed data: a single downside, head-class forgetting, is traded for tackling the two challenges of the large vocabulary and few-shot learning (a minimal illustration of instant few-shot adaptation follows this list)
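As a rough illustration of how a model can adapt to unseen classes instantly without training (see Highlights), the sketch below generates a classifier weight for a novel class directly from a few support features, in the spirit of weight imprinting [32] and dynamic few-shot classifiers [8]. The paper's actual meta-weight generator may differ, and every name here is hypothetical.

```python
import torch
import torch.nn.functional as F


def generate_novel_weight(support_feats: torch.Tensor) -> torch.Tensor:
    """Turn a few support features of a novel class (shape [k_shot, dim])
    into a unit-norm classifier weight, with no gradient updates."""
    prototype = F.normalize(support_feats, dim=1).mean(dim=0)  # average of normalized features
    return F.normalize(prototype, dim=0)                       # unit-norm weight vector


# Toy usage: 5-shot, 256-d features yield one 256-d classifier weight.
w_novel = generate_novel_weight(torch.randn(5, 256))
print(w_novel.shape)  # torch.Size([256])
```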
Tables
- Table1: Results of our LST and the comparison with other methods on the LVIS val set. All experiments are performed with ResNet-50-FPN Mask R-CNN
- Table2: Results of our LST and baseline implemented on ResNeXt-101-32x8d-FPN Mask R-CNN
- Table3: Ablation study on different sizes of the base class set b and different numbers of incremental phases
Related work
- Instance segmentation. Our instance segmentation backbone is based on the popular region-based frameworks [22, 13, 5, 25], in particular Mask R-CNN [13] and its semi-supervised extension MaskX R-CNN [17], which can transfer a mask predictor from box annotations alone. However, they cannot scale up to a large-scale long-tailed dataset such as LVIS [10], which is the focus of our work.
- Imbalanced classification. Re-sampling and re-weighting are the two major efforts to tackle class imbalance. The former aims to re-balance the training samples across classes [16, 3, 11, 6], while the latter focuses on assigning different weights to adjust the loss function [18, 40, 47, 7]. Some works on generalized few-shot learning [46, 21] also deal with extremely imbalanced datasets, extending the test label space of few-shot learning to both base and novel rare classes. We propose a novel re-sampling strategy: different from previous works that re-sample at the image level, we address the dataset imbalance at the instance level (a sketch follows below).
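To make the contrast with image-level re-sampling concrete, here is a minimal sketch of instance-level balanced sampling as described above: the replay memory is filled per object instance rather than per image. The annotation format and the per-class budget are assumptions for this example, not the paper's exact implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def balanced_instance_replay(annotations: List[Tuple[int, int]],
                             budget_per_class: int) -> List[Tuple[int, int]]:
    """Keep at most `budget_per_class` object instances for each old class.
    Each annotation is a (category_id, instance_id) pair, so the rebalancing
    happens at the instance level rather than the image level."""
    by_class: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for cat_id, inst_id in annotations:
        by_class[cat_id].append((cat_id, inst_id))
    memory = []
    for insts in by_class.values():
        memory.extend(random.sample(insts, min(budget_per_class, len(insts))))
    return memory
```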
Funding
- This work was supported by Alibaba-NTU JRI, and partly supported by Major Scientific Research Project of Zhejiang Lab (No 2019DB0ZX01)
References
- Bryan C Russell, Antonio Torralba, Kevin P Murphy, and William T Freeman. Labelme: a database and web-based tool for image annotation. In IJCV, 2008. 1
- Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil, Cordelia Schmid, and Karteek Alahari. End-to-End Incremental Learning. In ECCV, 2018. 3
- Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: Synthetic Minority Oversampling Technique. In Journal of artificial intelligence research, 2002. 1, 2
- Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. Hybrid Task Cascade for Instance Segmentation. In CVPR, 2019. 1
- Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, and Hartwig Adam. MaskLab: Instance Segmentation by Refining Object Detection With Semantic and Direction Features. In CVPR, 2018. 2
- Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-Balanced Loss Based on Effective Number of Samples. In CVPR, 2019. 2
- Qi Dong, Shaogang Gong, and Xiatian Zhu. Class rectification hard mining for imbalanced deep learning. In ICCV, 2017.
- Spyros Gidaris and Nikos Komodakis. Dynamic Few-Shot Visual Learning Without Forgetting. In CVPR, 2018. 3, 4, 5
- Ross Girshick. Fast R-CNN. In ICCV, 2015. 1, 4
- Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A Dataset for Large Vocabulary Instance Segmentation. In CVPR, 2019. 1, 2, 3, 5, 6, 7, 8
- Haibo He, Yang Bai, E. A. Garcia, and Shutao Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Neural Networks, 2008. 1, 2
- Haibo He and Edwardo A Garcia. Learning from imbalanced data. In IEEE Transactions on Knowledge & Data Engineering, 2008. 1
- Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross Girshick. Mask R-CNN. In ICCV, 2017. 1, 2, 3, 4, 6, 7
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016. 6
- Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NeurIPS, 2014. 3
- Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a Unified Classifier Incrementally via Rebalancing. In CVPR, 2019. 2, 3, 4
- Ronghang Hu, Piotr Dollar, Kaiming He, Trevor Darrell, and Ross Girshick. Learning to Segment Every Thing. In CVPR, 2018. 2, 3, 4, 6
- Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning Deep Representation for Imbalanced Classification. In CVPR, 2016. 2
- Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, and Junjie Yan. Equalization loss for long-tailed object recognition. arXiv:2003.05176, 2020. 7
- Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-Shot Object Detection via Feature Reweighting. In ICCV, 2019. 2, 3
- Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, and Liwei Wang. Few-Shot Learning With Global Class Representations. In ICCV, 2019. 2
- Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. Fully Convolutional Instance-Aware Semantic Segmentation. In CVPR, 2017. 2
- Zhizhong Li and Derek Hoiem. Learning Without Forgetting. In ECCV, 2016. 3, 4
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014. 6
- Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path Aggregation Network for Instance Segmentation. In CVPR, 2018. 2
- Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, and Qianru Sun. Mnemonics training: Multi-class incremental learning without forgetting. In CVPR, June 2020. 3
- Laurens van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE. In JMLR, 2008. 7
- Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation - Advances in Research and Theory, 24:109–165, 1989. 1, 2
- Mengye Ren, Renjie Liao, Ethan Fetaya, and Richard S. Zemel. Incremental few-shot learning with attention attractor networks. In NeurIPS, 2019. 2, 3
- Kemal Oksuz, Baris Can Cam, Sinan Kalkan, and Emre Akbas. Imbalance Problems in Object Detection: A Review. arXiv preprint arxiv:1909.00169, 2019. 7
- David M W Powers. Applications and explanations of Zipf's law. In Association for Computational Linguistics, pages 151-160, 1998. 1
- Hang Qi, Matthew Brown, and David G. Lowe. Low-Shot Learning With Imprinted Weights. In CVPR, 2018. 3
- Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental Classifier and Representation Learning. In CVPR, 2017. 2, 3, 4, 5
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS, 2015. 1
- Li Shen, Zhouchen Lin, and Qingming Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In ECCV, 2016. 5, 6, 7
- Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. Incremental Learning of Object Detectors Without Catastrophic Forgetting. In ICCV, 2017. 2, 3
- Merrielle Spain and Pietro Perona. Measuring and predicting importance of objects in our visual world. In Technical Report CNS- TR-2007-002, 2007. 1
- Gan Sun, Yang Cong, and Xiaowei Xu. Active Lifelong Learning With ”Watchdog”. In AAAI, 2018. 3
- Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-transfer learning for few-shot learning. In CVPR, 2019. 2
- Kai Ming Ting. A comparative study of cost-sensitive boosting algorithms. In ICML, 2000. 2
- Oriol Vinyals, Charles Blundell, Timothy Lillicrap, koray kavukcuoglu, and Daan Wierstra. Matching networks for one shot learning. In NeurIPS, 2016. 2, 6
- Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Learning to model the tail. In NeurIPS, 2017. 2, 3
- Yu-Xiong Wang, Deva Ramanan, and Martial Hebert. Metalearning to detect rare objects. In ICCV, 2019. 2, 3
- Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010. 1
- Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, July 2017. 7
- Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Learning Classifier Synthesis for Generalized Few-Shot Learning. arXiv preprint arxiv:1906.02944, 2019. 2, 3, 5
- Zhi-Hua Zhou and Xu-Ying Liu. On Multi-Class CostSensitive Learning. In AAAI, 2006. 2
- George Kingsley Zipf. The psycho-biology of language: An introduction to dynamic philology. In Routledge, 2013. 1