LVIS - A Dataset for Large Vocabulary Instance Segmentation.

CVPR, pp.5356-5364, (2019)


Abstract

Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced 'el-vis'): a new dataset for Large Vocabulary Instance Segmentation.

Introduction
  • A central goal of computer vision is to endow algorithms with the ability to intelligently describe images.
  • Learning from few examples is a significant open problem in machine learning and computer vision, making this opportunity one of the most exciting from a scientific and practical perspective.
  • To open this area to empirical study, a suitable, high-quality dataset and benchmark are required
Highlights
  • A central goal of computer vision is to endow algorithms with the ability to intelligently describe images
  • Missing annotations often occur in ‘crowd’ cases in which there are a large number of instances and delineating them is difficult
  • Tab. 2 shows that both box average precision (AP) and mask AP are close between our annotations and the original ones from COCO for all models, which span a wide AP range
  • AP decreases somewhat (∼2 points) as the number of negative images increases, because the ratio of negative to positive examples grows when |Pc| is fixed and |Nc| increases
  • We observe that even with a small positive set size of 80, AP is similar to the baseline with low variance
  • We introduced LVIS, a new dataset designed to enable, for the first time, the rigorous study of instance segmentation algorithms that can recognize a large vocabulary of object categories (>1000) and must do so using methods that can cope with the open problem of low-shot learning
Results
  • Evaluation Challenges

    Datasets like PASCAL VOC and COCO use manually selected categories that are pairwise disjoint: when annotating a car, there’s never any question if the object is instead a potted plant or a sofa.
  • Tab. 2 shows that both box AP and mask AP are close between the LVIS annotations and the original COCO annotations for all models, which span a wide AP range
  • This result validates the annotations and evaluation protocol: even though LVIS uses a federated dataset design with sparse annotations, the quantitative outcome closely reproduces the ‘gold standard’ results from dense COCO annotations.
  • With smaller positive sets variance increases, but the AP gap from 1st to 3rd quartile remains below 2 points
  • These simulations together with COCO detectors tested on LVIS (Tab. 2) indicate that including smaller evaluation sets for each category is viable for evaluation
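The federated design referenced above means each category c is scored only on images where its presence or absence is known: its positive set Pc (exhaustively annotated) and negative set Nc (verified absent). The sketch below illustrates that idea only; it is not the official LVIS evaluation code (that lives in the `lvis` API), and the function and data layout are hypothetical.

```python
def federated_ap(dets, positives, negatives, n_gt):
    """Average precision for one category under a federated design.

    dets:      list of (image_id, score, matched) detections for the
               category, where matched is True iff the detection matches
               an unclaimed ground-truth instance on its image.
    positives: image ids known to contain the category (P_c).
    negatives: image ids verified not to contain it (N_c).
    n_gt:      number of ground-truth instances across P_c.
    """
    known = positives | negatives
    # Detections on images outside P_c and N_c are simply ignored: the
    # category was never annotated there, so they cannot be judged.
    ranked = sorted((d for d in dets if d[0] in known), key=lambda d: -d[1])
    tp = fp = 0
    ap = 0.0
    for image_id, _score, matched in ranked:
        if image_id in positives and matched:
            tp += 1
            ap += tp / (tp + fp)   # precision at this recall point
        else:
            fp += 1                # includes every hit on a negative image
    return ap / n_gt if n_gt else 0.0
```

Note that a detection on an image outside Pc ∪ Nc neither helps nor hurts the score, which is what makes sparse annotation viable for evaluation.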
Conclusion
  • The authors introduced LVIS, a new dataset designed to enable, for the first time, the rigorous study of instance segmentation algorithms that can recognize a large vocabulary of object categories (>1000) and must do so using methods that can cope with the open problem of low-shot learning.
  • While LVIS emphasizes learning from few examples, the dataset is not small: it will span 164k images and label ∼2.2 million object instances.
  • Each object instance is segmented with a high-quality mask that surpasses the annotation quality of related datasets.
  • The authors plan to establish LVIS as a benchmark challenge that the authors hope will lead to exciting new object detection, segmentation, and low-shot learning algorithms
Summary
  • Objectives:

    The authors' goal is to enable benchmarking of large vocabulary instance segmentation methods.
Tables
  • Table 1: Annotation quality and complexity relative to experts
  • Table 2: COCO-trained Mask R-CNN evaluated on LVIS annotations; both annotation sets yield similar AP values
Funding
  • To prevent frequent categories from dominating the dataset and to reduce the overall workload, we subsample frequent categories such that no positive set exceeds more than 1% of the images in the dataset
  • The results show that roughly 50% of matched instances have IoU greater than 90% and roughly 75% of the image-category pairs have a perfect F1 score
  • At 1k images, mask AP drops from 36.4% (full dataset) to 9.8% (1k subset)
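The 1% cap on positive sets described above can be sketched as a simple subsampling step. This is an illustrative helper under assumed names and data layout, not the authors' code:

```python
import random


def cap_positive_sets(positive_sets, n_images, cap_frac=0.01, seed=0):
    """Subsample each category's positive set so that none exceeds
    cap_frac of the images in the dataset (the stated 1% rule).

    positive_sets: {category: set of image ids containing it}
    n_images:      total number of images in the dataset
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    cap = max(1, int(cap_frac * n_images))
    capped = {}
    for cat, imgs in positive_sets.items():
        if len(imgs) <= cap:
            capped[cat] = set(imgs)    # rare category: keep everything
        else:
            # frequent category: keep a random subset of size `cap`
            capped[cat] = set(rng.sample(sorted(imgs), cap))
    return capped
```

Rare categories pass through untouched, so the cap only trims the head of the frequency distribution.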
Study subjects and analysis
plant and animal species: 5000
In contrast, our goal is to enable benchmarking of large vocabulary instance segmentation methods. iNaturalist [26] contains nearly 900k images annotated with bounding boxes for an astonishing 5000 plant and animal species. Similar to our goals, iNaturalist emphasizes

training samples: 100
The detection portion of the dataset includes 15M bounding boxes labeled with 600 object categories. The associated benchmark evaluates the 500 most frequent categories, all of which have over 100 training samples (>70% of them have over 1000 training samples). Thus, unlike our benchmark, low-shot learning is not integral to Open Images

datasets: 4
Our annotation pipeline comprises six stages:
  • Stage 1: Object Spotting elicits annotators to mark a single instance of many different categories per image. This stage is iterative and causes annotators to discover a long tail of categories.
  • Stage 2: Exhaustive Instance Marking extends the stage 1 annotations to cover all instances of each spotted category. Here we show additional instances of book.
  • Stages 3 and 4: Instance Segmentation and Verification are repeated back and forth until ∼99% of all segmentations pass a quality check.
  • Stage 5: Exhaustive Annotations Verification checks that all instances are in fact segmented and flags categories that are missing one or more instances.
  • Stage 6: Negative Labels are assigned by verifying that a subset of categories do not appear in the image.

Distribution of object centers in normalized image coordinates for four datasets: objects in LVIS, COCO, and ADE20K are well distributed (objects in LVIS are slightly less centered than in COCO and slightly more centered than in ADE20K), while Open Images exhibits a strong center bias.
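The quality checks behind numbers like "∼50% of matched instances have IoU greater than 90%" compare annotator masks against expert masks by intersection-over-union. A minimal sketch of mask IoU (an assumed helper, not the authors' implementation):

```python
import numpy as np


def mask_iou(a, b):
    """IoU between two boolean instance masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Two empty masks are treated as a perfect match here.
    return float(inter / union) if union else 1.0
```

Because the measure is set-based, it rewards tight boundaries: a mask that covers half of a ground-truth object scores at most 0.5 regardless of shape.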

References
  • [1] Fred Attneave and Malcolm D. Arnoult. The quantitative study of shape and pattern perception. Psychological Bulletin, 1956.
  • [2] Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 2014.
  • [3] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, 2016.
  • [4] Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona. Pedestrian detection: An evaluation of the state of the art. TPAMI, 2012.
  • [5] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010.
  • [6] Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of object categories. TPAMI, 2006.
  • [7] Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. Detectron. https://github.com/facebookresearch/detectron, 2018.
  • [8] Bharath Hariharan and Ross Girshick. Low-shot visual recognition by shrinking and hallucinating features. In ICCV, 2017.
  • [9] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
  • [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [11] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • [12] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In CVPR, 2019.
  • [13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [14] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig, et al. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982, 2018.
  • [15] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.
  • [16] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
  • [17] Mark Liberman. Reproducible research and the common task method. Simons Foundation Lecture, https://www.simonsfoundation.org/lecture/reproducible-research-and-the-common-task-method/, 2015.
  • [18] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • [19] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. COCO detection evaluation. http://cocodataset.org/#detection-eval, accessed Oct 30, 2018.
  • [20] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
  • [21] George Miller. WordNet: An Electronic Lexical Database. MIT Press, 1998.
  • [22] Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder. The Mapillary Vistas dataset for semantic understanding of street scenes. In ICCV, 2017.
  • [23] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
  • [24] Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV, 2008.
  • [25] Merrielle Spain and Pietro Perona. Measuring and predicting importance of objects in our visual world. Technical Report CNS-TR-2007-002, California Institute of Technology, 2007.
  • [26] Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In CVPR, 2018.
  • [27] Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
  • [28] Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. IJCV, 2019.
  • [29] George Kingsley Zipf. The psycho-biology of language: An introduction to dynamic philology. Routledge, 2013.