Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

NeurIPS 2020


Abstract

One-stage detectors basically formulate object detection as dense classification and localization. The classification is usually optimized by Focal Loss, and the box location is commonly learned under a Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality o…

Introduction
  • Dense detectors have gradually led the trend of object detection, and attention to the representation of bounding boxes and their localization quality estimation has led to encouraging advances.
  • The bounding box representation is modeled as a simple Dirac delta distribution [10, 18, 32, 26, 31], which has been widely used over the past years.
  • [Figure: (a) existing work supervises the classification score with a category label and trains an additional, separate quality branch (used only in test-time NMS), keeping a rigid Dirac delta box representation; (b) ours supervises a joint classification-IoU score used consistently in both training and test-time NMS, and predicts a flexible distribution for bounding box regression.]
Highlights
  • Dense detectors have gradually led the trend of object detection, and attention to the representation of bounding boxes and their localization quality estimation has led to encouraging advances
  • We demonstrate three advantages of Generalized Focal Loss (GFL): (1) It bridges the gap between training and test when one-stage detectors are facilitated with additional quality estimation, leading to a simpler, joint and effective representation of both classification and localization quality; (2) It well models the flexible underlying distribution for bounding boxes, which provides more informative and accurate box locations; (3) The performance of one-stage detectors can be consistently boosted without introducing additional overhead
  • We present the details for the improved representations of localization quality estimation and bounding boxes, which are successfully optimized via the proposed Quality Focal Loss (QFL) and Distribution Focal Loss (DFL), respectively
  • To effectively learn qualified and distributed bounding boxes for dense object detectors, we propose Generalized Focal Loss (GFL) that generalizes the original Focal Loss from {1, 0} discrete formulation to the continuous version
  • Extensive experiments validate the effectiveness of GFL
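As a sketch of the idea behind QFL (an illustrative pure-Python reimplementation, not the authors' mmdetection code), the discrete {1, 0} focal-loss label is replaced by a continuous quality target y ∈ [0, 1], and the (1 − p_t)^γ modulating factor becomes |y − σ|^β:

```python
import math

def quality_focal_loss(sigma, y, beta=2.0, eps=1e-12):
    """Quality Focal Loss (QFL) for a single class score.

    sigma: predicted classification score (sigmoid output) in [0, 1]
    y:     continuous quality target in [0, 1] (e.g. the IoU between the
           predicted box and its ground-truth box; 0 for negatives),
           replacing the discrete {1, 0} label of the original Focal Loss
    beta:  power of the modulating factor
    """
    # Binary cross-entropy, generalized to a continuous target y
    ce = -(y * math.log(max(sigma, eps)) +
           (1.0 - y) * math.log(max(1.0 - sigma, eps)))
    # |y - sigma|^beta down-weights easy examples, playing the role of
    # (1 - p_t)^gamma in the original Focal Loss
    return abs(y - sigma) ** beta * ce
```

When y ∈ {1, 0} this reduces to the original Focal Loss form, and the loss reaches zero exactly when the predicted score σ equals the quality label y, which is what makes the classification score a reliable NMS ranking signal.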
Methods
  • The authors first review the original Focal Loss [18] (FL) for learning dense classification scores of one-stage detectors.
  • Quality Focal Loss (QFL) and Distribution Focal Loss (DFL) are then derived, and their formulations are unified into a single perspective termed Generalized Focal Loss (GFL), a flexible extension of FL, to facilitate further promotion and general understanding in the future.
  • GFL(p_{y_l}, p_{y_r}) reaches its global minimum at p*_{y_l} = (y_r − y)/(y_r − y_l) and p*_{y_r} = (y − y_l)/(y_r − y_l), which means the estimate ŷ = y_l p_{y_l} + y_r p_{y_r} perfectly matches the continuous label y.
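A minimal sketch of DFL in the same spirit (our own illustrative pure-Python code, with integer bin centers 0..n as an assumed layout; the actual head is a softmax over discretized offsets):

```python
import math

def distribution_focal_loss(probs, y, eps=1e-12):
    """Distribution Focal Loss (DFL) for one continuous regression target.

    probs: discrete distribution [p_0, ..., p_n] over integer bin
           centers 0..n (softmax output of the regression head)
    y:     continuous target in [0, n]

    DFL concentrates probability mass on the two integer bins
    y_l = floor(y) and y_r = y_l + 1 that bracket y, each weighted by
    its proximity to y.
    """
    y_l = min(int(math.floor(y)), len(probs) - 2)
    y_r = y_l + 1
    w_l, w_r = y_r - y, y - y_l  # linear interpolation weights
    return -(w_l * math.log(max(probs[y_l], eps)) +
             w_r * math.log(max(probs[y_r], eps)))

def expected_offset(probs):
    """The final box offset is the expectation of the learned distribution."""
    return sum(i * p for i, p in enumerate(probs))
```

For example, with probs = [0, 0, 0.6, 0.4, 0] and y = 2.4, expected_offset recovers 2.4, and DFL is lower than for any distribution that splits the bracketing mass differently, matching the global-minimum condition above.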
Conclusion
  • To effectively learn qualified and distributed bounding boxes for dense object detectors, the authors propose Generalized Focal Loss (GFL) that generalizes the original Focal Loss from {1, 0} discrete formulation to the continuous version.
  • GFL can be specialized into Quality Focal Loss (QFL) and Distribution Focal Loss (DFL), where QFL encourages learning a better joint representation of classification and localization quality, and DFL provides more informative and precise bounding box estimations by modeling box locations as General distributions.
  • Extensive experiments validate the effectiveness of GFL.
  • The authors hope GFL can serve as a simple yet effective baseline for the community.
Tables
  • Table1: Study on QFL (ResNet-50 backbone). All experiments are reproduced in mmdetection [3] and validated on COCO minival
  • Table2: Study on DFL (ResNet-50 backbone). All experiments are reproduced in mmdetection [3] and validated on COCO minival
  • Table3: The effect of QFL and DFL on ATSS: the effects of QFL and DFL are orthogonal, whilst utilizing both can boost 1% AP over the strong ATSS baseline without introducing additional overhead practically. For bounding box regression, we find that the General distribution achieves superior or at least comparable results
  • Table4: Comparisons between state-of-the-art detectors (single-model and single-scale results) on COCO test-dev. “MStrain” denotes multi-scale training. FPS values with ∗ are from [33], while others are measured on the same machine with a single GeForce RTX 2080Ti GPU under the same mmdetection [3] framework, using a batch size of 1 whenever possible. “n/a” means that both trained models and timing results from original papers are not available. R: ResNet. X: ResNeXt. HG: Hourglass. DCN: Deformable Convolutional Network
  • Table5: Comparisons between three distributions. “edge” level denotes optimization over four respective directions, whilst “box” level means IoU-based losses [24] that consider the bounding box as a whole
Related work
  • Representation of localization quality. Existing practices like Fitness NMS [27], IoU-Net [12], MS R-CNN [11], FCOS [26] and IoU-aware [29] utilize a separate branch to perform localization quality estimation in the form of an IoU or centerness score. As mentioned in Sec. 1, this separate formulation causes inconsistency between training and test as well as unreliable quality predictions. Instead of introducing an additional branch, PISA [2] and IoU-balance [28] assign different weights in the classification loss based on localization quality, aiming to enhance the correlation between the classification score and localization accuracy. However, the weighting strategy provides only implicit and limited benefit, since it does not change the optimum of the classification loss objective.

    Representation of bounding boxes. The Dirac delta distribution [7, 23, 8, 1, 18, 26, 13, 31] has governed the representation of bounding boxes over the past years. Recently, a Gaussian assumption [10, 4] has been adopted to learn the uncertainty by introducing a predicted variance. Unfortunately, existing representations are either too rigid or too simplified, and cannot reflect the complex underlying distribution in real data. In this paper, we further relax the assumption and directly learn a more arbitrary, flexible General distribution of bounding boxes, which is more informative and accurate.
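To make the contrast concrete, here is a toy sketch (our own illustration, not code from the paper): over the same discrete offset bins, both the Dirac delta and a discretized Gaussian are special cases of a free-form distribution, while a General distribution head can represent any non-negative vector summing to 1, including asymmetric or multi-mode shapes.

```python
import math

def dirac(n, y):
    """Dirac delta: all mass on one bin (the conventional rigid box target)."""
    return [1.0 if i == round(y) else 0.0 for i in range(n + 1)]

def gaussian(n, mu, sigma):
    """Discretized Gaussian: symmetric, single-mode uncertainty."""
    w = [math.exp(-0.5 * ((i - mu) / sigma) ** 2) for i in range(n + 1)]
    s = sum(w)
    return [x / s for x in w]

def mean(probs):
    """Expectation over bin centers 0..n, used as the decoded offset."""
    return sum(i * p for i, p in enumerate(probs))
```

Both special cases can encode the same mean offset, but only the free-form vector can additionally express, say, a bimodal distribution for an ambiguously bounded object, which is the extra flexibility the General distribution buys.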
References
  • [1] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In CVPR, 2018.
  • [2] Yuhang Cao, Kai Chen, Chen Change Loy, and Dahua Lin. Prime sample attention in object detection. arXiv preprint arXiv:1904.04821, 2019.
  • [3] Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019.
  • [4] Jiwoong Choi, Dayoung Chun, Hyun Kim, and Hyuk-Jae Lee. Gaussian YOLOv3: An accurate and fast object detector using localization uncertainty for autonomous driving. In ICCV, 2019.
  • [5] Zhiwei Dong, Guoxuan Li, Yue Liao, Fei Wang, Pengju Ren, and Chen Qian. CentripetalNet: Pursuing high-quality keypoint pairs for object detection. In CVPR, 2020.
  • [6] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. CenterNet: Keypoint triplets for object detection. In ICCV, 2019.
  • [7] Ross Girshick. Fast R-CNN. In ICCV, 2015.
  • [8] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
  • [9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [10] Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, and Xiangyu Zhang. Bounding box regression with uncertainty for accurate object detection. In CVPR, 2019.
  • [11] Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, and Xinggang Wang. Mask Scoring R-CNN. In CVPR, 2019.
  • [12] Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, and Yuning Jiang. Acquisition of localization confidence for accurate object detection. In ECCV, 2018.
  • [13] Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, and Jianbo Shi. FoveaBox: Beyond anchor-based object detector. arXiv preprint arXiv:1904.03797, 2019.
  • [14] Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. In ECCV, 2018.
  • [15] Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, and Larry S Davis. Learning from noisy anchors for one-stage object detection. arXiv preprint arXiv:1912.05086, 2019.
  • [16] Yanghao Li, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. Scale-aware Trident Networks for object detection. In ICCV, 2019.
  • [17] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
  • [18] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
  • [19] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • [20] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In ECCV, 2016.
  • [21] Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, and Junjie Yan. Grid R-CNN. In CVPR, 2019.
  • [22] Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, and Dahua Lin. Libra R-CNN: Towards balanced learning for object detection. In CVPR, 2019.
  • [23] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
  • [24] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
  • [25] Guanglu Song, Yu Liu, and Xiaogang Wang. Revisiting the sibling head in object detector. In CVPR, 2020.
  • [26] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In ICCV, 2019.
  • [27] Lachlan Tychsen-Smith and Lars Petersson. Improving object localization with Fitness NMS and bounded IoU loss. In CVPR, 2018.
  • [28] Shengkai Wu and Xiaoping Li. IoU-balanced loss functions for single-stage object detection. arXiv preprint arXiv:1908.05641, 2019.
  • [29] Shengkai Wu, Xiaoping Li, and Xinggang Wang. IoU-aware single-stage object detector for accurate localization. Image and Vision Computing, 2020.
  • [30] Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, and Stephen Lin. RepPoints: Point set representation for object detection. In ICCV, 2019.
  • [31] Shifeng Zhang, Cheng Chi, Yongqiang Yao, Zhen Lei, and Stan Z Li. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In CVPR, 2020.
  • [32] Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, and Qixiang Ye. FreeAnchor: Learning to match anchors for visual object detection. In NeurIPS, 2019.
  • [33] Chenchen Zhu, Fangyi Chen, Zhiqiang Shen, and Marios Savvides. Soft anchor-point object detection. In
  • [34] Chenchen Zhu, Yihui He, and Marios Savvides. Feature selective anchor-free module for single-shot object detection. In CVPR, 2019.
  • [35] Li Zhu, Zihao Xie, Liman Liu, Bo Tao, and Wenbing Tao. IoU-uniform R-CNN: Breaking through the limitations of RPN. arXiv preprint arXiv:1912.05190, 2019.
  • [36] Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. In CVPR, 2019.