Causal Intervention for Weakly-Supervised Semantic Segmentation

NeurIPS 2020.


Abstract:

We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by using only image-level labels -- the most crucial step in WSSS. We attribute the cause of the ambiguous boundaries of pseudo-masks to the confounding context, e.g., the co...
Introduction
  • Semantic segmentation aims to classify each image pixel into its corresponding semantic class [37].
  • It is an indispensable computer vision building block for scene understanding applications such as autonomous driving.
  • Training a segmentation model, however, requires pixel-level annotations that are expensive to obtain, e.g., about 1.5 man-hours for one 500 × 500 daily-life image [14].
  • The authors focus on image-level class labels, the most economical form of supervision, requiring only a few man-seconds for tagging an image [31].
Highlights
  • Semantic segmentation aims to classify each image pixel into its corresponding semantic class [37]
  • We propose a novel Weakly-Supervised Semantic Segmentation (WSSS) pipeline called: Context Adjustment (CONTA)
  • In Section 4.3, we demonstrate that CONTA can improve pseudo-masks by 2.0% mean Intersection over Union (mIoU) on average and overall achieves a new state-of-the-art of 66.1% mIoU on the val set and 66.7% mIoU on the test set of PASCAL VOC 2012 [14], and 33.4% mIoU on the val set of MS-COCO [35].
  • We evaluated three types of masks: the Class Activation Map (CAM) seed area mask, the pseudo-mask, and the segmentation mask, each compared against the ground-truth mask (a minimal CAM sketch follows this list).
  • The detailed implementations of each baseline + CONTA are given in Appendix 3
  • We argued that the reasons are due to the context prior, which is a confounder in our proposed causal graph
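Since the CAM seed areas drive the whole pipeline, here is a minimal sketch of how a Class Activation Map is typically computed from a multi-label classifier with global average pooling [74]; the tensor names and the thresholding step are illustrative assumptions, not the paper's implementation.

```python
# Minimal CAM sketch (illustrative; not the paper's exact code).
# Assumes the classifier's last conv features are globally average-pooled
# and classified by a single fully-connected layer whose weight is `fc_weight`.
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx, image_size):
    """
    features:   (C, H, W) feature map from the backbone's last conv layer
    fc_weight:  (num_classes, C) weight of the final fully-connected layer
    class_idx:  index of the class whose activation map is wanted
    image_size: (height, width) of the input image
    """
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = F.relu(cam)                         # keep only positive evidence
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)            # normalize to [0, 1]
    # Upsample to image resolution; thresholding this map gives seed areas.
    cam = F.interpolate(cam[None, None], size=image_size,
                        mode='bilinear', align_corners=False)[0, 0]
    return cam
```

Thresholding the normalized map (the exact threshold is method-specific) yields the foreground seed areas that later steps expand into pseudo-masks.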
Methods
  • CONTA is deployed on top of several WSSS baselines (SEC [26], SEAM* [63], and IRNet* [1]) with VGG-16, ResNet-38, and ResNet-50 backbones; both pseudo-mask and segmentation-mask (Seg. Mask) quality are reported.
  • On PASCAL VOC 2012 [14], IRNet+CONTA (ResNet-50) achieves a very competitive 65.3% and 66.1% mIoU on the val and test sets, and SEAM+CONTA (ResNet-38) reaches 66.1% and 66.7%; on MS-COCO [35], SEC+CONTA (VGG-16), SEAM+CONTA (ResNet-38), and IRNet+CONTA (ResNet-50) reach 23.7%, 32.8%, and 33.4% mIoU on the val set, respectively.
  • With the ResNet-38 [67] backbone, CONTA on SEAM achieves state-of-the-art 66.1% and 66.7% mIoU on the val and test sets, surpassing the previous best model by 1.2% and 1.0%, respectively.
  • On MS-COCO, CONTA deployed on SEC with VGG-16 [54] achieves 23.7% mIoU on the val set, which surpasses the previous best model by 1.3% mIoU.
  • On stronger backbones and WSSS models, CONTA can boost performance by 0.9% mIoU on average.
Results
  • The authors evaluated three types of masks: CAM seed area mask, pseudo-mask, and segmentation mask, compared with the ground-truth mask.
  • The standard mean Intersection over Union (mIoU) was used on the training set to evaluate the CAM seed area mask and the pseudo-mask, and on the val and test sets to evaluate the segmentation mask (a minimal mIoU sketch follows this list).
  • General architecture components include a multi-label image classification model, a pseudo-mask generation model, and a segmentation model: DeepLab-v2 [9].
  • The detailed implementations of each baseline + CONTA are given in Appendix 3.
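For reference, a minimal sketch of the mean Intersection over Union computation follows; the ignore label and the per-image accumulation are simplifying assumptions (the benchmark metric is accumulated over the whole split via a confusion matrix), not the paper's evaluation script.

```python
# Minimal mIoU sketch (standard definition; not the paper's evaluation script).
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    """pred, gt: integer label maps of identical shape."""
    valid = gt != ignore_label
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        gt_c = (gt == c) & valid
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue                      # class absent in both maps
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```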
Conclusion
  • The authors started by summarizing three basic problems in the existing pseudo-masks of WSSS.
  • The authors used causal intervention to remove the confounding effect of context.
  • Since the confounder is unobserved, the authors devised a novel WSSS framework, Context Adjustment (CONTA), based on the backdoor adjustment (written out after this list).
  • Thanks to the causal inference framework, the authors can clearly identify the limitation of CONTA: the approximation of the context confounder, which is proven to be ill-posed [11].
  • Moving forward, the authors plan to 1) develop more advanced confounder set discovery methods and 2) incorporate observable expert knowledge into the confounder.
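For readers unfamiliar with the term, the backdoor adjustment referred to above is the standard formula from causal inference [41, 44]; for an image X, image-level labels Y, and the (unobserved) context confounder Z, it reads:

```latex
% Backdoor adjustment (standard form); CONTA approximates the sum in practice.
P\big(Y \mid do(X = x)\big) \;=\; \sum_{z} P\big(Y \mid X = x,\, Z = z\big)\, P\big(Z = z\big)
```

Roughly speaking, since Z cannot be observed, CONTA approximates it with a finite confounder set estimated from the class-wise pseudo-masks of the previous round; this approximation is exactly the ill-posed step acknowledged above [11].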
Objectives
  • The authors aim to maximize P(Y | do(X)) for learning the multi-label classification model, whereby the subsequent CAM will yield better seed areas (a toy sketch of one way to approximate this objective follows).
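A toy sketch of how the sum over a finite confounder set can be wired into a multi-label classifier is given below; the class name, the concatenation-based fusion of image and context features, and the plain outer sum are illustrative assumptions, not CONTA's exact formulation (the paper moves the expectation inside the network with a normalized weighted geometric mean for efficiency).

```python
# Toy sketch of the backdoor adjustment with a finite confounder set
# (illustrative simplification, not CONTA's NWGM-based formulation).
import torch
import torch.nn as nn

class BackdoorAdjustedClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, confounder_set, prior):
        """
        confounder_set: (n, feat_dim) tensor, one entry per confounder stratum
                        (e.g., a class-wise average context representation)
        prior:          (n,) tensor holding P(z) for each stratum
        """
        super().__init__()
        self.register_buffer('z', confounder_set)
        self.register_buffer('pz', prior)
        self.cls = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x_feat):
        # x_feat: (B, feat_dim) image representation from the backbone
        B, n = x_feat.size(0), self.z.size(0)
        x = x_feat.unsqueeze(1).expand(B, n, -1)        # (B, n, feat_dim)
        z = self.z.unsqueeze(0).expand(B, n, -1)        # (B, n, feat_dim)
        probs_xz = torch.sigmoid(self.cls(torch.cat([x, z], dim=-1)))  # P(Y|X,z)
        # Backdoor adjustment: sum_z P(Y | X, z) P(z)
        return torch.einsum('bnc,n->bc', probs_xz, self.pz)
```

In CONTA the confounder set itself is re-estimated from the pseudo-masks at every round, so the classification, seed generation, and pseudo-mask expansion steps are iterated.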
Tables
  • Table 1: Ablation results on PASCAL VOC 2012 [14] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask on the val set. "–" denotes that the result is N.A. for the fully-supervised models.
  • Table 2: Different baselines + CONTA on PASCAL VOC 2012.
  • Table 3: Comparison with state-of-the-arts in mIoU (%). "*" denotes our stronger-backbone re-implemented results. The best and second-best performance under each ...
  • Table 4: Ablations of IRNet [1]+CONTA on PASCAL VOC 2012 [14] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 5: Ablations of DSRG [22]+CONTA on PASCAL VOC 2012 [14] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 6: Ablations of SEC [26]+CONTA on PASCAL VOC 2012 [14] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 7: Ablation results of SEAM [63]+CONTA on MS-COCO [35] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 8: Ablation results of IRNet [1]+CONTA on MS-COCO [35] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 9: Ablation results of DSRG [22]+CONTA on MS-COCO [35] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
  • Table 10: Ablation results of SEC [26]+CONTA on MS-COCO [35] in mIoU (%). "*" denotes our re-implemented results. "Seg. Mask" refers to the segmentation mask of the val set. "–" denotes that the result is N.A. for the fully-supervised model.
Related work
  • Weakly-Supervised Semantic Segmentation (WSSS). To address the problem of the expensive labeling cost in fully-supervised semantic segmentation, WSSS has been extensively studied in recent years [1, 65]. As shown in Figure 1, the prevailing WSSS pipeline [26] with only image-level class labels [2, 63] mainly consists of two steps: pseudo-mask generation and segmentation model training. The key is to generate pseudo-masks that are as perfect as possible, where "perfect" means that a pseudo-mask reveals the entire object area with an accurate boundary [1]. To this end, existing methods mainly focus on generating better seed areas [30, 63, 65, 64] and on expanding these seed areas [1, 2, 22, 26, 61]. In this paper, we also follow this pipeline, and our contribution is an iterative procedure that generates higher-quality seed areas. A high-level sketch of this two-step pipeline follows.
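To make the two-step pipeline concrete, a high-level skeleton is sketched below; the three injected callables are placeholders for the classification, pseudo-mask generation, and segmentation components discussed above, not functions from the paper's released code.

```python
# High-level skeleton of the prevailing two-step WSSS pipeline
# (placeholder callables; not the paper's released code).
from typing import Any, Callable, Optional, Sequence

def wsss_pipeline(
    images: Sequence[Any],
    image_labels: Sequence[Any],
    train_classifier: Callable[..., Any],
    generate_pseudo_masks: Callable[..., Any],
    train_segmentation_model: Callable[..., Any],
    num_rounds: int = 1,
) -> Any:
    pseudo_masks: Optional[Any] = None
    for _ in range(num_rounds):
        # Step 1a: multi-label classification from image-level labels only.
        # With CONTA, this step is deconfounded using a confounder set
        # estimated from the previous round's pseudo-masks.
        classifier = train_classifier(images, image_labels, pseudo_masks)
        # Step 1b: derive CAM seed areas from the classifier and expand
        # them into pixel-level pseudo-masks.
        pseudo_masks = generate_pseudo_masks(classifier, images)
    # Step 2: train a fully-supervised segmentation model (e.g., DeepLab-v2)
    # on the pseudo-masks as if they were ground-truth annotations.
    return train_segmentation_model(images, pseudo_masks)
```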
Funding
  • This work was partially supported by the National Key Research and Development Program of China under Grant 2018AAA0102002, the National Natural Science Foundation of China under Grant 61925204, the China Scholarships Council under Grant 201806840058, the Alibaba Innovative Research (AIR) programme, and the NTU-Alibaba JRI.
References
  • [1] Jiwoon Ahn, Sunghyun Cho, and Suha Kwak. Weakly supervised learning of instance segmentation with inter-pixel relations. In CVPR, 2019.
  • [2] Jiwoon Ahn and Suha Kwak. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In CVPR, 2018.
  • [3] Peter C Austin. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3):399–424, 2011.
  • [4] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. TPAMI, 39(12):2481–2495, 2017.
  • [5] Pierre Baldi and Peter Sadowski. The dropout learning algorithm. Artificial Intelligence, 210:78–122, 2014.
  • [6] Elias Bareinboim and Judea Pearl. Controlling selection bias in causal inference. In AISTATS, 2012.
  • [7] Michel Besserve, Rémy Sun, and Bernhard Schölkopf. Counterfactuals uncover the modular structure of deep generative models. In ICLR, 2020.
  • [8] Thomas C Chalmers, Harry Smith Jr, Bradley Blackburn, Bernard Silverman, Biruta Schroeder, Dinah Reitman, and Alexander Ambroz. A method for assessing the quality of a randomized control trial. Controlled Clinical Trials, 2(1):31–49, 1981.
  • [9] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI, 40(4):834–848, 2017.
  • [10] Jifeng Dai, Kaiming He, and Jian Sun. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In ICCV, 2015.
  • [11] Alexander D'Amour. On multi-cause causal inference with unobserved confounding: Counterexamples, impossibility, and alternatives. In AISTATS, 2019.
  • [12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • [13] Nikita Dvornik, Julien Mairal, and Cordelia Schmid. Modeling visual context is key to augmenting object detection datasets. In ECCV, 2018.
  • [14] Mark Everingham, SM Ali Eslami, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 111(1):98–136, 2015.
  • [15] Pedro F Felzenszwalb, Ross B Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32(9):1627–1645, 2009.
  • [16] Ross Girshick. Fast R-CNN. In ICCV, 2015.
  • [17] Ross Girshick, Forrest Iandola, Trevor Darrell, and Jitendra Malik. Deformable part models are convolutional neural networks. In CVPR, 2015.
  • [18] Ruocheng Guo, Lu Cheng, Jundong Li, P Richard Hahn, and Huan Liu. A survey of learning causality with data: Problems and methods. CSUR, 53(4):1–37, 2020.
  • [19] Bharath Hariharan, Pablo Arbeláez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik. Semantic contours from inverse detectors. In ICCV, 2011.
  • [20] Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, and Hugo Larochelle. Brain tumor segmentation with deep neural networks. Medical Image Analysis, 35(3):18–31, 2017.
  • [21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [22] Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, and Jingdong Wang. Weakly-supervised semantic segmentation network with deep seeded region growing. In CVPR, 2018.
  • [23] Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, and Shipeng Li. Salient object detection: A discriminative regional feature integration approach. In CVPR, 2013.
  • [24] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In CVPR, 2018.
  • [25] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [26] Alexander Kolesnikov and Christoph H Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV, 2016.
  • [27] Philipp Krähenbühl and Vladlen Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NeurIPS, 2011.
  • [28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
  • [29] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • [30] Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, and Sungroh Yoon. FickleNet: Weakly and semi-supervised semantic image segmentation using stochastic inference. In CVPR, 2019.
  • [31] Qizhu Li, Anurag Arnab, and Philip HS Torr. Weakly- and semi-supervised panoptic segmentation. In ECCV, 2018.
  • [32] Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, and Jian Sun. Learning dynamic routing for semantic segmentation. In CVPR, 2020.
  • [33] Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, and Jian Sun. ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In CVPR, 2016.
  • [34] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In ICLR, 2014.
  • [35] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • [36] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In ECCV, 2016.
  • [37] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • [38] David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. MIT Press, 1982.
  • [39] Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, and Ji-Rong Wen. Counterfactual VQA: A cause-effect look at language bias. arXiv, 2020.
  • [40] Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, and Bernhard Schölkopf. Learning independent causal mechanisms. In ICML, 2018.
  • [41] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
  • [42] Judea Pearl. Interpretation and identification of causal mediation. Psychological Methods, 19(4):459–481, 2014.
  • [43] Judea Pearl et al. Causal inference in statistics: An overview. Statistics Surveys, 3:96–146, 2009.
  • [44] Judea Pearl, Madelyn Glymour, and Nicholas P Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
  • [45] Jiaxin Qi, Yulei Niu, Jianqiang Huang, and Hanwang Zhang. Two causal principles for improving visual dialog. In CVPR, 2020.
  • [46] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
  • [47] Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
  • [48] Donald B Rubin. Essential concepts of causal inference: A remarkable history and an intriguing future. Biostatistics & Epidemiology, 3(1):140–155, 2019.
  • [49] Ethan M Rudd, Manuel Günther, and Terrance E Boult. MOON: A mixed objective optimization network for the recognition of facial attributes. In ECCV, 2016.
  • [50] Fatemehsadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Stephen Gould, and Jose M Alvarez. Built-in foreground/background prior for weakly-supervised semantic segmentation. In ECCV, 2016.
  • [51] Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. On causal and anticausal learning. In ICML, 2012.
  • [52] Wataru Shimoda and Keiji Yanai. Self-supervised difference detection for weakly-supervised semantic segmentation. In ICCV, 2019.
  • [53] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In CVPR, 2016.
  • [54] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • [55] Guolei Sun, Wenguan Wang, Jifeng Dai, and Luc Van Gool. Mining cross-image semantics for weakly supervised semantic segmentation. In ECCV, 2020.
  • [56] Raphael Suter, Djordje Miladinovic, Bernhard Schölkopf, and Stefan Bauer. Robustly disentangled causal mechanisms: Validating deep representations for interventional robustness. In ICML, 2019.
  • [57] Kaihua Tang, Jianqiang Huang, and Hanwang Zhang. Long-tailed classification by keeping the good and removing the bad momentum causal effect. In NeurIPS, 2020.
  • [58] Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, and Hanwang Zhang. Unbiased scene graph generation from biased training. In CVPR, 2020.
  • [59] Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, and Wei Liu. Learning to compose dynamic tree structures for visual contexts. In CVPR, 2019.
  • [60] Michael Treml, José Arjona-Medina, Thomas Unterthiner, Rupesh Durgesh, Felix Friedmann, Peter Schuberth, Andreas Mayr, Martin Heusel, Markus Hofmarcher, Michael Widrich, et al. Speeding up semantic segmentation for autonomous driving. In NeurIPS, 2016.
  • [61] Paul Vernaza and Manmohan Chandraker. Learning random-walk label propagation for weakly-supervised semantic segmentation. In CVPR, 2017.
  • [62] Tan Wang, Jianqiang Huang, Hanwang Zhang, and Qianru Sun. Visual Commonsense R-CNN. In CVPR, 2020.
  • [63] Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, and Xilin Chen. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In CVPR, 2020.
  • [64] Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, and Shuicheng Yan. Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In CVPR, 2017.
  • [65] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Thomas S Huang. Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation. In CVPR, 2018.
  • [66] CF Jeff Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95–103, 1983.
  • [67] Zifeng Wu, Chunhua Shen, and Anton van den Hengel. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognition, 90(1):119–133, 2019.
  • [68] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
  • [69] Xu Yang, Hanwang Zhang, and Jianfei Cai. Deconfounded image captioning: A causal retrospect. arXiv, 2020.
  • [70] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
  • [71] Zhongqi Yue, Hanwang Zhang, Qianru Sun, and Xiansheng Hua. Interventional few-shot learning. In NeurIPS, 2020.
  • [72] Rowan Zellers, Mark Yatskar, Sam Thomson, and Yejin Choi. Neural Motifs: Scene graph parsing with global context. In CVPR, 2018.
  • [73] Bingfeng Zhang, Jimin Xiao, Yunchao Wei, Mingjie Sun, and Kaizhu Huang. Reliability does matter: An end-to-end weakly supervised semantic segmentation approach. In AAAI, 2020.
  • [74] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.