Context Contrasted Feature and Gated Multi-Scale Aggregation for Scene Segmentation

CVPR, pp. 2393-2402, 2018.

Abstract:

Scene segmentation is a challenging task as it needs to label every pixel in the image. It is crucial to exploit discriminative context and aggregate multi-scale features to achieve better segmentation. In this paper, we first propose a novel context contrasted local feature that not only leverages the informative context but also spotlights …

Introduction
  • Scene segmentation has been an essential component of image understanding and is in intense demand for applications such as automation devices, virtual reality and self-driving vehicles.
  • The goal of scene segmentation is to parse a scene image into a set of coherent semantic regions and to label each pixel with one of the classes, covering both objects and stuff.
  • It implicitly involves image classification, object localization and boundary delineation.
  • The authors mainly consider two obstacles in applying DCNNs to dense prediction tasks: the diverse forms of objects/stuff and the existence of multi-scale objects.
Highlights
  • Scene segmentation has been an essential component of image understanding and is in intense demand for applications such as automation devices, virtual reality and self-driving vehicles.
  • We propose a novel context contrasted local feature which is tailored for scene segmentation, and a context contrasted local (CCL) model to obtain multi-scale and multi-level context contrasted local features.
  • We propose to generate local information and context separately and fuse them by contrasting the two: CL = F_l(F, Θ_l) − F_c(F, Θ_c), where F is the input feature, F_l is the function of the local convolution, F_c is the function of the context convolution, Θ_l and Θ_c are their respective parameters, and CL is the desired context contrasted local feature (see the sketch after this list).
  • Deep Convolutional Neural Networks (DCNNs) designed for image classification tend to extract abstract features of dominant objects, so some essentially discriminative information for inconspicuous objects and stuff is weakened or even disregarded.
  • We propose a novel context contrasted local feature to leverage the useful context and spotlight the local information in contrast to the context
  • We achieve new state-of-the-art performance consistently on the three public scene segmentation benchmarks: Pascal Context, SUN-RGBD and COCO Stuff.
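To make the contrast operation concrete, below is a minimal PyTorch sketch of one CCL block, under the assumption that F_l is an ordinary 3×3 convolution and F_c is a dilated 3×3 convolution with a larger receptive field; the module and parameter choices are ours for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class ContextContrastedLocal(nn.Module):
    """One CCL block: CL = F_l(F, Θ_l) - F_c(F, Θ_c).

    The local branch is a dense 3x3 convolution; the context branch is a
    dilated 3x3 convolution covering a larger receptive field. Subtracting
    the context from the local feature spotlights locally discriminative
    responses, e.g. for inconspicuous objects and stuff.
    """
    def __init__(self, channels: int, dilation: int = 4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.context = nn.Conv2d(channels, channels, 3,
                                 padding=dilation, dilation=dilation)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.local(f) - self.context(f)

# stacking several blocks with increasing dilation would yield the
# multi-scale, multi-level context contrasted local features described above
f = torch.randn(1, 512, 60, 60)
print(ContextContrastedLocal(512, dilation=4)(f).shape)  # (1, 512, 60, 60)
```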
Methods
  • Methods compared on Pascal Context: O2P [6], CFM [11], FCN-8s [50], CRF-RNN [64], ParseNet [35], BoxSup [10], ConvPP-8 [60], HO-CRF [1], PixelNet [3], Context-CRF [30], DAG-RNN + CRF [51], FCRN [58], DeepLab-v2 + CRF† [8], Hu et al. [23], Global-Context [25], RefineNet-Res101 [29], RefineNet-Res152 [29] and PSPNet-Res101 [63].
  • Methods compared on SUN-RGBD: Liu et al. [32], Ren et al. [47], FCN-8s [38], DeconvNet [42], Kendall et al. [27], SegNet [2], DeepLab [8], Context-CRF [30], RefineNet-Res101 [29] and RefineNet-Res152 [29].
  • SUN-RGBD is composed of images from SUN3D [59], NYUDv2 [52], Berkeley B3DO [26] and newly captured images.
  • Quantitative results of SUN-RGBD are reported in Table 5.
  • COCO Stuff [5] contains 10000 images from the Microsoft COCO dataset [31], of which 9000 are for training and 1000 for testing.
  • The pixels left unlabeled in the original Microsoft COCO images are further annotated with 91 additional stuff classes in COCO Stuff.
  • The dataset thus contains 171 categories, covering both objects and stuff, annotated at each pixel.
Results
  • Pascal Context [41] contains 10103 images from Pascal VOC 2010, re-annotated with pixel-wise segmentation maps.
  • The authors' segmentation network performs better on global information, salient objects, stuff and inconspicuous objects, and adapts robustly to multi-scale objects.
  • Quantitative results of Pascal Context are shown in Table 4.
  • It shows that the segmentation network outperforms the state of the art by a large margin on all three evaluation metrics, as illustrated below.
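For reference, the three metrics conventionally reported for scene parsing are pixel accuracy, mean (per-class) accuracy and mean IoU; the following numpy sketch (our illustration, not the authors' evaluation code) computes them from a K×K confusion matrix.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """Pixel accuracy, mean class accuracy and mean IoU from a KxK
    confusion matrix (rows: ground truth, columns: prediction)."""
    tp = np.diag(conf).astype(float)       # correctly labeled pixels per class
    gt = conf.sum(axis=1).astype(float)    # ground-truth pixels per class
    pred = conf.sum(axis=0).astype(float)  # predicted pixels per class
    with np.errstate(divide="ignore", invalid="ignore"):
        pixel_acc = tp.sum() / conf.sum()
        mean_acc = np.nanmean(tp / gt)                # classes absent in GT ignored
        mean_iou = np.nanmean(tp / (gt + pred - tp))  # union = gt + pred - tp
    return pixel_acc, mean_acc, mean_iou
```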
Conclusion
  • DCNNs designed for image classification tend to extract abstract features of dominant objects, so some essentially discriminative information for inconspicuous objects and stuff is weakened or even disregarded.
  • To address this issue, the authors propose a novel context contrasted local feature to leverage the useful context and spotlight the local information in contrast to the context.
  • The authors' segmentation network achieves state-of-the-art results consistently on the three popular scene segmentation datasets used in the evaluation: Pascal Context, SUN-RGBD and COCO Stuff.
Tables
  • Table 1: Segmentation networks are adapted to an encoder-decoder architecture with rich skip layers; the stride rates (dilation factors) of the four branches in ASPP are revised to {1, 3, 4, 6} respectively. For fair comparison, gated sum is not adopted, and the networks differ only in their context aggregation (CA).
  • Table 2: Ablation experiments of CCL on Pascal Context. LA is a local aggregation generated by removing the context part of CCL. LAd doubles the hidden dimensionality of LA from 512 to 1024, so its parameter count matches that of CCL. All other settings are the same.
  • Table 3: Ablation experiments of gated sum on Pascal Context.
  • Table 4: Pascal Context testing accuracies. Our network outperforms all existing methods by a large margin across all evaluation metrics. Methods trained with extra data are marked with †.
  • Table 5: SUN-RGBD (37 classes) segmentation results. We do not use the depth information for training. Our segmentation network outperforms existing methods consistently across all three evaluation metrics.
  • Table 6: Parsing performance of different networks on the COCO Stuff dataset. Our segmentation network outperforms the state of the art by a large margin across all evaluation metrics.
Related work
  • 2.1. Contextual Modeling

    One direction is to apply new layers to enhance high-level context aggregation. For example, Chen et al. [8] introduced atrous spatial pyramid pooling (ASPP) to capture useful context information at multiple scales (see the sketch below). Visin et al. [56], Shuai et al. [51] and Byeon et al. [4] adopted recurrent neural networks to capture long-range context. Zhao et al. [63] employed multiple pooling operations to exploit global information from different regions. Liu et al. [36] proposed to model the mean-field algorithm with local convolution layers and incorporate it into a deep parsing network (DPN). Yu et al. [61] attached multiple dilated convolution layers after the class likelihood maps to perform multi-scale context aggregation. Another way is to use Conditional Random Fields (CRFs) [28] to model the context of score maps [7, 8, 64, 30, 36]. For example, Chen et al. [8] adopted a CRF to post-process the unary predictions, and Zheng et al. [64] proposed CRF-RNN to jointly train the CRF with their segmentation network.
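Below is a minimal PyTorch sketch of the ASPP idea [8], using the revised branch dilation rates {1, 3, 4, 6} mentioned in Table 1; the module structure and fusion by summation are our illustrative assumptions, not a reproduction of either paper's code.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated 3x3 convolutions
    sample context at several scales; their outputs are fused by summation."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 3, 4, 6)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.stack([b(x) for b in self.branches], dim=0).sum(dim=0)
```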
Funding
  • This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at Nanyang Technological University, Singapore, and was supported by Singapore Ministry of Education Academic Research Fund Tier 1 RG 123/15.
  • The ROSE Lab is supported by the National Research Foundation, Singapore, under its Interactive Digital Media (IDM) Strategic Research Programme
Study subjects and analysis
  • The segmentation framework is evaluated on three public scene segmentation datasets: Pascal Context, SUN-RGBD and COCO Stuff.
  • The gate values are generated from the testing image by the network learnt from the training data, so they are adaptive not only to the training data but also to the specific testing image; without bells and whistles, the approach achieves state-of-the-art results consistently on all three datasets.
  • The gate G_np adjusts its value adaptively to the testing input features, controlling the information flow through the skip layers, as sketched below.
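A rough PyTorch sketch of how such a gate could be realized; the per-position gate from a 1×1 convolution followed by a sigmoid is our own illustrative formulation, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class GatedSum(nn.Module):
    """Selectively aggregate a deep feature and a skip feature of the
    same shape: a per-position gate, predicted from the input itself,
    controls how much skip information flows into the sum."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),  # one gate value per position
            nn.Sigmoid(),                           # squash into (0, 1)
        )

    def forward(self, deep: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        g = self.gate(deep)        # (N, 1, H, W), broadcast over channels
        return deep + g * skip     # the gate throttles the skip connection

# because g is computed from the testing input, the aggregation adapts
# to each image rather than using weights fixed after training
```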

Reference
  • [1] A. Arnab, S. Jayasumana, S. Zheng, and P. H. Torr. Higher order conditional random fields in deep neural networks. In European Conference on Computer Vision. Springer, 2016.
  • [2] V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
  • [3] A. Bansal, X. Chen, B. Russell, A. Gupta, and D. Ramanan. PixelNet: Towards a general pixel-level architecture. arXiv preprint arXiv:1609.06694, 2016.
  • [4] W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki. Scene labeling with LSTM recurrent neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
  • [5] H. Caesar, J. Uijlings, and V. Ferrari. COCO-Stuff: Thing and stuff classes in context. arXiv preprint arXiv:1612.03716, 2016.
  • [6] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In Computer Vision – ECCV 2012, 2012.
  • [7] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.
  • [8] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv preprint arXiv:1606.00915, 2016.
  • [9] L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [10] J. Dai, K. He, and J. Sun. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • [11] J. Dai, K. He, and J. Sun. Convolutional feature masking for joint object and stuff segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [12] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929, 2013.
  • [13] R. Garland-Thomson. Staring: How We Look. Oxford University Press, 2009.
  • [14] G. Ghiasi and C. C. Fowlkes. Laplacian pyramid reconstruction and refinement for semantic segmentation. In European Conference on Computer Vision. Springer, 2016.
  • [15] R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • [16] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
  • [18] A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645–6649. IEEE, 2013.
  • [19] J. Gu, G. Wang, J. Cai, and T. Chen. An empirical study of language CNN for image captioning. In Proceedings of the International Conference on Computer Vision (ICCV), 2017.
  • [20] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al. Recent advances in convolutional neural networks. Pattern Recognition, 2017.
  • [21] B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [23] H. Hu, Z. Deng, G.-T. Zhou, F. Sha, and G. Mori. LabelBank: Revisiting global perspectives for semantic segmentation. arXiv preprint arXiv:1703.09891, 2017.
  • [24] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [25] W.-C. Hung, Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, and M.-H. Yang. Scene parsing with global context embedding. arXiv preprint arXiv:1710.06507, 2017.
  • [26] A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. A category-level 3D object dataset: Putting the Kinect to work. In Consumer Depth Cameras for Computer Vision. Springer, 2013.
  • [27] A. Kendall, V. Badrinarayanan, and R. Cipolla. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680, 2015.
  • [28] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In Advances in Neural Information Processing Systems, pages 109–117, 2011.
  • [29] G. Lin, A. Milan, C. Shen, and I. Reid. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [30] G. Lin, C. Shen, A. van den Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [31] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision. Springer, 2014.
  • [32] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):978–994, 2011.
  • [33] J. Liu, A. Shahroudy, G. Wang, L.-Y. Duan, and A. C. Kot. SSNet: Scale selection network for online 3D action prediction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [34] S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • [35] W. Liu, A. Rabinovich, and A. C. Berg. ParseNet: Looking wider to see better. arXiv preprint arXiv:1506.04579, 2015.
  • [36] Z. Liu, X. Li, P. Luo, C.-C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • [37] Z. Liu, G. Lin, S. Yang, J. Feng, W. Lin, and W. L. Goh. Learning Markov clustering networks for scene text detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [38] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [39] Z. Lu, X. Jiang, and A. C. Kot. Deep coupled ResNet for low-resolution face recognition. IEEE Signal Processing Letters, 2018.
  • [40] M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich. Feedforward semantic segmentation with zoom-out features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [41] R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille. The role of context for object detection and semantic segmentation in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  • [42] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • [43] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun. Large kernel matters – improve semantic segmentation by global convolutional network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [44] P. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In International Conference on Machine Learning, 2014.
  • [45] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [46] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
  • [47] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
  • [48] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015.
  • [49] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 2015.
  • [50] E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
  • [51] B. Shuai, Z. Zuo, B. Wang, and G. Wang. Scene segmentation with DAG-recurrent neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
  • [52] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision. Springer, 2012.
  • [53] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [54] S. Song, S. P. Lichtenberg, and J. Xiao. SUN RGB-D: A RGB-D scene understanding benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [55] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • [56] F. Visin, M. Ciccone, A. Romero, K. Kastner, K. Cho, Y. Bengio, M. Matteucci, and A. Courville. ReSeg: A recurrent neural network-based model for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2016.
  • [57] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.
  • [58] Z. Wu, C. Shen, and A. van den Hengel. Bridging category-level and instance-level semantic image segmentation. arXiv preprint arXiv:1605.06885, 2016.
  • [59] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In Proceedings of the IEEE International Conference on Computer Vision, 2013.
  • [60] S. Xie, X. Huang, and Z. Tu. Convolutional pseudo-prior for structured labeling. arXiv preprint arXiv:1511.07409, 2015.
  • [61] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
  • [62] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 2014.
  • [63] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [64] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, 2015.