Unsupervised Domain Adaptation for Semantic Segmentation by Content Transfer

Suhyeon Lee
Hongje Seong
Abstract:

In this paper, we tackle unsupervised domain adaptation (UDA) for semantic segmentation, which aims to segment unlabeled real data using labeled synthetic data. The main problem of UDA for semantic segmentation is reducing the domain gap between the real image and the synthetic image. To solve this problem, we focused on separating the content and style of the image.

Introduction
  • Semantic segmentation is a task of classifying each pixel of an image into semantic categories.
  • To reduce the labeling cost, a model learned with automatically labeled synthetic datasets (Richter et al 2016; Ros et al 2016) can be used in real-world environments (Cordts et al 2016).
  • In this case, the domain gap between synthetic data and real data causes the model to behave poorly in real-world environments, so unsupervised domain adaptation (UDA) methods have been proposed to solve this problem (Chang et al 2019; Du et al 2019; Lin et al 2017; Luo et al 2019).
Highlights
  • Semantic segmentation is a task of classifying each pixel of an image into semantic categories
  • Unsupervised domain adaptation (UDA) for semantic segmentation aims to train a segmentation model that performs well on the real domain by using unlabeled real images and labeled synthetic images
  • We propose content transfer to solve the lack of data for target tail classes, which is the essential cause of the class imbalance problem
  • We use the mean of the class-wise Intersection-over-Union (mIoU) as the evaluation metric, measured on the validation set of Cityscapes (a minimal computation sketch follows this list)
  • All UDA methods outperform “NonAdapt,” which learns the segmentation model using only the source dataset, suggesting that domain adaptation is effective for the semantic segmentation problem
  • Experimental results show that the proposed method achieves state-of-the-art performance in semantic segmentation on the two major UDA settings
  • We propose the zero-style loss, which reduces the domain gap between the source data and target data by completely separating the content and style of the image
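
As a concrete reference for the evaluation protocol, here is a minimal sketch of class-wise IoU and mIoU computation from predicted and ground-truth label maps. The confusion-matrix formulation is standard; the function name and the ignore-index convention (255, as in Cityscapes) are the only assumptions.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU and mIoU from integer label maps of shape (H, W).

    Pixels labeled ignore_index in gt are excluded, as in Cityscapes.
    """
    mask = gt != ignore_index
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix: rows index ground truth, columns index predictions.
    cm = np.bincount(num_classes * gt + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)                    # true positives per class
    union = cm.sum(0) + cm.sum(1) - tp  # TP + FP + FN per class
    iou = tp / np.maximum(union, 1)     # per-class Intersection-over-Union
    return iou, iou.mean()
```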
Results
  • The authors compare the method with existing state-of-the-art UDA methods.
  • All methods are trained using the labeled GTA5 or SYNTHIA as the source and the unlabeled Cityscapes training set as the target, and they are based on DeepLab v2 with the backbone ResNet-101.
  • All UDA methods outperform “NonAdapt,” which learns the segmentation model using only the source dataset, suggesting that domain adaptation is effective for the semantic segmentation problem.
  • The authors' method keeps a balance between head and tail classes and achieves state-of-the-art performance.
Conclusion
  • The authors proposed the zero-style loss, which reduces the domain gap between the source data and target data by completely separating the content and style of the image (an illustrative sketch follows this list).
  • The zero-style loss aligns the content features of the two domains, which leads to improved semantic segmentation performance in the target domain.
  • The authors' proposed content transfer enhances performance by solving the class imbalance problem in unsupervised domain adaptation (a cut-and-paste-style sketch also follows).
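
The exact form of the zero-style loss is defined in the paper and is not reproduced here. As a hedged illustration only, assuming style is summarized by channel-wise feature statistics (as in AdaIN-style decompositions), a penalty that drives the style of the content feature to a fixed "zero style" could look like the following sketch.

```python
import torch

def zero_style_loss(content_feat):
    """Push the style statistics of a content feature toward a fixed
    'zero style' (zero channel-wise mean, unit channel-wise std).

    Assumption, not the paper's exact formula: style is summarized by
    channel-wise mean/std, as in AdaIN-style decompositions.

    content_feat: (N, C, H, W) feature map from the content encoder.
    """
    flat = content_feat.flatten(2)  # (N, C, H*W)
    mu = flat.mean(dim=2)           # channel-wise mean
    sigma = flat.std(dim=2)         # channel-wise standard deviation
    return (mu ** 2).mean() + ((sigma - 1.0) ** 2).mean()
```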
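Content transfer in the paper operates on disentangled content representations; the sketch below only conveys the rebalancing intuition with a pixel-level cut-and-paste (in the spirit of Dwibedi, Misra, and Hebert 2017). All names and the ignore-label convention are hypothetical.

```python
import torch

def transfer_tail_content(tgt_img, src_img, src_label, tail_ids, ignore_index=255):
    """Paste tail-class regions from a labeled source image into a
    target image to rebalance rare classes.

    This is a pixel-level cut-and-paste sketch of the rebalancing idea;
    the paper itself transfers content in a disentangled feature space.

    tgt_img, src_img: (C, H, W) images; src_label: (H, W) class map.
    tail_ids: iterable of tail-class indices to transfer.
    """
    mask = torch.zeros_like(src_label, dtype=torch.bool)
    for c in tail_ids:
        mask |= src_label == c  # pixels belonging to a tail class
    out_img = torch.where(mask.unsqueeze(0), src_img, tgt_img)
    # Only pasted pixels receive labels; the rest stay ignored.
    out_label = torch.where(mask, src_label,
                            torch.full_like(src_label, ignore_index))
    return out_img, out_label
```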
Summary
  • Objectives:

    The authors' goal is to improve performance on the target domain, so the model is trained with target pseudo labels Yt assigned by the maximum probability threshold (MPT) (Li, Yuan, and Vasconcelos 2019) (a minimal sketch follows).
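
A minimal sketch of MPT-style pseudo-labeling, assuming a single confidence threshold on the per-pixel maximum softmax probability; the threshold value is illustrative, not the setting from Li, Yuan, and Vasconcelos (2019).

```python
import torch

def pseudo_label_mpt(logits, threshold=0.9, ignore_index=255):
    """Assign target pseudo labels by thresholding the maximum
    softmax probability per pixel; low-confidence pixels are ignored.

    logits: (N, num_classes, H, W) segmentation outputs on target images.
    The threshold value here is illustrative.
    """
    probs = torch.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)          # per-pixel confidence and class
    labels[conf < threshold] = ignore_index  # drop unreliable pixels
    return labels
```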
  • Methods:

    The method is compared against the following baselines: NonAdapt, AdaptSegNet (Tsai et al 2018), CLAN (Luo et al 2019), MaxSquare (Chen, Xue, and Cai 2019), SSF-DAN (Du et al 2019), DISE (Chang et al 2019), AdvEnt+MinEnt (Vu et al 2019), APODA (Yang et al 2020), Patch Alignment (Tsai et al 2019), and WeakSegDA (Paul et al 2020).
  • Per-class IoU is reported for road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation, sky, person, rider, car, bus, motorcycle, and bicycle; the ablation tables additionally report overall mIoU and tail-class mIoU (mIoUtail).
Tables
  • Table1: Results of adapting GTA5 to Cityscapes. All methods use DeepLab v2 with the backbone ResNet-101 as the segmentation network. We measure the mIoU performance of the 19 classes in the evaluation set of Cityscapes
  • Table2: Results of adapting SYNTHIA to Cityscapes. All methods use DeepLab v2 with the backbone ResNet-101 as the segmentation network. We measure the mIoU of the 16 classes in the evaluation set of Cityscapes and the mIoU of the 13 classes (mIoU*), excluding classes with *
  • Table3: Ablation studies of adapting GTA5 to Cityscapes
  • Table4: Ablation studies of adapting SYNTHIA to Cityscapes
Related work
  • Semantic Segmentation

    Semantic segmentation is a computer vision task that predicts a category for each pixel of an image. Recently, convolutional neural network-based methods have been developed. Long, Shelhamer, and Darrell (2015) proposed the Fully Convolutional Network (FCN) for spatially dense prediction. DeepLab (Chen et al 2017) proposed Atrous Spatial Pyramid Pooling (ASPP) to segment reliably at various scales (a minimal sketch follows). PSPNet (Zhao et al 2017) proposed the pyramid pooling module (PPM), which utilizes global context information through different-region-based context aggregation. SegNet (Badrinarayanan, Kendall, and Cipolla 2017) and U-Net (Ronneberger, Fischer, and Brox 2015) used an encoder-decoder architecture for mapping low-resolution features back to input resolution.
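
For concreteness, a minimal sketch of the ASPP idea: parallel atrous (dilated) convolutions at several rates whose outputs are fused, so context is aggregated at multiple scales. The rates and the fusion layer are illustrative, not DeepLab's exact configuration.

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Minimal ASPP: parallel dilated 3x3 convolutions at different
    rates, concatenated and fused by a 1x1 convolution. Rates and
    channel sizes are illustrative, not DeepLab's exact configuration."""

    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Each branch covers a different effective receptive field.
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```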
Funding
  • This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A2C1007153)
Study subjects and analysis
The segmentation model is based on DeepLab v2 with the backbone ResNet-101, and the output of the first convolutional layer is shared to capture the content and style features. For adversarial learning, the discriminators and three pairs of encoder and decoder are trained alternately. The discriminators are trained to determine whether the pixel-level semantic prediction comes from the source domain or the target domain, and whether the generated source- and target-style images are real or fake (an illustrative sketch of these objectives follows).
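
As an illustration of the two discriminator objectives described above, the sketch below assumes hypothetical PatchGAN-style networks output_disc (applied to softmax segmentation maps, as in output-space alignment) and image_disc (applied to images), both returning logits; this is not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def discriminator_losses(output_disc, image_disc,
                         pred_src, pred_tgt, real_img, fake_img):
    """Illustrative objectives for the two discriminators: one separates
    source vs. target segmentation predictions (output-space alignment),
    the other separates real vs. generated images. output_disc and
    image_disc are assumed networks returning logits.
    """
    # Output-space discriminator: source predictions labeled 1, target 0.
    d_src = output_disc(F.softmax(pred_src.detach(), dim=1))
    d_tgt = output_disc(F.softmax(pred_tgt.detach(), dim=1))
    loss_out = (F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src))
                + F.binary_cross_entropy_with_logits(d_tgt, torch.zeros_like(d_tgt)))

    # Image discriminator: real images labeled 1, generated images 0.
    d_real = image_disc(real_img)
    d_fake = image_disc(fake_img.detach())
    loss_img = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    return loss_out, loss_img
```

In the alternating scheme, the encoder-decoder pairs would then be updated with these adversarial labels flipped while the discriminators are frozen.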

Reference
  • Badrinarayanan, V.; Kendall, A.; and Cipolla, R. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12): 2481–2495.
  • Benaim, S.; Khaitov, M.; Galanti, T.; and Wolf, L. 2019. Domain Intersection and Domain Difference. In Proceedings of the IEEE International Conference on Computer Vision, 3445–3453.
  • Chang, W.-L.; Wang, H.-P.; Peng, W.-H.; and Chiu, W.-C. 2019. All about structure: Adapting structural information across domains for boosting semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1900–1909.
  • Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; and Yuille, A. L. 2017. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4): 834–848.
  • Chen, M.; Xue, H.; and Cai, D. 2019. Domain adaptation for semantic segmentation with maximum squares loss. In Proceedings of the IEEE International Conference on Computer Vision, 2090–2099.
  • Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; and Schiele, B. 2016. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3213–3223.
  • Cui, Y.; Jia, M.; Lin, T.-Y.; Song, Y.; and Belongie, S. 2019. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9268–9277.
  • Du, L.; Tan, J.; Yang, H.; Feng, J.; Xue, X.; Zheng, Q.; Ye, X.; and Zhang, X. 2019. SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 982–991.
  • Dwibedi, D.; Misra, I.; and Hebert, M. 2017. Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE International Conference on Computer Vision, 1301–1310.
  • French, G.; Mackiewicz, M.; and Fisher, M. 2017. Self-ensembling for visual domain adaptation. arXiv preprint arXiv:1706.05208.
  • Gatys, L. A.; Ecker, A. S.; and Bethge, M. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2414–2423.
  • Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.-Y.; Isola, P.; Saenko, K.; Efros, A. A.; and Darrell, T. 2017. CyCADA: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213.
  • Isola, P.; Zhu, J.-Y.; Zhou, T.; and Efros, A. A. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125–1134.
  • Li, Y.; Yuan, L.; and Vasconcelos, N. 2019. Bidirectional learning for domain adaptation of semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6936–6945.
  • Lin, G.; Milan, A.; Shen, C.; and Reid, I. 2017. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1925–1934.
  • Long, J.; Shelhamer, E.; and Darrell, T. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440.
  • Luo, Y.; Zheng, L.; Guan, T.; Yu, J.; and Yang, Y. 2019. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2507–2516.
  • Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in PyTorch.
  • Paul, S.; Tsai, Y.-H.; Schulter, S.; Roy-Chowdhury, A. K.; and Chandraker, M. 2020. Domain Adaptive Semantic Segmentation Using Weak Labels. In Proceedings of the European Conference on Computer Vision.
  • Richter, S. R.; Vineet, V.; Roth, S.; and Koltun, V. 2016. Playing for data: Ground truth from computer games. In Proceedings of the European Conference on Computer Vision, 102–118. Springer.
  • Ronneberger, O.; Fischer, P.; and Brox, T. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Springer.
  • Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; and Lopez, A. M. 2016. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3234–3243.
  • Tsai, Y.-H.; Hung, W.-C.; Schulter, S.; Sohn, K.; Yang, M.-H.; and Chandraker, M. 2018. Learning to adapt structured output space for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7472–7481.
  • Tsai, Y.-H.; Sohn, K.; Schulter, S.; and Chandraker, M. 2019. Domain adaptation for structured output via discriminative patch representations. In Proceedings of the IEEE International Conference on Computer Vision, 1456–1465.
  • Vu, T.-H.; Jain, H.; Bucher, M.; Cord, M.; and Perez, P. 2019. ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2517–2526.
  • Wu, Z.; Han, X.; Lin, Y.-L.; Gokhan Uzunbas, M.; Goldstein, T.; Nam Lim, S.; and Davis, L. S. 2018. DCAN: Dual channel-wise alignment networks for unsupervised scene adaptation. In Proceedings of the European Conference on Computer Vision, 518–534.
  • Yang, J.; Xu, R.; Li, R.; Qi, X.; Shen, X.; Li, G.; and Lin, L. 2020. An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, 12613–12620.
  • Zhang, Y.; Qiu, Z.; Yao, T.; Liu, D.; and Mei, T. 2018. Fully convolutional adaptation networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6810–6818.
  • Zhao, H.; Shi, J.; Qi, X.; Wang, X.; and Jia, J. 2017. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2881–2890.
  • Zhu, J.-Y.; Park, T.; Isola, P.; and Efros, A. A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, 2223–2232.