Contrastive Adaptation Network for Unsupervised Domain Adaptation

CVPR 2019, pages 4893-4902 (arXiv:1901.00976).


Abstract:

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain. Previous methods minimize the domain discrepancy neglecting the class information, which may lead to misalignment and poor generalization performance. To address this issue, this paper proposes Contrastive Adaptation Network (CAN), which optimizes a new metric, the Contrastive Domain Discrepancy (CDD), explicitly modeling the intra-class and inter-class domain discrepancy, and trains the network end-to-end with an alternating optimization strategy. Experiments on real-world benchmarks (Office-31 and VisDA-2017) demonstrate that CAN performs favorably against strong baselines.

Introduction
  • Recent advancements in deep neural networks have successfully improved a variety of learning problems [40, 8, 26, 19, 20].
  • In the absence of labeled data from the target domain, Unsupervised Domain Adaptation (UDA) methods have emerged to mitigate the domain shift in data distributions [2, 1, 5, 37, 30, 18, 3, 17].
  • It relates to unsupervised learning as it requires manual labels only from the source domain and zero labels from the target domain.
  • Among the recent work on UDA, a seminal line of work proposed by Long et al. [22, 25] aims at minimizing the discrepancy between the source and target domains in a deep neural network, where the domain discrepancy is measured by Maximum Mean Discrepancy (MMD) [22] and Joint MMD (JMMD) [25]; a sketch of the MMD estimate follows this list.
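For reference, the (biased) empirical estimate of the squared MMD with a Gaussian kernel can be written in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the bandwidth `sigma` is an assumed hyper-parameter (practical implementations typically combine multiple kernels).

```python
# Illustrative sketch (not the authors' code): biased empirical estimate of
# the squared Maximum Mean Discrepancy (MMD) with a Gaussian kernel.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise kernel values k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2)).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)].
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```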
Highlights
  • Recent advancements in deep neural networks have successfully improved a variety of learning problems [40, 8, 26, 19, 20]
  • Among the recent work on Unsupervised Domain Adaptation (UDA), a seminal line of work proposed by Long et al. [22, 25] aims at minimizing the discrepancy between the source and target domains in a deep neural network, where the domain discrepancy is measured by Maximum Mean Discrepancy (MMD) [22] and Joint MMD (JMMD) [25]
  • Maximum Mean Discrepancy (MMD) and Joint MMD (JMMD) have proven effective in many computer vision problems and demonstrated state-of-the-art results on several UDA benchmarks [22, 25]
  • Samples of different classes may be aligned incorrectly, e.g. both MMD and JMMD can be minimized even when the target-domain samples are misaligned with the source-domain samples of a different class
  • We propose Contrastive Adaptation Network (CAN) to facilitate the optimization with Contrastive Domain Discrepancy (CDD); a simplified sketch of CDD follows this list
  • We proposed Contrastive Adaptation Network to perform class-aware alignment for UDA
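A simplified sketch of the CDD idea referenced above: class-conditional MMD terms, where the averaged intra-class discrepancy is minimized and the averaged inter-class discrepancy is subtracted (and hence enlarged). It reuses the `mmd2` helper sketched in the Introduction; in practice the target labels are pseudo-labels, and the paper's mini-batch estimator with class-aware sampling is more involved.

```python
# Simplified sketch of Contrastive Domain Discrepancy (CDD); assumes the
# mmd2() helper from the earlier sketch is in scope. Target labels would be
# pseudo-labels obtained by clustering in practice.
import numpy as np

def cdd(src_feats, src_labels, tgt_feats, tgt_labels, num_classes, sigma=1.0):
    intra, inter, n_intra, n_inter = 0.0, 0.0, 0, 0
    for c1 in range(num_classes):
        s = src_feats[src_labels == c1]
        for c2 in range(num_classes):
            t = tgt_feats[tgt_labels == c2]
            if len(s) == 0 or len(t) == 0:
                continue  # skip class pairs with no samples
            d = mmd2(s, t, sigma)
            if c1 == c2:
                intra, n_intra = intra + d, n_intra + 1
            else:
                inter, n_inter = inter + d, n_inter + 1
    # Minimizing CDD pulls same-class samples of the two domains together
    # and pushes different-class samples apart.
    return intra / max(n_intra, 1) - inter / max(n_inter, 1)
```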
Methods
  • Proposed Method. [Fig. 1 legend: source, target, approaching, splitting.] The approach builds on Maximum Mean Discrepancy (MMD) [22] and Joint MMD (JMMD) [25].
  • Despite the success of previous methods based on MMD and JMMD, most of them measure the domain discrepancy at the domain level, neglecting the classes from which the samples are drawn.
  • These class-agnostic approaches do not discriminate whether samples from the two domains should be aligned according to their class labels (Fig. 1); a toy example after this list makes the failure mode concrete.
  • Such solutions may fit the source data well but are less discriminative for the target domain
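A toy numeric example (assumed data, not from the paper): when the two target classes sit exactly on the opposite source classes, the marginal distributions of the two domains coincide, so the class-agnostic `mmd2` from the earlier sketch stays near zero even though every target sample is aligned to the wrong class.

```python
# Toy demonstration of class misalignment under class-agnostic MMD;
# assumes the mmd2() helper from the earlier sketch is in scope.
import numpy as np

rng = np.random.default_rng(0)
src_c0 = rng.normal(-2.0, 0.5, size=(200, 1))  # source class 0
src_c1 = rng.normal(+2.0, 0.5, size=(200, 1))  # source class 1
tgt_c0 = rng.normal(+2.0, 0.5, size=(200, 1))  # target class 0, shifted onto source class 1
tgt_c1 = rng.normal(-2.0, 0.5, size=(200, 1))  # target class 1, shifted onto source class 0

source = np.concatenate([src_c0, src_c1])
target = np.concatenate([tgt_c0, tgt_c1])
# Near zero: the domains look aligned at the domain level despite the class swap.
print(mmd2(source, target))
```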
Conclusion
  • The authors proposed Contrastive Adaptation Network to perform class-aware alignment for UDA.
  • The intra-class and inter-class domain discrepancies are explicitly modeled and optimized through end-to-end mini-batch training with class-aware sampling (sketched after this list).
  • Experiments on real-world benchmarks demonstrate the superiority of the model over strong baselines
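The class-aware sampling (CAS) used for this mini-batch training (its effect is ablated in Table 3 below) can be sketched as follows. This is a minimal illustration under assumed batch parameters, not the paper's exact sampler: each mini-batch draws the same subset of classes from the labeled source data and the pseudo-labeled target data, so every class-conditional term of the CDD estimate is well defined.

```python
# Minimal sketch of class-aware sampling (CAS); classes_per_batch and
# per_class are assumed illustrative defaults, not the paper's settings.
import numpy as np

def class_aware_batch(src_labels, tgt_pseudo, classes_per_batch=3, per_class=8, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # Only classes present in both domains (under current pseudo-labels) are usable.
    shared = np.intersect1d(np.unique(src_labels), np.unique(tgt_pseudo))
    picked = rng.choice(shared, size=min(classes_per_batch, len(shared)), replace=False)
    src_idx, tgt_idx = [], []
    for c in picked:
        # Sample the same classes from both domains.
        src_idx += rng.choice(np.where(src_labels == c)[0], per_class, replace=True).tolist()
        tgt_idx += rng.choice(np.where(tgt_pseudo == c)[0], per_class, replace=True).tolist()
    return np.array(src_idx), np.array(tgt_idx)
```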
Objectives
  • The authors' goal is to evaluate the effectiveness of the proposed technique based on a vanilla backbone (ResNet-101).
Tables
  • Table 1: Classification accuracy (%) for all six tasks of the Office-31 dataset based on ResNet-50 [14, 15]. Our methods named “intra only” and “CAN” are trained with the intra-class domain discrepancy and the contrastive domain discrepancy, respectively
  • Table 2: Classification accuracy (%) on the VisDA-2017 validation set based on ResNet-101 [14, 15]. Our methods named “intra only” and “CAN” are trained with the intra-class domain discrepancy and the contrastive domain discrepancy, respectively
  • Table 3: The effect of alternative optimization (AO) and class-aware sampling (CAS). The mean accuracy over the six tasks of Office-31 and the mean accuracy over the 12 classes of the VisDA-2017 validation set are reported
  • Table 4: Comparison of different ways of utilizing pseudo target labels. “pseudo0” means training directly with pseudo target labels obtained by our initial clustering. “pseudo1” alternately updates target labels through clustering and minimizes the cross-entropy loss on pseudo-labeled target data; in “pseudo1”, the cross-entropy loss on source data is also minimized (a sketch of the clustering step follows this list)
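The clustering step referenced in Tables 3 and 4 can be illustrated with a minimal k-means sketch in which the clusters are initialized from the source class centers, so that cluster indices correspond to class labels. This is a simplified NumPy version under assumed inputs; the paper's procedure additionally filters ambiguous target samples and classes, which is omitted here.

```python
# Minimal sketch of clustering-based pseudo-labeling: k-means on target
# features initialized from the source class centers (simplified; the
# paper's filtering of ambiguous samples/classes is omitted).
import numpy as np

def pseudo_label(tgt_feats, src_centers, num_iters=10):
    centers = src_centers.copy()
    for _ in range(num_iters):
        # Assign each target feature to the nearest class center.
        dists = ((tgt_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Re-estimate each center from its assigned target samples.
        for c in range(len(centers)):
            if (labels == c).any():
                centers[c] = tgt_feats[labels == c].mean(axis=0)
    return labels, centers
```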
Related work
  • Class-agnostic domain alignment. A common practice for UDA is to minimize the discrepancy between domains to obtain domain-invariant features [10, 4, 25, 22, 24, 36, 21]. For example, Tzeng et al. [38] proposed a domain confusion loss to encourage the network to learn both semantically meaningful and domain-invariant representations. Long et al. proposed DAN [22] and JAN [25] to minimize the MMD and Joint MMD distances across domains, respectively, over the domain-specific layers. Ganin et al. [10] enabled the network to learn domain-invariant representations in an adversarial way by back-propagating the reversed gradients of a domain classifier. Unlike these domain-discrepancy minimization methods, our method performs class-aware domain alignment.
  • Discriminative domain-invariant feature learning. Some previous works strive to learn more discriminative features while performing domain alignment [35, 13, 31, 32, 28, 39]. Adversarial Dropout Regularization (ADR) [31] and Maximum Classifier Discrepancy (MCD) [32] train a deep neural network in an adversarial way to avoid generating non-discriminative features lying near the decision boundary. Similar to us, Long et al. [23] and Pei et al. [28] take class information into account while measuring the domain discrepancy. However, our method differs from theirs in two main aspects. First, we explicitly model two types of domain discrepancy, i.e. the intra-class and the inter-class domain discrepancy; the inter-class discrepancy, ignored by most previous methods, proves beneficial for adaptation performance. Second, in the context of deep neural networks, we treat training as an alternative optimization over the target label hypothesis and the features.
  • Intra-class compactness and inter-class separability modeling. This paper is also related to work that explicitly models intra-class compactness and inter-class separability, e.g. the contrastive loss [12] and the triplet loss [33]; a minimal form of the contrastive loss is sketched after this list. These methods have been used in various applications, e.g. face recognition [6] and person re-identification [16]. Different from these methods, which are designed for a single domain, our work focuses on adaptation across domains.
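For context, the contrastive loss of Hadsell et al. [12] for a pair of embeddings can be written in a few lines (a minimal NumPy sketch; `margin` is an assumed hyper-parameter): a same-class pair (y = 1) is pulled together, while a different-class pair (y = 0) is pushed apart up to the margin.

```python
# Minimal sketch of the contrastive loss [12] for a single pair of embeddings.
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    d = np.linalg.norm(f1 - f2)  # Euclidean distance between the two embeddings
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
```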
Funding
  • This work was supported in part by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00340
References
  • S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.
  • S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, pages 137–144, 2007.
  • K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, 2017.
  • K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neural Information Processing Systems, pages 343–351, 2016.
  • L. Bruzzone and M. Marconcini. Domain adaptation problems: A DASVM classification technique and a circular validation strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):770–787, 2010.
  • D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In CVPR, pages 1335–1344, 2016.
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • X. Dong and Y. Yang. Searching for a robust neural architecture in four GPU hours. In CVPR, 2019.
  • G. French, M. Mackiewicz, and M. Fisher. Self-ensembling for domain adaptation. In ICLR, 2018.
  • Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, 2015.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, volume 2, pages 1735–1742, 2006.
  • P. Haeusser, T. Frerix, A. Mordvintsev, and D. Cremers. Associative domain adaptation. In ICCV, 2017.
  • K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  • K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, pages 630–645, 2016.
  • A. Hermans, L. Beyer, and B. Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
  • J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017.
  • J. Hoffman, D. Wang, F. Yu, and T. Darrell. FCNs in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649, 2016.
  • L. Jiang, D. Meng, T. Mitamura, and A. G. Hauptmann. Easy samples first: Self-paced reranking for zero-example multimedia search. In ACM Multimedia, pages 547–556, 2014.
  • G. Kang, J. Li, and D. Tao. Shakeout: A new approach to regularized deep neural network training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(5):1245–1258, 2018.
  • G. Kang, L. Zheng, Y. Yan, and Y. Yang. Deep adversarial attention alignment for unsupervised domain adaptation: the benefit of target expectation maximization. In ECCV, pages 401–416, 2018.
  • M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791, 2015.
  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu. Transfer feature learning with joint distribution adaptation. In ICCV, pages 2200–2207, 2013.
  • M. Long, H. Zhu, J. Wang, and M. I. Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pages 136–144, 2016.
  • M. Long, H. Zhu, J. Wang, and M. I. Jordan. Deep transfer learning with joint adaptation networks. In ICML, 2017.
  • Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In CVPR, 2019.
  • L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
  • Z. Pei, Z. Cao, M. Long, and J. Wang. Multi-adversarial domain adaptation. In AAAI, 2018.
  • X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko. VisDA: The visual domain adaptation challenge. arXiv preprint arXiv:1710.06924, 2017.
  • K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In ECCV, pages 213–226, 2010.
  • K. Saito, Y. Ushiku, T. Harada, and K. Saenko. Adversarial dropout regularization. In ICLR, 2018.
  • K. Saito, K. Watanabe, Y. Ushiku, and T. Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, 2018.
  • F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, pages 815–823, 2015.
  • D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, pages 2263–2291, 2013.
  • O. Sener, H. O. Song, A. Saxena, and S. Savarese. Learning transferrable representations for unsupervised domain adaptation. In Advances in Neural Information Processing Systems, pages 2110–2118, 2016.
  • B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In ECCV, pages 443–450, 2016.
  • E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
  • E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
  • J. Wang, W. Feng, Y. Chen, H. Yu, M. Huang, and P. S. Yu. Visual domain adaptation with manifold embedded distribution alignment. In ACM Multimedia, pages 402–410, 2018.
  • L. Zhu, Z. Xu, and Y. Yang. Bidirectional multirate reconstruction for temporal modeling in videos. In CVPR, pages 2653–2662, 2017.