Unsupervised Domain Adaptation With Distribution Matching Machines

AAAI, 2018.


Abstract:

Domain adaptation generalizes a learning model across a source domain and a target domain that follow different distributions. Most existing work follows a two-step procedure: first, it explores either feature matching or instance reweighting independently; second, it trains the transfer classifier separately. In this paper, we show that either ...

Introduction
  • Standard supervised learning machines suffer poor generalization with limited training data, while manually labeling sufficient training data for emerging application domains is prohibitively expensive.
  • When the cross-domain discrepancy is substantially large, there will always be some source instances that are irrelevant to the target domain even under domain-invariant features, which may introduce large bias into the transfer classifier (Long et al. 2014; Aljundi et al. 2015).
  • Another principled strategy is to estimate weights for the source instances such that the distribution discrepancy is minimized for empirical risk minimization learning (Huang et al. 2006; Bruzzone and Marconcini 2010; Chen, Weinberger, and Blitzer 2011; Yu and Szepesvári 2012; Chu, De la Torre, and Cohn 2013); a minimal sketch of this reweighting idea follows this list.
  • This results in a domain-unbiased but high-variance transfer classifier, which is not robust to large cross-domain discrepancy.
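To make the reweighting strategy concrete, here is a minimal sketch in the spirit of kernel mean matching (Huang et al. 2006): estimate source-instance weights that minimize the MMD between the reweighted source and the target in an RKHS. This is an illustrative projected-gradient solver, not the authors' implementation or the original QP formulation; the RBF bandwidth, box bound B, learning rate, and iteration count are all assumed settings.

```python
# Illustrative kernel-mean-matching-style reweighting (assumed settings,
# not the paper's code): minimize MMD^2 between the reweighted source
# and the target by projected gradient descent on the weights w.
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kmm_weights(Xs, Xt, B=10.0, lr=1e-2, iters=500):
    ns = len(Xs)
    Kss = rbf_kernel(Xs, Xs)          # source-source kernel matrix
    kst = rbf_kernel(Xs, Xt).mean(1)  # kernel mean embedding of the target
    w = np.ones(ns)
    for _ in range(iters):
        # gradient of (1/ns^2) w'Kss w - (2/ns) w'kst with respect to w
        grad = 2 * (Kss @ w) / ns**2 - 2 * kst / ns
        w = np.clip(w - lr * grad, 0.0, B)  # box constraint 0 <= w <= B
        w *= ns / w.sum()                   # keep the average weight at 1
    return w

# Toy usage: the weights would feed a weighted ERM learner,
# e.g. an SVM's sample_weight argument.
Xs, Xt = np.random.randn(100, 20), np.random.randn(80, 20) + 0.5
weights = kmm_weights(Xs, Xt)
```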
Highlights
  • Standard supervised learning machines suffer poor generalization with limited training data, while manually labeling sufficient training data for emerging application domains is prohibitively expensive.
  • We propose a new Distribution Matching Machine (DMM) based on the structural risk minimization principle (Vapnik 1998), which learns a transfer support vector machine by extracting invariant feature representations and estimating unbiased instance weights that jointly minimize the cross-domain distribution discrepancy (this quantity is sketched after this list).
  • Feature-based adaptation methods operating in the dimension-reduced kernel PCA space, such as DMMA, perform much better than the instance-based adaptation method KMM.
  • Feature matching alone is not sufficient for domain adaptation when the domain difference is substantially large, as there may be source instances that are irrelevant to the target instances even under invariant features.
  • Transfer Joint Matching (TJM) and the Distribution Matching Machine (DMM) address this limitation by reweighting the source instances according to their relevance to the target instances in the new invariant feature space.
  • We proposed a new Distribution Matching Machine (DMM) for domain adaptation
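For intuition on the quantity DMM drives down, the sketch below evaluates a weighted MMD that couples the two ingredients named above: a feature transform A (feature matching) and instance weights w (instance reweighting). DMM learns A and w jointly under structural risk minimization; this snippet only evaluates the discrepancy for given A and w, and the RBF kernel, bandwidth, and toy data are assumptions.

```python
# Weighted MMD^2 between A-projected, w-reweighted source features and
# A-projected target features: the cross-domain discrepancy a DMM-style
# method minimizes. Evaluation only; the joint optimization of A and w
# is the paper's contribution and is not reproduced here.
import numpy as np

def rbf(X, Y, bw=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def weighted_mmd2(Xs, Xt, A, w):
    Zs, Zt = Xs @ A, Xt @ A  # project both domains to a shared subspace
    ns, nt = len(Zs), len(Zt)
    return (w @ rbf(Zs, Zs) @ w / ns**2
            - 2 * w @ rbf(Zs, Zt).sum(1) / (ns * nt)
            + rbf(Zt, Zt).sum() / nt**2)

# Toy usage with a random orthonormal projection and uniform weights.
Xs, Xt = np.random.randn(100, 20), np.random.randn(80, 20) + 0.5
A = np.linalg.qr(np.random.randn(20, 10))[0]
print(weighted_mmd2(Xs, Xt, A, np.ones(len(Xs))))
```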
Methods
  • The authors adopt the same evaluation protocol for all comparison methods (Long et al. 2014; Aljundi et al. 2015).
  • The authors select optimal hyper-parameters by cross-validation on the labeled source data, as in (Pan et al. 2011); a toy version of this protocol is sketched after this list.
  • The authors give a parameter sensitivity analysis for DMM, which validates that DMM achieves stable performance over a wide range of hyper-parameter settings.
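A toy version of the model-selection protocol above: because target labels are unavailable in unsupervised domain adaptation, hyper-parameters are tuned by cross-validation on the labeled source data only, as in (Pan et al. 2011). The SVM classifier, the parameter grid, and the synthetic data below are illustrative stand-ins, not the paper's exact setup.

```python
# Hyper-parameter selection by cross-validation on labeled source data
# only (no target labels are touched). Classifier and grid are stand-ins.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

Xs = np.random.randn(200, 20)        # toy labeled source features
ys = (Xs[:, 0] > 0).astype(int)      # toy source labels

grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(Xs, ys)                   # 5-fold CV on the source domain
print(search.best_params_)           # chosen hyper-parameters
```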
Results
  • Image classification: the classification accuracies of all comparison methods on the 12 transfer tasks of Office-10 + Caltech-10 using DeCAF6 and DeCAF7 features are shown in Tables 2 and 3, respectively.
  • Although DMM does not perform the best on all tasks, it is desirable that (1) when DMM performs the best, it usually outperforms the best baseline by a large margin, and (2) otherwise, it performs only slightly worse than the best baseline; the short check after this list quantifies this pattern on Table 2.
  • This verifies that DMM is more robust to both feature shift and instance bias for domain adaptation.
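The win/loss pattern claimed above can be checked by simple arithmetic on the Table 2 numbers (DeCAF6): for each task, take DMM's margin over the best of the non-DMM baselines (SVM, KMM, TCA, TJM, LSSA).

```python
# Margin of DMM over the best non-DMM baseline per task, using the
# DeCAF6 accuracies from Table 2.
rows = {  # task: ([SVM, KMM, TCA, TJM, LSSA], DMM)
    "C→A": ([91.6, 91.5, 90.3, 91.4, 91.6], 92.4),
    "C→W": ([80.7, 81.0, 83.7, 84.7, 85.2], 87.5),
    "C→D": ([86.0, 85.4, 88.5, 91.7, 90.4], 90.4),
    "A→C": ([82.2, 82.2, 82.0, 82.1, 82.5], 84.8),
    "A→W": ([71.9, 72.2, 75.6, 76.3, 79.4], 84.7),
    "A→D": ([80.9, 82.2, 85.4, 81.5, 86.6], 92.4),
    "W→C": ([67.9, 67.3, 68.5, 70.2, 73.2], 81.7),
    "W→A": ([73.4, 74.4, 74.5, 79.5, 81.4], 86.5),
    "W→D": ([100.0, 100.0, 99.4, 99.4, 99.4], 98.7),
    "D→C": ([72.8, 72.0, 76.9, 77.1, 80.4], 83.3),
    "D→A": ([78.7, 79.6, 82.7, 83.9, 83.4], 90.7),
    "D→W": ([98.3, 98.3, 98.0, 98.3, 98.0], 99.3),
}
for task, (baselines, dmm) in rows.items():
    print(f"{task}: {dmm - max(baselines):+.1f}")  # >0 means DMM wins
```

On these numbers DMM's margin is positive on 10 of 12 tasks, reaching +8.5 on W→C, and is only -1.3 on the two tasks it loses (C→D and W→D), consistent with the claim.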
Conclusion
  • Feature visualization: the authors visualize in Figures 2(a)–2(b), 2(e)–2(f), and 2(c)–2(d), 2(g)–2(h) the t-SNE embeddings (Donahue et al. 2014) of images on transfer tasks A → W and Q → O with features of KMM, TCA, and DMM, respectively; a minimal t-SNE sketch follows this list.
  • Since TCA does not learn unbiased weights for the source instances, source instances that are dissimilar to the target instances are not down-weighted, leading to large domain bias.
  • These observations explain the inferior performance of KMM and TCA and highlight the superiority of DMM.
  • Extensive experiments show that DMM significantly outperforms state-of-the-art adaptation methods.
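The visualization above amounts to embedding source and target features with t-SNE and coloring points by domain: well-matched features show the two domains interleaved, while biased features leave them separated. A minimal sketch with synthetic stand-in features; the sample sizes and perplexity are assumptions.

```python
# t-SNE embedding of source vs. target features, colored by domain.
# Synthetic features stand in for the adapted KMM / TCA / DMM features.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

Zs = np.random.randn(200, 64)        # stand-in adapted source features
Zt = np.random.randn(200, 64) + 0.3  # stand-in adapted target features
emb = TSNE(n_components=2, perplexity=30).fit_transform(np.vstack([Zs, Zt]))

plt.scatter(emb[:200, 0], emb[:200, 1], s=8, label="source")
plt.scatter(emb[200:, 0], emb[200:, 1], s=8, label="target")
plt.legend()
plt.title("t-SNE of adapted features")
plt.show()
```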
Tables
  • Table 1: Notations and their descriptions used in this paper.
  • Table 2: Accuracy (%) on 12 transfer tasks of Office10-Caltech10 with DeCAF6 features.

    Dataset  SVM    KMM    TCA    TJM    LSSA   DMMA   DMM2   DMM
    C→A      91.6   91.5   90.3   91.4   91.6   92.2   92.7   92.4
    C→W      80.7   81.0   83.7   84.7   85.2   83.7   85.4   87.5
    C→D      86.0   85.4   88.5   91.7   90.4   88.5   91.1   90.4
    A→C      82.2   82.2   82.0   82.1   82.5   85.1   84.8   84.8
    A→W      71.9   72.2   75.6   76.3   79.4   79.7   84.1   84.7
    A→D      80.9   82.2   85.4   81.5   86.6   89.8   89.8   92.4
    W→C      67.9   67.3   68.5   70.2   73.2   78.4   78.4   81.7
    W→A      73.4   74.4   74.5   79.5   81.4   85.7   89.7   86.5
    W→D      100.0  100.0  99.4   99.4   99.4   98.7   99.4   98.7
    D→C      72.8   72.0   76.9   77.1   80.4   80.9   82.3   83.3
    D→A      78.7   79.6   82.7   83.9   83.4   88.3   90.3   90.7
    D→W      98.3   98.3   98.0   98.3   98.0   98.0   98.0   99.3
  • Table 3: Accuracy (%) on 12 transfer tasks of Office10-Caltech10 with DeCAF7 features.

    Dataset  SVM    KMM    TCA    TJM    LSSA   DMMA   DMM2   DMM
    C→A      92.0   91.0   90.9   91.5   91.9   92.7   91.9   92.6
    C→W      84.4   81.0   85.8   88.8   88.8   88.8   88.8   90.5
    C→D      86.6   83.4   87.3   91.1   90.4   90.4   91.1   91.7
    A→C      82.4   82.5   80.3   80.8   82.4   84.3   84.0   83.3
    A→W      84.1   84.4   82.4   83.4   86.4   91.5   86.4   92.2
    A→D      86.6   86.6   81.5   88.5   86.4   91.1   91.7   93.0
    W→C      73.0   73.2   78.6   78.7   81.6   85.5   85.1   85.8
    W→A      79.4   81.4   85.6   87.2   88.4   90.8   90.7   92.5
    W→D      99.4   99.4   98.7   100.0  99.4   89.8   98.7   100.0
    D→C      76.0   78.3   80.5   80.4   82.5   85.0   84.1   84.3
    D→A      83.1   86.7   87.2   86.8   86.7   91.4   92.2   93.2
    D→W      96.9   98.3   97.6   97.3   97.3   91.9   99.0   99.7
  • Table 4: Classification accuracy (%) on 6 transfer tasks of the Reuters-21578 dataset.
Funding
  • This work was supported by the National Key Research and Development Program of China (2016YFB1000701), National Natural Science Foundation of China (61502265, 61325008, 61772299, 61672313) and TNList Fund
References
  • Aljundi, R.; Emonet, R.; Muselet, D.; and Sebban, M. 2015. Landmarks-based kernelized subspace alignment for unsupervised domain adaptation. In CVPR.
  • Argyriou, A., and Evgeniou, T. 2006. Multi-task feature learning. In NIPS.
  • Ben-David, S.; Blitzer, J.; Crammer, K.; and Pereira, F. 2007. Analysis of representations for domain adaptation. In NIPS.
  • Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; and Vaughan, J. W. 2010. A theory of learning from different domains. MLJ 79(1–2):151–175.
  • Bruzzone, L., and Marconcini, M. 2010. Domain adaptation problems: A DASVM classification technique and a circular validation strategy. TPAMI 32(5):770–787.
  • Chen, M.; Weinberger, K. Q.; and Blitzer, J. C. 2011. Co-training for domain adaptation. In NIPS.
  • Chu, W.-S.; De la Torre, F.; and Cohn, J. F. 2013. Selective transfer machine for personalized facial action unit detection. In CVPR.
  • Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; and Darrell, T. 2014. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML.
  • Duan, L.; Tsang, I. W.; and Xu, D. 2012. Domain transfer multiple kernel learning. TPAMI 34(3):465–479.
  • Duan, L.; Xu, D.; and Tsang, I. W. 2012. Domain adaptation from multiple sources: A domain-dependent regularization approach. TNNLS 23(3):504–518.
  • Fernando, B.; Habrard, A.; Sebban, M.; and Tuytelaars, T. 2013. Unsupervised visual domain adaptation using subspace alignment. In ICCV.
  • Ganin, Y., and Lempitsky, V. 2015. Unsupervised domain adaptation by backpropagation. In ICML.
  • Glorot, X.; Bordes, A.; and Bengio, Y. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML.
  • Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; and Smola, A. 2012. A kernel two-sample test. JMLR 13:723–773.
  • Griffin, G.; Holub, A.; and Perona, P. 2007. Caltech-256 object category dataset. Technical report, California Institute of Technology.
  • Huang, J.; Smola, A. J.; Gretton, A.; Borgwardt, K. M.; and Schölkopf, B. 2006. Correcting sample selection bias by unlabeled data. In NIPS.
  • Jhuo, I.-H.; Liu, D.; Lee, D.-T.; and Chang, S.-F. 2012. Robust visual domain adaptation with low-rank reconstruction. In CVPR.
  • Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
  • Long, M.; Wang, J.; Ding, G.; Sun, J.; and Yu, P. S. 2013. Transfer feature learning with joint distribution adaptation. In ICCV.
  • Long, M.; Wang, J.; Ding, G.; Sun, J.; and Yu, P. S. 2014. Transfer joint matching for unsupervised domain adaptation. In CVPR.
  • Long, M.; Cao, Y.; Wang, J.; and Jordan, M. I. 2015a. Learning transferable features with deep adaptation networks. In ICML.
  • Long, M.; Wang, J.; Sun, J.; and Yu, P. S. 2015b. Domain invariant transfer kernel learning. TKDE 27(6).
  • Long, M.; Zhu, H.; Wang, J.; and Jordan, M. I. 2016. Unsupervised domain adaptation with residual transfer networks. In NIPS, 136–144.
  • Long, M.; Zhu, H.; Wang, J.; and Jordan, M. I. 2017. Deep transfer learning with joint adaptation networks. In ICML.
  • Mansour, Y.; Mohri, M.; and Rostamizadeh, A. 2009. Domain adaptation: Learning bounds and algorithms. In COLT.
  • Masaeli, M.; Fung, G.; and Dy, J. G. 2010. From transformation-based dimensionality reduction to feature selection. In ICML.
  • Pan, S. J., and Yang, Q. 2010. A survey on transfer learning. TKDE 22:1345–1359.
  • Pan, S. J.; Tsang, I. W.; Kwok, J. T.; and Yang, Q. 2011. Domain adaptation via transfer component analysis. TNNLS 22(2):199–210.
  • Pan, S. J.; Kwok, J. T.; and Yang, Q. 2008. Transfer learning via dimensionality reduction. In AAAI.
  • Qiu, Q.; Patel, V. M.; Turaga, P.; and Chellappa, R. 2012. Domain adaptive dictionary learning. In ECCV.
  • Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; and Lawrence, N. D. 2009. Dataset Shift in Machine Learning. The MIT Press.
  • Saenko, K.; Kulis, B.; Fritz, M.; and Darrell, T. 2010. Adapting visual category models to new domains. In ECCV.
  • Sun, B.; Feng, J.; and Saenko, K. 2016. Return of frustratingly easy domain adaptation. In AAAI.
  • Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; and Darrell, T. 2015. Simultaneous deep transfer across domains and tasks. In ICCV.
  • Tzeng, E.; Hoffman, J.; Saenko, K.; and Darrell, T. 2017. Adversarial discriminative domain adaptation. In CVPR.
  • Vapnik, V. 1998. Statistical Learning Theory. John Wiley.
  • Yu, Y., and Szepesvári, C. 2012. Analysis of kernel mean matching under covariate shift. In ICML.
  • Zhang, K.; Schölkopf, B.; Muandet, K.; and Wang, Z. 2013. Domain adaptation under target and conditional shift. In ICML.