A Literature Survey on Domain Adaptation of Statistical Classifiers

Cited by 253

Abstract

Jing Jiang (jiang4@cs.uiuc.edu), last modified in March 2008. […] et al., 2008). However, some special kinds of domain adaptation problems have been studied before under different names, including class imbalance (Japkowicz and Stephen, 2002), covariate shift (…

Introduction
  • The authors review some existing work in both the machine learning and the natural language processing communities related to domain adaptation.
  • One general approach to addressing the domain adaptation problem is to assign instance-dependent weights, ideally the ratio between the target and source densities, to the loss function when minimizing the expected loss over the distribution of data (a derivation follows this list).
  • It has been proposed to transform this density ratio estimation into a problem of predicting whether an instance is from the source domain or from the target domain (Zadrozny, 2004; Bickel and Scheffer, 2007); a code sketch follows this list.
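
To make the weighting idea above concrete, the standard derivation (not specific to any single surveyed paper) rewrites the expected target-domain loss as a weighted expectation over the source distribution:

```latex
% Importance weighting: the expected loss under the target
% distribution equals a weighted expected loss under the source.
\mathbb{E}_{(x,y)\sim P_t}\big[\ell(f(x),y)\big]
  = \mathbb{E}_{(x,y)\sim P_s}\!\left[\frac{P_t(x,y)}{P_s(x,y)}\,\ell(f(x),y)\right]
% Under covariate shift, P_t(y \mid x) = P_s(y \mid x), so the
% instance weight reduces to the marginal density ratio:
\frac{P_t(x,y)}{P_s(x,y)}
  = \frac{P_t(x)\,P_t(y \mid x)}{P_s(x)\,P_s(y \mid x)}
  = \frac{P_t(x)}{P_s(x)}
```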
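
A minimal sketch of the "domain classifier" trick from the last bullet follows. The choice of logistic regression and all variable names are illustrative assumptions, not the exact setup of Zadrozny (2004) or Bickel and Scheffer (2007):

```python
# Estimate P_t(x)/P_s(x) by training a classifier to distinguish
# source instances (domain label 0) from target instances (label 1).
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_instance_weights(X_src, X_tgt):
    """Weight each source instance by an estimate of P_t(x) / P_s(x)."""
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    domain_clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_tgt = domain_clf.predict_proba(X_src)[:, 1]  # P(d=1 | x)
    # Bayes' rule: P_t(x)/P_s(x) equals the posterior odds
    # P(d=1|x)/P(d=0|x) rescaled by the class-prior ratio n_s/n_t.
    return (p_tgt / (1.0 - p_tgt)) * (len(X_src) / len(X_tgt))
```

The estimated weights can then be passed to any learner that accepts per-instance weights, e.g. LogisticRegression().fit(X_src, y_src, sample_weight=weights).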
Highlights
  • We review some existing work in both the machine learning and the natural language processing communities related to domain adaptation
  • A systematic literature survey naturally reveals the limitations of current work and points out promising directions that should be explored in the future
  • It is proposed to transform this density ratio estimation into a problem of predicting whether an instance is from the source domain or from the target domain (Zadrozny, 2004; Bickel and Scheffer, 2007)
  • Blitzer et al. (2006) proposed a structural correspondence learning (SCL) algorithm that makes use of the unlabeled data from the target domain to find a low-rank representation that is suitable for domain adaptation
  • Ensemble methods are a family of learning algorithms that combine a set of models to construct a complex classifier for a classification problem (a minimal illustration follows this list)
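
As a minimal illustration of the ensemble idea in the last highlight, the sketch below averages the predicted class probabilities of a source-trained and a target-trained model. It is a generic example rather than the algorithm of any specific paper cited here, and the mixing weight `alpha` is a hypothetical hyperparameter to be tuned on held-out target data:

```python
# A deliberately simple ensemble for domain adaptation: mix the
# probabilistic predictions of a source model and a target model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ensemble_predict(X_src, y_src, X_tgt, y_tgt, X_test, alpha=0.5):
    src_model = LogisticRegression(max_iter=1000).fit(X_src, y_src)
    tgt_model = LogisticRegression(max_iter=1000).fit(X_tgt, y_tgt)
    proba = (1 - alpha) * src_model.predict_proba(X_test) \
            + alpha * tgt_model.predict_proba(X_test)
    return proba.argmax(axis=1)
```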
Results
  • Xing et al. (2007) proposed a bridged refinement method for domain adaptation using label propagation on a nearest neighbor graph, which resembles graph-based semi-supervised learning algorithms (Zhu, 2005; Chapelle et al., 2006); a generic label propagation sketch follows this list.
  • If a transformation function g can be found such that, under this transformation, P_t(Z, Y) = P_s(Z, Y), then the domain adaptation problem disappears, because the two domains share the same joint distribution of the observation and the class label.
  • Blitzer et al. (2006) proposed a structural correspondence learning (SCL) algorithm that makes use of the unlabeled data from the target domain to find a low-rank representation that is suitable for domain adaptation.
  • SCL tries to find a representation that works well for many related classification tasks for which labels are available in both the source and the target domains (an SCL sketch follows this list).
  • In this section, the authors review two kinds of methods that work for supervised domain adaptation, i.e., when a small amount of labeled data from the target domain is available; one representative method, feature augmentation, is sketched after this list.
  • The original definition of multi-task learning considers a different setting than domain adaptation.
  • Domain adaptation can be treated as a special case of multi-task learning with two tasks, one on the source domain and one on the target domain, where the class label sets of the two tasks are the same.
  • If the authors have some labeled data from the target domain, the authors can directly apply some existing multi-task learning algorithm.
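
The following is generic graph-based label propagation in the spirit of the semi-supervised methods (Zhu, 2005) that bridged refinement resembles; it is not Xing et al.'s (2007) exact two-step refinement, and only illustrates the propagation primitive. Unlabeled target instances carry the label -1:

```python
# Propagate labels from labeled source instances to unlabeled target
# instances over a k-nearest-neighbor graph.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

def propagate_labels(X_src, y_src, X_tgt, n_neighbors=7):
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, -np.ones(len(X_tgt), dtype=int)])
    model = LabelSpreading(kernel="knn", n_neighbors=n_neighbors)
    model.fit(X, y)
    # Predicted labels for the target-domain portion of the graph.
    return model.transduction_[len(X_src):]
```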
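
A compact sketch of SCL (Blitzer et al., 2006) appears below. Pivot selection and the dimensionality `h` are simplified assumptions; the original work picks pivots that occur frequently in both domains and trains modified-Huber linear classifiers to predict each pivot's presence from the remaining features, using unlabeled data from both domains:

```python
# Learn a low-rank projection from auxiliary pivot-prediction tasks.
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X_unlabeled, pivot_idx, h=50):
    """X_unlabeled: unlabeled source + target instances.
    pivot_idx: indices of pivot features (must occur in the sample)."""
    nonpivot_idx = np.setdiff1d(np.arange(X_unlabeled.shape[1]), pivot_idx)
    W = []
    for p in pivot_idx:
        present = (X_unlabeled[:, p] > 0).astype(int)  # auxiliary label
        clf = SGDClassifier(loss="modified_huber", max_iter=20, tol=None)
        clf.fit(X_unlabeled[:, nonpivot_idx], present)
        W.append(clf.coef_.ravel())
    W = np.asarray(W).T                 # shape: (n_nonpivot, n_pivots)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :h].T                  # top-h left singular vectors
    return nonpivot_idx, theta
```

Each instance x is then represented as its original features concatenated with theta @ x[nonpivot_idx], and a standard classifier is trained on the augmented representation.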
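
For the supervised setting reviewed above, one widely cited concrete method from the reference list is the feature augmentation of Daumé III (2007): each source instance becomes (x, x, 0) and each target instance becomes (x, 0, x), so a linear learner can fit a shared component plus domain-specific corrections. The helper name below is our own:

```python
# "Frustratingly easy" feature augmentation (Daume III, 2007).
import numpy as np

def augment(X, domain):
    """domain: 'source' or 'target'; blocks are (general, source, target)."""
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    return np.hstack([X, zeros, X])

# Train any linear classifier on the union of augment(X_src, "source")
# with labels y_src and augment(X_tgt, "target") with labels y_tgt,
# then predict on augment(X_test, "target").
```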
Conclusion
  • Some domain adaptation methods proposed recently are essentially multi-task learning algorithms.
  • Jiang and Zhai (2007b) proposed a two-stage domain adaptation method: in the first, generalization stage, labeled instances from K different source training domains are used together to train K different models that share a common component, and this common component applies only to a subset of features considered generalizable across domains.
  • Labeled data from both the source and the target domains is needed to learn the three-component mixture model (that of Daumé III and Marcu, 2006) using the conditional expectation maximization (CEM) algorithm; the mixture is sketched schematically below.
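
If, as the reference list suggests, the three-component mixture refers to Daumé III and Marcu (2006), its structure can be written schematically as follows; the mixing weights and component labels are our shorthand, not the paper's exact notation:

```latex
% Source data: mixture of a truly-source component and a shared
% "general" component; target data mixes a truly-target component
% with the same general component. The three components P^{(s)},
% P^{(t)}, P^{(g)} are fit jointly with conditional EM.
P_s(x, y) = \pi_s\, P^{(s)}(x, y) + (1 - \pi_s)\, P^{(g)}(x, y)
P_t(x, y) = \pi_t\, P^{(t)}(x, y) + (1 - \pi_t)\, P^{(g)}(x, y)
```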
References
  • Rie Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, November 2005.
  • Shai Ben-David and Reba Schuller. Exploiting task relatedness for multiple task learning. In Proceedings of the 16th Annual Conference on Learning Theory, Washington D.C., USA, August 2003.
  • Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 137–144. MIT Press, Cambridge, Massachusetts, USA, 2007.
  • Steffen Bickel and Tobias Scheffer. Dirichlet-enhanced spam filtering based on biased samples. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 161–168. MIT Press, Cambridge, Massachusetts, USA, 2007.
  • Steffen Bickel, Michael Brückner, and Tobias Scheffer. Discriminative learning for differing training and test distributions. In Proceedings of the 24th Annual International Conference on Machine Learning, pages 81–88, Corvallis, Oregon, USA, June 2007.
  • John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128, Sydney, Australia, July 2006.
  • John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman. Learning bounds for domain adaptation. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, Massachusetts, USA, 2008.
  • Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, July 1997.
  • Yee Seng Chan and Hwee Tou Ng. Estimating class priors in domain adaptation for word sense disambiguation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 89–96, Sydney, Australia, July 2006.
  • Yee Seng Chan and Hwee Tou Ng. Word sense disambiguation with distribution estimation. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 1010–1015, Edinburgh, Scotland, July 2005.
  • Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien, editors. Semi-Supervised Learning. MIT Press, 2006.
  • Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, June 2002.
  • Ciprian Chelba and Alex Acero. Adaptation of maximum entropy capitalizer: Little data can help a lot. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 285–292, Barcelona, Spain, July 2004.
  • Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Transferring Naive Bayes classifiers for text classification. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pages 540–545, Vancouver, British Columbia, Canada, July 2007a.
  • Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for transfer learning. In Proceedings of the 24th Annual International Conference on Machine Learning, pages 193–200, Corvallis, Oregon, USA, June 2007b.
  • Hal Daumé III. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 256–263, Prague, Czech Republic, June 2007.
  • Hal Daumé III and Daniel Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, May 2006.
  • Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109–117, Seattle, Washington, USA, August 2004.
  • James J. Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153–161, January 1979.
  • Jiayuan Huang, Alexander J. Smola, Arthur Gretton, Karsten M. Borgwardt, and Bernhard Schölkopf. Correcting sample selection bias by unlabeled data. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 601–608. MIT Press, Cambridge, Massachusetts, USA, 2007.
  • Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429–450, November 2002.
  • Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 254–271, Prague, Czech Republic, June 2007a.
  • Jing Jiang and ChengXiang Zhai. A two-stage approach to domain adaptation for statistical classifiers. In Proceedings of the 16th ACM Conference on Information and Knowledge Management, pages 401–410, 2007b.
  • Miroslav Kubat and Stan Matwin. Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the 14th Annual International Conference on Machine Learning, pages 179–186, Nashville, Tennessee, USA, July 1997.
  • Xiao Li and Jeff Bilmes. A Bayesian divergence prior for classifier adaptation. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, San Juan, Puerto Rico, March 2007.
  • Yi Lin, Yoonkyung Lee, and Grace Wahba. Support vector machines for classification in nonstandard situations. Machine Learning, 46(1–3):191–202, January 2002.
  • Charles A. Micchelli and Massimiliano Pontil. Kernels for multi-task learning. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 921–928. MIT Press, Cambridge, Massachusetts, USA, 2005.
  • Kamal Nigam, Andrew K. McCallum, Sebastian Thrun, and Tom Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2–3):103–134, May 2000.
  • Sandeepkumar Satpal and Sunita Sarawagi. Domain adaptation of conditional probability models via feature subsetting. In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 224–235, Warsaw, Poland, September 2007.
  • Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, October 2000.
  • Amos J. Storkey and Masashi Sugiyama. Mixture regression for covariate shift. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1337–1344. MIT Press, Cambridge, Massachusetts, USA, 2007.
  • Masashi Sugiyama and Klaus-Robert Müller. Input-dependent estimation of generalization error under covariate shift. Statistics & Decisions, 23(4):249–279, 2005.
  • Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, second edition, 1999.
  • Dikan Xing, Wenyuan Dai, Gui-Rong Xue, and Yong Yu. Bridged refinement for transfer learning. In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 324–335, Warsaw, Poland, September 2007.
  • Ya Xue, Xuejun Liao, Lawrence Carin, and Balaji Krishnapuram. Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research, 8:35–63, May 2007.
  • Bianca Zadrozny. Learning and evaluating classifiers under sample selection bias. In Proceedings of the 21st Annual International Conference on Machine Learning, pages 114–121, Banff, Canada, July 2004.
  • Jingbo Zhu and Eduard Hovy. Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 783–790, Prague, Czech Republic, June 2007.
  • Xiaojin Zhu. Semi-supervised learning literature survey. Technical Report 1530, University of Wisconsin-Madison, 2005.