Exploring representation-learning approaches to domain adaptation

DANLP 2010: Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing (2010)

Cited by: 13 | Views: 21
Abstract
Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Sequence labeling systems like part-of-speech taggers are typically trained on newswire text, and in tests their error rate on, for example, biomedical data can triple, or worse. We investigate techniques for building open-domain sequence labeling systems that approach the ideal of a system whose accuracy is high and constant across domains. In particular, we investigate unsupervised techniques for representation learning that provide new features which are stable across domains, in that they are predictive in both the training and out-of-domain test data. In experiments, our novel techniques reduce error by as much as 29% relative to the previous state of the art on out-of-domain text.
Keywords
biomedical data,newswire text,out-of-domain test data,out-of-domain text,training data,error rate,open-domain sequence,new feature,novel technique,part-of-speech taggers,domain adaptation,representation-learning approach