Joint bilingual name tagging for parallel corpora.

CIKM '12: 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, October 2012

Abstract
Traditional isolated monolingual name taggers tend to yield inconsistent results across two languages. In this paper, we propose two novel approaches to jointly and consistently extract names from parallel corpora. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework, incorporating cross-lingual features propagated between the two languages. The second approach is based on a joint CRF model that decodes sentence pairs jointly, incorporating bilingual factors based on word alignment. Experiments on Chinese-English parallel corpora demonstrated that the proposed methods significantly outperformed monolingual name taggers, were robust to automatic alignment noise, and achieved state-of-the-art performance. With only 20% of the training data, our proposed methods already achieve better performance than the baseline trained on the whole training set.
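To make the idea of cross-lingual features propagated through word alignment concrete, here is a minimal sketch, not the paper's feature set. It assumes a BIO tag scheme, tags already predicted on the Chinese side by a monolingual tagger, and word alignments given as (Chinese index, English index) pairs. The function names, the entity types (PER, LOC), and the feature templates are all hypothetical. Each English token receives ordinary monolingual features plus weighted features naming the entity types of its aligned Chinese tokens. The result is a per-token feature dict of the kind linear-chain CRF toolkits such as python-crfsuite can consume.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical alignment format: (source_index, target_index) pairs.
Alignment = Sequence[Tuple[int, int]]


def propagate_tags(src_tags: Sequence[str], alignment: Alignment,
                   tgt_len: int) -> List[List[str]]:
    """For each target token, collect the tags of the source tokens aligned to it."""
    projected: List[List[str]] = [[] for _ in range(tgt_len)]
    for s, t in alignment:
        projected[t].append(src_tags[s])
    return projected


def entity_type(tag: str) -> str:
    """Strip the BIO prefix: 'B-PER' -> 'PER', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def token_features(tokens: Sequence[str], i: int,
                   projected: List[List[str]]) -> Dict[str, float]:
    """Monolingual context features plus cross-lingual (xl_*) features for token i."""
    tok = tokens[i]
    feats: Dict[str, float] = {
        f"w={tok.lower()}": 1.0,
        f"is_title={tok.istitle()}": 1.0,
        f"w-1={tokens[i - 1].lower() if i > 0 else '<s>'}": 1.0,
        f"w+1={tokens[i + 1].lower() if i + 1 < len(tokens) else '</s>'}": 1.0,
    }
    aligned = projected[i]
    if not aligned:
        # No alignment link: the model falls back on monolingual evidence.
        feats["xl=UNALIGNED"] = 1.0
    for tag in aligned:
        # Split weight evenly across all aligned source tokens.
        key = f"xl_type={entity_type(tag)}"
        feats[key] = feats.get(key, 0.0) + 1.0 / len(aligned)
    return feats


if __name__ == "__main__":
    zh_tags = ["B-PER", "O", "B-LOC"]  # e.g. tags for 奥巴马 访问 北京
    en_tokens = ["Obama", "visited", "Beijing"]
    proj = propagate_tags(zh_tags, [(0, 0), (1, 1), (2, 2)], len(en_tokens))
    print(token_features(en_tokens, 2, proj))
    # {'w=beijing': 1.0, 'is_title=True': 1.0, 'w-1=visited': 1.0, 'w+1=</s>': 1.0, 'xl_type=LOC': 1.0}
```

Splitting weight across aligned tokens is one simple way to soften the effect of noisy one-to-many alignments. It is a design choice made for this sketch only. The joint model described above instead couples the two languages' label sequences through bilingual factors at decoding time.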