Mining Parenthetical Translations from the Web by Word Alignment

ACL(2008)

Abstract
Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their English translations inside a pair of parentheses. We present a method to extract such translations from a large collection of web documents by building a partially parallel corpus and using a word alignment algorithm to identify the terms being translated. The method is able to generalize across the translations for different terms and can reliably extract translations that occurred only once in the entire web. Our experiment on Chinese web pages produced more than 26 million pairs of translations, which is over two orders of magnitude more than previous results. We show that adding the extracted translation pairs as training data significantly increases the BLEU score of a statistical machine translation system.
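The core idea can be illustrated with a small sketch. The English phrase inside the parentheses translates some suffix of the Chinese text just before it, but where that term begins is unknown. One way to handle this is to pair the English with a fixed window of preceding characters (a partially parallel corpus) and let a word aligner decide which characters are actually linked to English words. The abstract does not name its alignment algorithm, so this sketch substitutes plain IBM Model 1 EM as a stand-in. The regex `PAREN`, the window size of 6, the `<null>` token, and the `pick_term` boundary rule are all hypothetical choices made for illustration, not the paper's method.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a run of CJK ideographs immediately followed by an
# English phrase in half-width "()" or full-width "（）" parentheses.
PAREN = re.compile(r"([\u4e00-\u9fff]+)\s*[(（]([A-Za-z][A-Za-z \-]*)[)）]")
NULL = "<null>"  # assumed NULL source word that absorbs untranslated context


def extract_candidates(text, window=6):
    """Pair each parenthesized English phrase with up to `window` preceding
    Chinese characters; the left boundary of the term is not known yet."""
    pairs = []
    for m in PAREN.finditer(text):
        left = list(m.group(1)[-window:])
        english = m.group(2).lower().split()
        pairs.append((left, english))
    return pairs


def train_model1(pairs, iterations=10):
    """IBM Model 1 EM (a stand-in aligner): t[e][c] estimates P(c | e) for
    Chinese character c and English word e (including NULL)."""
    vocab = {c for chars, _ in pairs for c in chars}
    uniform = 1.0 / max(len(vocab), 1)
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for chars, english in pairs:
            sources = [NULL] + english
            for c in chars:
                z = sum(t[e][c] for e in sources)
                for e in sources:
                    p = t[e][c] / z  # E-step: posterior that e generated c
                    count[e][c] += p
                    total[e] += p
        # M-step: renormalise the expected counts for each English word.
        t = defaultdict(lambda: defaultdict(float))
        for e, row in count.items():
            for c, n in row.items():
                t[e][c] = n / total[e]
    return {e: dict(row) for e, row in t.items()}


def pick_term(chars, english, t):
    """Hypothetical boundary rule: walk left from the parenthesis and keep
    characters whose most probable link is an English word rather than NULL."""
    start = len(chars)
    for i in range(len(chars) - 1, -1, -1):
        scores = {e: t.get(e, {}).get(chars[i], 0.0) for e in [NULL] + english}
        if max(scores, key=scores.get) == NULL:  # ties go to NULL
            break
        start = i
    return "".join(chars[start:])


if __name__ == "__main__":
    docs = [
        "我们使用支持向量机(support vector machine)进行分类。",
        "该系统基于向量空间(vector space)模型。",
    ]
    pairs = [p for d in docs for p in extract_candidates(d)]
    t = train_model1(pairs)
    for chars, english in pairs:
        print(" ".join(english), "->", pick_term(chars, english, t))
```

Because the translation table `t` is shared across all candidate pairs, evidence that links 向量 to "vector" in one term can help delimit another term. This is the kind of cross-term generalization the abstract describes. The toy version makes no claim about the accuracy or scale reported in the paper.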
Keywords
web pages