A system to mine large-scale bilingual dictionaries from monolingual web pages.

MTSummit(2007)

Abstract
This paper describes a system that automatically mines English-Chinese translation pairs from a large volume of monolingual Chinese web pages. Our approach is motivated by the observation that many Chinese terms (e.g., named entities that are not stored in a conventional dictionary) are accompanied by their English translations in Chinese web pages. In our approach, candidate translations are extracted using pre-defined templates. Transliterations and translation pairs are then identified using statistical learning methods. We compare several approaches to aligning transliterations and mining translations on more than 300 GB of Chinese web pages. In our experiments on an MSN query log, we show that the mined bilingual dictionary greatly enlarges the coverage of an existing English-Chinese dictionary. It also improves query translation in cross-language information retrieval, leading to significantly higher retrieval effectiveness on TREC collections.
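The template-based extraction step can be illustrated with a minimal sketch. The abstract does not list the actual templates, so the parenthetical pattern below (a Chinese term immediately followed by an English phrase in parentheses) is an assumption chosen for illustration, and the names `PAREN_TEMPLATE` and `extract_candidates` are hypothetical.

```python
import re

# Hypothetical template (an assumption, not taken from the paper):
# 2 to 8 CJK ideographs, then an English phrase enclosed in ASCII or
# full-width parentheses.
PAREN_TEMPLATE = re.compile(
    r"([\u4e00-\u9fff]{2,8})"                 # Chinese side of the candidate
    r"\s*[(（]\s*"                             # opening parenthesis
    r"([A-Za-z][A-Za-z .'\-]*[A-Za-z])"       # English side of the candidate
    r"\s*[)）]"                                # closing parenthesis
)


def extract_candidates(text):
    """Return (chinese, english) candidate pairs matched by the template."""
    return [(m.group(1), m.group(2)) for m in PAREN_TEMPLATE.finditer(text)]


if __name__ == "__main__":
    sample = "昨天，微软(Microsoft)发布了新产品。"
    for chinese, english in extract_candidates(sample):
        print(chinese, "->", english)
```

A pattern like this only proposes candidates. When the Chinese term is preceded by other Chinese characters, the match can extend too far to the left, and deciding where the term really begins, and whether the pair is a translation or transliteration at all, is left to a later step such as the statistical learning methods the abstract mentions.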