Effective Use of Linguistic and Contextual Information for Statistical Machine Translation.
EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1 (2009)
Abstract
Current methods of using lexical features in machine translation have difficulty scaling up to realistic MT tasks because they involve a prohibitively large number of parameters. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem, and we apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, lowercased BLEU improves by 2.0 points on NIST MT06 and by 1.7 points on MT08 newswire data, measured on decoding output. On Chinese-to-English translation, the improvements are 1.0 points on MT06 and 0.8 points on MT08 newswire data.
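The abstract names the features but does not say how they are estimated. To show why a feature such as the non-terminal length distribution avoids the parameter blow-up of lexical features, here is a minimal sketch. It rests on our own assumption, not the paper's stated method, that the feature is a smoothed relative-frequency estimate of P(source span length | non-terminal label). The class `NonTerminalLengthModel`, the function `length_feature`, and the parameters `max_len` and `alpha` are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


class NonTerminalLengthModel:
    """Hypothetical sketch: add-alpha smoothed P(span length | non-terminal label)."""

    def __init__(self, max_len: int = 10, alpha: float = 1.0) -> None:
        self.max_len = max_len  # assumption: lengths >= max_len share one bucket
        self.alpha = alpha      # assumption: add-alpha smoothing constant
        self.counts: dict = defaultdict(Counter)

    def _bucket(self, length: int) -> int:
        # Map a span length to one of the buckets 1..max_len.
        if length < 1:
            raise ValueError("span length must be >= 1")
        return min(length, self.max_len)

    def observe(self, label: str, length: int) -> None:
        """Record one training occurrence of `label` covering `length` source words."""
        self.counts[label][self._bucket(length)] += 1

    def log_prob(self, label: str, length: int) -> float:
        """Smoothed log P(length | label); unseen labels fall back to a uniform distribution."""
        c = self.counts.get(label, Counter())
        total = sum(c.values())
        p = (c[self._bucket(length)] + self.alpha) / (total + self.alpha * self.max_len)
        return math.log(p)


def length_feature(model: NonTerminalLengthModel, spans: list) -> float:
    """One dense feature value per derivation: summed log-probs of its (label, length) spans."""
    return sum(model.log_prob(label, length) for label, length in spans)


if __name__ == "__main__":
    m = NonTerminalLengthModel(max_len=10, alpha=1.0)
    for label, length in [("X", 2), ("X", 2), ("X", 3)]:
        m.observe(label, length)
    print(m.log_prob("X", 2))  # log(3/13)
    print(length_feature(m, [("X", 2), ("X", 3)]))
```

In this sketch, the decoder sees a single score no matter how many labels or lengths the model has seen. The feature therefore adds one weight to the log-linear model during tuning, whereas a lexicalized feature adds one weight per word or word pair. The smoothing scheme and the length cap are illustrative choices only.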
Keywords
MT08 newswire data, Arabic-to-English translation, Chinese-to-English translation, machine translation, NIST MT06, non-terminal label, non-terminal length distribution, realistic MT task, source dependency LM score, source string context, contextual information, effective use, statistical machine translation