The MSR-MSRA MT System for NIST Open Machine Translation 2008 Evaluation

msra(1996)

Abstract
The system combination approach, which combines system outputs at the word level, is similar to the one described in (Rosti et al., 2007). Compared to that previous work, we developed a new method to generate a better alignment between multiple MT hypotheses from different individual systems, which is used to construct a high-quality confusion network. The details of our method will be elaborated in a future paper (He et al., 2008). First, a minimum Bayes risk (MBR) based method is used to select a backbone from the multiple hypotheses. All the hypotheses are then aligned to that backbone to form a confusion network, i.e., a word lattice in which each word is aligned to a list of alternative words (including null). Next, a set of features, including language model scores, word count, and a normalized system voting score, is used to decode the confusion network. In training, a confusion network is constructed from the multiple hypotheses of each sentence in a dev set, and the corresponding feature weights are trained using Powell's search to maximize the BLEU score on that dev set. In testing, a confusion network is constructed for each sentence in the test set, and these feature weights are applied to decode the final MT output from the confusion network. This entry uses two language models: a 3-gram LM trained on the English part of the parallel training data, and a 5-gram LM trained on the whole English Gigaword corpus using a scalable LM toolkit (Nguyen et al., 2007).
Footnote 1: Mei Yang was an intern with MSR in the summer of 2007.
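The pipeline described in the abstract (backbone selection, alignment to the backbone, decoding the confusion network) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes uniform hypothesis posteriors, uses word-level edit distance as the MBR loss, aligns with plain Levenshtein backtracing rather than the improved alignment method the abstract refers to, drops hypothesis words that have no backbone slot, and decodes by majority voting only, leaving out the language model and word count features and the Powell-trained weights. All function names are hypothetical.

```python
from collections import Counter

NULL = ""  # marks an empty (null) arc in a confusion network slot


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def select_backbone(hyps):
    """MBR-style selection (assumption: uniform posteriors, edit-distance loss).

    Returns the index of the hypothesis with the lowest total loss
    against all hypotheses.
    """
    return min(range(len(hyps)),
               key=lambda i: sum(edit_distance(hyps[i], h) for h in hyps))


def align_to_backbone(backbone, hyp):
    """Return, for each backbone position, the aligned hyp word or NULL.

    Simplification: hyp words with no backbone counterpart are dropped.
    """
    n, m = len(backbone), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (backbone[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    slots = [NULL] * n
    i, j = n, m
    while i > 0 and j > 0:
        if d[i][j] == d[i - 1][j - 1] + (backbone[i - 1] != hyp[j - 1]):
            slots[i - 1] = hyp[j - 1]  # match or substitution
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # backbone word aligned to null
        else:
            j -= 1  # extra hyp word, dropped in this sketch
    return slots


def combine(hyps):
    """Build a confusion network around the backbone and decode by voting."""
    backbone = hyps[select_backbone(hyps)]
    columns = zip(*(align_to_backbone(backbone, h) for h in hyps))
    best = (Counter(col).most_common(1)[0][0] for col in columns)
    return [w for w in best if w != NULL]


hyps = [s.split() for s in ("the cat sat on the mat",
                            "the cat sat on mat",
                            "a cat sat on the mat")]
print(" ".join(combine(hyps)))  # prints "the cat sat on the mat"
```

In the system the abstract describes, the voting step would be replaced by a weighted combination of the listed features, with weights tuned to maximize BLEU on a dev set.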