# Fast decoding and optimal decoding for machine translation

ACL, pp. 228-235 (2001)

Abstract

A good decoding algorithm is critical to the success of any statistical machine translation system. The decoder's job is to find the translation that is most likely according to a set of previously learned parameters (and a formula for combining them). Since the space of possible translations is extremely large, typical decoding algorithms ...

Introduction

- A statistical MT system that translates French sentences into English is divided into three parts: (1) a language model (LM) that assigns a probability P(e) to any English string, (2) a translation model (TM) that assigns a probability P(f | e) to any pair of English and French strings, and (3) a decoder.
- If the head of English word e is placed in French position j, its first non-...
- The first has fertility zero, while the second is aligned to a single French word.
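
The three-part split above can be illustrated with a small sketch. It assumes the usual noisy-channel combination, in which the decoder picks the English string e that maximizes P(e) · P(f | e). The tables `LM_LOGPROB` and `TM_LOGPROB`, their values, and the fixed candidate list are hypothetical and exist only for illustration; a real decoder searches the space of translations instead of enumerating a handful of candidates.

```python
import math

# Hypothetical toy parameters; a real system learns these from data.
LM_LOGPROB = {
    "he talks": math.log(0.02),   # log P(e)
    "it talks": math.log(0.005),
}
TM_LOGPROB = {
    ("il parle", "he talks"): math.log(0.3),  # log P(f | e)
    ("il parle", "it talks"): math.log(0.4),
}


def score(french, english):
    """Return log P(e) + log P(f | e) for one candidate translation."""
    return LM_LOGPROB[english] + TM_LOGPROB[(french, english)]


def decode(french, candidates):
    """Pick the candidate with the highest combined score (toy decoder)."""
    return max(candidates, key=lambda english: score(french, english))


best = decode("il parle", ["he talks", "it talks"])
```

Here the translation model alone slightly prefers "it talks", but the language model outweighs it, so the combined score selects "he talks".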

Highlights

- This paper reports on measurements of speed, search errors, and translation quality in the context of a traditional stack decoder (Jelinek, 1969; Brown et al., 1995) and two new decoders
- With more than one stack, how does a multistack decoder choose which hypothesis to extend during each iteration? We address this issue by taking one hypothesis from each stack, but a better solution would be to somehow compare hypotheses from different stacks and extend only the best ones
- The greedy decoder that we describe starts the translation process from an English gloss of the French sentence given as input
- We evaluated all decoders with respect to (1) speed, (2) search optimality, and (3) translation accuracy
- The last two factors may not always coincide, as Model 4 is an imperfect model of the translation process—i.e., there is no guarantee that a numerically optimal decoding is a good translation
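
The "one hypothesis from each stack" policy mentioned above can be sketched as follows. This is a heavily simplified illustration, not the paper's decoder: it assumes monotone, word-for-word translation, scores hypotheses only with hypothetical per-word log probabilities in `options`, and expands every hypothesis without pruning. The function name `multistack_decode` and its parameters are assumptions made for this example.

```python
import heapq
import math


def multistack_decode(french, options, max_iters=1000):
    """Toy multistack search.

    french:  list of French words.
    options: dict mapping each French word to a list of
             (english_word, log_prob) pairs (hypothetical scores).
    Returns (english_words, log_prob) or None if no full hypothesis exists.
    """
    n = len(french)
    # stacks[k] holds hypotheses that cover the first k French words.
    # Entries are (negated log prob, english words) so heapq pops the best.
    stacks = [[] for _ in range(n + 1)]
    heapq.heappush(stacks[0], (0.0, ()))

    for _ in range(max_iters):
        # Take the best hypothesis from each non-empty, incomplete stack.
        popped = [(k, heapq.heappop(stacks[k])) for k in range(n) if stacks[k]]
        if not popped:
            break
        # Extend each popped hypothesis by translating the next French word.
        for k, (neg_logprob, words) in popped:
            for english, logprob in options[french[k]]:
                heapq.heappush(
                    stacks[k + 1], (neg_logprob - logprob, words + (english,))
                )

    if not stacks[n]:
        return None
    neg_logprob, words = min(stacks[n])
    return list(words), -neg_logprob


toy_options = {
    "il": [("he", math.log(0.6)), ("it", math.log(0.4))],
    "parle": [("talks", math.log(0.7)), ("speaks", math.log(0.3))],
}
result = multistack_decode(["il", "parle"], toy_options)
```

Grouping hypotheses by how many French words they cover avoids comparing short hypotheses directly against long ones, whose scores are naturally lower.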

Results

- AddZfert is by far the most expensive operation, as the authors must consider inserting a zero-fertility English word before each translation of each unaligned French word.
- According to the definition of the decoding problem, a zero-fertility English word can only make a decoding more likely by increasing P(e) more than it decreases P(a, f | e).
- By considering only helpful zero-fertility insertions, the authors save significant overhead in the AddZfert operation, in many cases eliminating all possibilities and reducing its cost to less than that of AddNull.
- The greedy decoder that the authors describe starts the translation process from an English gloss of the French sentence given as input.
- The gloss is constructed by aligning each French word f_j with its most likely English translation e_fj (e_fj = argmax_e t(e | f_j)).
- For the French sentence “Bien entendu, il parle de une belle victoire.”, the greedy decoder initially ...
- (Footnote 2) We know that adding a zero-fertility word will decrease ...
- If the English word currently aligned to a French word is the NULL word, the replacement word is inserted into the translation at the position that yields the alignment of highest probability.
- When it starts from the gloss of the French sentence “Bien entendu, il parle de une belle victoire.”, for example, the greedy decoder alters the initial alignment incrementally as shown in Figure 2, eventually producing the translation “Quite naturally, he talks about a great victory.”.
- The authors chose the operation types enumerated above for two reasons: (i) they are general enough to enable the decoder to escape local maxima and to modify a given alignment in a non-trivial manner in order to produce good translations; (ii) they are relatively inexpensive.
- The authors populate each city with ten hotels corresponding to ten likely English word translations.
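
The gloss construction and the incremental improvement loop described above can be sketched as follows. This is a minimal illustration under strong assumptions: the translation table `T`, the bigram table `BIGRAM`, the smoothing constant `FLOOR`, and all probabilities are hypothetical, the alignment is fixed one-to-one, and the only operation is changing the translation of a single word. The decoder described in the paper uses a richer set of operations and Model 4 scores.

```python
import math

# Hypothetical toy translation table t(e | f).
T = {
    "il": {"it": 0.5, "he": 0.4},
    "parle": {"talking": 0.5, "talks": 0.4},
}
# Hypothetical toy bigram language model.
BIGRAM = {
    ("<s>", "he"): 0.3,
    ("<s>", "it"): 0.2,
    ("he", "talks"): 0.4,
    ("it", "talks"): 0.1,
    ("he", "talking"): 0.01,
    ("it", "talking"): 0.02,
}
FLOOR = 1e-4  # probability assigned to unseen events


def gloss(french):
    """Translate each French word by its most likely English word."""
    return [max(T[f], key=T[f].get) for f in french]


def score(french, english):
    """Log of (bigram LM probability * word translation probabilities)."""
    total, prev = 0.0, "<s>"
    for f, e in zip(french, english):
        total += math.log(BIGRAM.get((prev, e), FLOOR))
        total += math.log(T[f].get(e, FLOOR))
        prev = e
    return total


def neighbors(french, english):
    """All translations that differ from `english` in exactly one word."""
    for j, f in enumerate(french):
        for e in T[f]:
            if e != english[j]:
                yield english[:j] + [e] + english[j + 1:]


def greedy_decode(french):
    """Start from the gloss and hill-climb until no neighbor scores higher."""
    current = gloss(french)
    current_score = score(french, current)
    while True:
        best = max(neighbors(french, current),
                   key=lambda cand: score(french, cand), default=None)
        if best is None or score(french, best) <= current_score:
            return current
        current, current_score = best, score(french, best)


start = gloss(["il", "parle"])
final = greedy_decode(["il", "parle"])
```

With these toy numbers the gloss is "it talking", and two single-word changes lead to "he talks", at which point no neighbor improves the score and the search stops.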

Conclusion

- For 6-word French sentences, the authors normally come up with a graph that has about 80 hotels and 3500 finite-cost travel segments.
- Since the majority of the translation errors can be attributed to the language and translation models the authors use, it is clear that significant improvement in translation quality will come from better models.
- Even when the greedy decoder uses an optimized-for-speed set of operations in which at most one word is translated, moved, or inserted at a time and at ...

Summary

- Table 1: Comparison of decoders on sets of 101 test sentences. All experiments in this table use a bigram language model.
- Table 2: Comparison between decoders using a trigram language model. Both greedy variants in this table are greedy decoders optimized for speed.

Funding

- This work was supported by DARPA-ITO grant N66001-00-1-9814.

References

- P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2).
- P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lai, and R. Mercer. 1995. Method and system for natural language translation. U.S. Patent 5,477,451.
- M. Garey and D. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., New York.
- F. Jelinek. 1969. A fast sequential decoding algorithm using a stack. IBM Journal of Research and Development, 13.
- K. Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4).
- R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, and L. Troyansky. 1999. Determining computational complexity from characteristic ‘phase transitions’. Nature, 400.
- B. Selman, H. Levesque, and D. Mitchell. 1992. A new method for solving hard satisfiability problems. In Proc. AAAI.
- C. Tillmann, S. Vogel, H. Ney, and A. Zubiaga. 1997. A DP-based search using monotone alignments in statistical translation. In Proc. ACL.
- Y. Wang and A. Waibel. 1997. Decoding algorithm in statistical machine translation. In Proc. ACL.
- D. Wu. 1996. A polynomial-time algorithm for statistical machine translation. In Proc. ACL.

Best Paper

Received the ACL Best Paper Award in 2001.
