
Discriminative training and maximum entropy models for statistical machine translation

ACL, pp. 295–302, 2002

Cited by 1398 | Views 248 | Indexed in EI

Abstract

We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables.

Introduction
  • The use of an ‘inverted’ translation model in the unconventional decision rule of Eq. 6 results if the authors use the feature function log Pr(e_1^I | f_1^J) instead of log Pr(f_1^J | e_1^I).
  • The authors can even use both features log Pr(e_1^I | f_1^J) and log Pr(f_1^J | e_1^I), obtaining a more symmetric translation model.
  • Generalizing this approach to direct translation models, the authors extend the feature functions to include the dependence on the additional hidden variable (see the model sketch after this list).
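For reference, here is a compact restatement of the direct maximum entropy model these bullets build on, written out in LaTeX (a sketch following the paper's notation; h_m are the feature functions and λ_m their scaling factors):

    % Direct maximum entropy (log-linear) translation model
    \Pr(e_1^I \mid f_1^J) =
      \frac{\exp\left[ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right]}
           {\sum_{\tilde{e}_1^{\tilde{I}}} \exp\left[ \sum_{m=1}^{M} \lambda_m h_m(\tilde{e}_1^{\tilde{I}}, f_1^J) \right]}

    % Decision rule: the denominator is constant in e_1^I and drops out
    \hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \; \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)

With a hidden alignment a_1^J, the feature functions generalize to h_m(e_1^I, f_1^J, a_1^J), which is the extension the last bullet describes.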
Highlights
  • We are given a source (‘French’) sentence f_1^J = f_1, ..., f_j, ..., f_J, which is to be translated into a target (‘English’) sentence e_1^I = e_1, ..., e_i, ..., e_I
  • Eq. 2 is favored over the direct translation model of Eq. 1 with the argument that it yields a modular approach
  • The search space consists of the set of all possible target language sentences e_1^I and all possible alignments a_1^J. Generalizing this approach to direct translation models, we extend the feature functions to include the dependence on the additional hidden variable
  • Sentence error rate (SER): the SER is computed as the fraction of generated sentences that do not correspond exactly to one of the reference translations used for the maximum entropy training
  • We introduce as an additional measure the position-independent word error rate (PER); a sketch of both metrics follows this list
  • We have presented a framework for statistical MT for natural languages, which is more general than the widely used source-channel approach; the use of direct maximum entropy translation models for statistical machine translation has been suggested before
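To make the evaluation measures concrete, here is a minimal Python sketch of SER and PER; the function names and the toy data are illustrative assumptions, not the paper's evaluation code. PER is computed on bags of words, which is exactly what makes it position-independent.

    from collections import Counter

    def sentence_error_rate(hypotheses, references):
        # A sentence counts as an error when it matches none of its references.
        errors = sum(1 for hyp, refs in zip(hypotheses, references)
                     if hyp not in refs)
        return errors / len(hypotheses)

    def position_independent_wer(hypothesis, reference):
        # Compare word multisets, ignoring word order entirely.
        hyp_counts = Counter(hypothesis.split())
        ref_counts = Counter(reference.split())
        missing = sum((ref_counts - hyp_counts).values())  # reference words not produced
        surplus = sum((hyp_counts - ref_counts).values())  # extra hypothesis words
        return max(missing, surplus) / sum(ref_counts.values())

    # Toy usage (invented data): one exact match, one mismatch.
    hyps = ["the house is small", "he goes home"]
    refs = [["the house is small", "the house is little"],
            ["he is going home"]]
    print(sentence_error_rate(hyps, refs))                # 0.5
    print(position_independent_wer(hyps[1], refs[1][0]))  # 0.5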
Results
  • The authors use the logarithms of the components of a translation model as feature functions.
  • The authors could use additional language models by using features of the form h(f_1^J, e_1^I) = h(e_1^I).
  • The authors could use grammatical features that relate certain grammatical dependencies of source and target language.
  • Using a function k(·) that counts how many verb groups exist in the source or the target sentence, the authors can define the feature h(f_1^J, e_1^I) = δ(k(f_1^J), k(e_1^I)), which is 1 if the two sentences contain the same number of verb groups.
  • To train the model parameters λ_1^M of the direct translation model according to Eq. 11, the authors use the GIS (Generalized Iterative Scaling) algorithm (Darroch and Ratcliff, 1972); a training sketch follows this list.
  • The authors might face the problem that none of the reference translations is part of the n-best list, because the search algorithm performs pruning, which in principle limits the possible translations that can be produced for a given input sentence.
  • For maximum entropy training, the authors therefore define as reference translation the sentence that has the minimal number of word errors with respect to any of the reference translations.
  • SER: the SER is computed as the fraction of generated sentences that do not correspond exactly to one of the reference translations used for the maximum entropy training.
  • The authors use a standard word trigram language model and the three component models of the alignment templates.
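Since the bullets above describe GIS training on n-best lists, here is a minimal Python sketch of that loop; the data layout, the slack-feature construction, and the assumption of exactly one flagged reference per list are illustrative choices, not the authors' implementation. Note that GIS requires non-negative features, so log-probability features would first need shifting into a non-negative range.

    import math

    def gis_train(nbest, iterations=100):
        # nbest: one list per source sentence; each entry is
        # (features, is_reference) with M non-negative feature values.
        M = len(nbest[0][0][0])
        # GIS needs feature sums equal to a constant C; enforce this
        # by appending a slack feature C - sum(features).
        C = max(sum(f) for cands in nbest for f, _ in cands)
        lam = [0.0] * (M + 1)  # last weight belongs to the slack feature

        def full(feats):
            return feats + [C - sum(feats)]

        for _ in range(iterations):
            emp = [0.0] * (M + 1)   # empirical feature expectations
            exp_ = [0.0] * (M + 1)  # model feature expectations
            for cands in nbest:
                scores = [math.exp(sum(l * h for l, h in zip(lam, full(f))))
                          for f, _ in cands]
                z = sum(scores)
                for (f, is_ref), s in zip(cands, scores):
                    for m, h in enumerate(full(f)):
                        if is_ref:
                            emp[m] += h
                        exp_[m] += (s / z) * h
            # GIS update (Darroch and Ratcliff, 1972)
            lam = [l + (1.0 / C) * math.log(e / p) if e > 0 and p > 0 else l
                   for l, e, p in zip(lam, emp, exp_)]
        return lam[:M]

    # Toy usage: one source sentence, two candidates, the first is the reference.
    data = [[([1.0, 0.5], True), ([0.2, 1.0], False)]]
    print(gis_train(data, iterations=50))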
Conclusion
  • The authors observe improved error rates when using the word penalty and the class-based language model as additional features (a rescoring sketch follows this list).
  • The authors have presented a framework for statistical MT for natural languages, which is more general than the widely used source-channel approach; the use of direct maximum entropy translation models for statistical machine translation has been suggested before.
  • The authors can interpret it as an approximation to the Bayes decision rule in Eq. 2 or as an instance of a direct maximum entropy model with feature functions log Pr(e_1^I) and log Pr(f_1^J | e_1^I).
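How an extra feature such as the word penalty changes the chosen hypothesis can be seen in a small n-best rescoring sketch (Python; the candidates and feature values are invented for illustration, not taken from the paper's experiments):

    def rescore_nbest(candidates, lams):
        # Pick the candidate maximizing the log-linear score sum_m lambda_m * h_m.
        return max(candidates,
                   key=lambda c: sum(l * h for l, h in zip(lams, c["h"])))

    # Feature vectors: [LM log-prob, TM log-prob, word penalty = -length].
    cands = [
        {"e": "the house is small",             "h": [-4.1, -3.0, -4.0]},
        {"e": "the house is very small indeed", "h": [-3.9, -2.8, -6.0]},
    ]
    print(rescore_nbest(cands, [1.0, 1.0, 0.0])["e"])  # longer hypothesis wins
    print(rescore_nbest(cands, [1.0, 1.0, 0.5])["e"])  # word penalty flips the choice

Trained scaling factors let the model weigh exactly this kind of trade-off instead of fixing every λ_m to 1.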
Tables
  • Table 1: Characteristics of training corpus (Train), manual lexicon (Lex), development corpus (Dev), and test corpus (Test)
  • Table 2: Effect of maximum entropy training for the alignment template approach (WP: word penalty feature, CLM: class-based language model, …)
  • Table 3: Resulting model scaling factors of maximum entropy training for alignment templates (λ_1, …)
Related Work
  • 7 Conclusions

    The use of direct maximum entropy translation models for statistical machine translation has been suggested earlier (Papineni et al., 1997; Papineni et al., 1998). We have presented a framework for statistical MT for natural languages, which is more general than the widely used source-channel approach. It allows a baseline MT system to be extended easily by adding new feature functions. We have shown that a baseline statistical MT system can be significantly improved using this framework.

    There are two possible interpretations for a statistical MT system structured according to the source-channel approach, hence including a model for Pr(e_1^I) and a model for Pr(f_1^J | e_1^I). We can interpret it as an approximation to the Bayes decision rule in Eq. 2 or as an instance of a direct maximum entropy model with feature functions log Pr(e_1^I) and log Pr(f_1^J | e_1^I). As soon as we want to use model scaling factors, we can do this in a theoretically justified way only under the second interpretation. Yet, the main advantage comes from the large number of additional possibilities that we obtain by using the second interpretation; the worked equation below makes the special-case relation explicit.
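To make the special-case relation explicit, a short worked derivation in LaTeX (Eq. 2 refers to the Bayes decision rule cited in the text):

    % Bayes decision rule (Eq. 2), source-channel form
    \hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \; \Pr(e_1^I)\,\Pr(f_1^J \mid e_1^I)

    % Taking logarithms leaves the argmax unchanged:
    \hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \; \log \Pr(e_1^I) + \log \Pr(f_1^J \mid e_1^I)

    % This is the direct maximum entropy decision rule with
    % h_1 = \log \Pr(e_1^I), h_2 = \log \Pr(f_1^J \mid e_1^I)
    % and \lambda_1 = \lambda_2 = 1; freeing the \lambda_m gives the
    % scaled model that only the second interpretation justifies.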
References
  • L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer. 1986. Maximum mutual information estimation of hidden Markov model parameters. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 49–52, Tokyo, Japan, April.
  • A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–72, March.
  • P. Beyerlein. 1997. Discriminative model combination. In Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, pages 238–245, Santa Barbara, CA, December.
  • P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
  • J. N. Darroch and D. Ratcliff. 1972. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43:1470–1480.
  • B. H. Juang, W. Chou, and C. H. Lee. 1995. Statistical and discriminative methods for speech recognition. In A. J. R. Ayuso and J. M. L. Soler, editors, Speech Recognition and Coding – New Advances and Trends. Springer Verlag, Berlin, Germany.
  • H. Ney. 1995. On the probabilistic interpretation of neural-network classifiers and discriminative training criteria. IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(2):107–119, February.
  • S. Nießen, F. J. Och, G. Leusch, and H. Ney. 2000. An evaluation tool for machine translation: Fast evaluation for MT research. In Proc. of the Second Int. Conf. on Language Resources and Evaluation (LREC), pages 39–45, Athens, Greece, May.
  • F. J. Och, C. Tillmann, and H. Ney. 1999. Improved alignment models for statistical machine translation. In Proc. of the Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20–28, University of Maryland, College Park, MD, June.
  • K. A. Papineni, S. Roukos, and R. T. Ward. 1997. Feature-based language understanding. In European Conf. on Speech Communication and Technology, pages 1435–1438, Rhodes, Greece, September.
  • K. A. Papineni, S. Roukos, and R. T. Ward. 1998. Maximum likelihood and discriminative training of direct translation models. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 189–192, Seattle, WA, May.
  • K. A. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2001. Bleu: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY, September.
  • J. Peters and D. Klakow. 1999. Compact maximum entropy language models. In Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone, CO, December.
  • R. Schlüter and H. Ney. 2001. Model-based MCE bound to the true Bayes’ error. IEEE Signal Processing Letters, 8(5):131–133, May.
  • W. Wahlster. 1993. Verbmobil: Translation of face-to-face dialogs. In Proc. of MT Summit IV, pages 127–135, Kobe, Japan, July.
Best Paper
Winner of the 2002 ACL Best Paper Award