
Effective Approaches to Attention-based Neural Machine Translation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015


Abstract

An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
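As a rough illustration of the global class, here is a minimal NumPy sketch of global attention with the paper's dot and general score functions; the array names, shapes, and the learned matrix W_a are assumptions for illustration, not the authors' code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def score(h_t, h_s, variant="dot", W_a=None):
    """Two of the paper's alignment score functions.
    h_t: (d,) current target hidden state; h_s: (S, d) source hidden states;
    W_a: (d, d) learned matrix for the 'general' variant (assumed shape)."""
    if variant == "dot":
        return h_s @ h_t
    if variant == "general":
        return h_s @ (W_a @ h_t)
    raise ValueError("the 'concat' variant is omitted in this sketch")

def global_attention(h_t, h_s, variant="dot", W_a=None):
    """Global attention: softmax weights over *all* source positions,
    then the context vector as their weighted average."""
    a_t = softmax(score(h_t, h_s, variant, W_a))
    c_t = a_t @ h_s
    return a_t, c_t

# Toy usage with random states (illustrative only)
rng = np.random.default_rng(0)
a_t, c_t = global_attention(rng.normal(size=4), rng.normal(size=(6, 4)))
```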

Introduction
  • Neural Machine Translation (NMT) achieved state-of-the-art performances in large-scale translation tasks such as from English to French (Luong et al, 2015) and English to German (Jean et al, 2015).
  • NMT is often a large neural network that is trained in an end-to-end fashion and has the ability to generalize well to very long word sequences (see the decoding-step sketch after this list)
  • This means the model does not have to explicitly store gigantic phrase tables and language models as in the case of standard MT; NMT has a small memory footprint.
  • Implementing NMT decoders is easy, unlike the highly intricate decoders in standard MT (Koehn et al, 2003)
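Because the decoder predicts each target word with a softmax over the vocabulary rather than by phrase-table lookup, one decoding step is compact. Below is a minimal NumPy sketch of a single step of the global (dot) attentional decoder described in the paper; the matrix names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(h_t, h_s, W_c, W_s):
    """One attentional decoding step, following the paper's global model:
    context c_t from dot attention, attentional state
    h~_t = tanh(W_c [c_t; h_t]), output distribution softmax(W_s h~_t).

    h_t: (d,) decoder state   h_s: (S, d) source states
    W_c: (d, 2d)              W_s: (V, d)  (V = target vocabulary size)
    """
    a_t = softmax(h_s @ h_t)                      # attend over source states
    c_t = a_t @ h_s                               # context vector
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))
    return softmax(W_s @ h_tilde)                 # next-word probabilities
```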
Highlights
  • Neural Machine Translation (NMT) achieved state-of-the-art performances in large-scale translation tasks such as from English to French (Luong et al, 2015) and English to German (Jean et al, 2015)
  • Following (Luong et al, 2015), we report translation quality using two types of BLEU: (a) tokenized BLEU, to be comparable with existing Neural Machine Translation work, and (b) NIST BLEU, to be comparable with WMT results (see the BLEU sketch after this list)
  • English-German translations:
    src: Orlando Bloom and Miranda Kerr still love each other
    ref: Orlando Bloom und Miranda Kerr lieben sich noch immer.
    best: Orlando Bloom und Miranda Kerr lieben einander noch immer.
    base: Orlando Bloom und Lucas Miranda lieben einander noch immer.
    src: "We're pleased the FAA recognizes that an enjoyable passenger experience is not incompatible with safety and security," said Roger Dow, CEO of the U.S. Travel Association.
    ref: "Wir freuen uns, dass die FAA erkennt, dass ein angenehmes Passagiererlebnis nicht im Widerspruch zur Sicherheit steht", sagte Roger Dow, CEO der U.S. Travel Association.
    best: "Wir freuen uns, dass die FAA anerkennt, dass ein angenehmes ist nicht mit Sicherheit und
  • src: Wegen der von Berlin und der Europäischen Zentralbank verhängten strengen Sparpolitik in Verbindung mit der Zwangsjacke, in die die jeweilige nationale Wirtschaft durch das Festhalten an der gemeinsamen Währung genötigt wird, sind viele Menschen der Ansicht, das Projekt Europa sei zu weit gegangen
    ref: The austerity imposed by Berlin and the European Central Bank, coupled with the straitjacket imposed on national economies through adherence to the common currency, has led many people to think Project Europe has gone too far
    best: Because of the strict austerity measures imposed by Berlin and the European Central Bank in connection with the straitjacket in which the respective national economy is forced to adhere to the common currency, many people believe that the European project has gone too far
    base: Because of the pressure imposed by the European Central Bank and the Federal Central Bank with the strict austerity imposed on the national economy in the face of the single currency, many people believe that the European project has gone too far
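The two BLEU variants named above differ mainly in tokenization. A rough sketch of the distinction using the sacrebleu package follows; sacrebleu is an assumed modern tool, not the evaluation scripts used in the paper, and the sentences reuse the sample above.

```python
import sacrebleu

hyp = ["Orlando Bloom und Miranda Kerr lieben einander noch immer ."]
ref = [["Orlando Bloom und Miranda Kerr lieben sich noch immer ."]]

# (a) tokenized BLEU: the text is already tokenized, so skip re-tokenization.
print(sacrebleu.corpus_bleu(hyp, ref, tokenize="none", force=True).score)

# (b) WMT/NIST-style BLEU: let the metric apply its standard '13a' tokenizer.
print(sacrebleu.corpus_bleu(hyp, ref, tokenize="13a").score)
```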
Methods
  • The authors evaluate the effectiveness of the models on the WMT translation tasks between English and German in both directions. newstest2013 (3000 sentences) is used as a development set to select the hyperparameters.
  • When training the NMT systems, following (Bahdanau et al, 2015; Jean et al, 2015), the authors filter out sentence pairs whose lengths exceed 50 words and shuffle mini-batches as the authors proceed.
  • The authors train for 12 epochs and start halving the learning rate after 8 epochs (a schedule sketch follows this list)
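A minimal sketch of the preprocessing and learning-rate schedule just described, assuming plain SGD with a base learning rate of 1.0; the base rate, placeholder data, and variable names are assumptions for illustration, not the paper's code.

```python
import random

def keep_pair(src, tgt, max_len=50):
    # Length filter described above: drop pairs longer than 50 words.
    return len(src.split()) <= max_len and len(tgt.split()) <= max_len

def learning_rate(epoch, base_lr=1.0, start_halving=8):
    # Train 12 epochs; halve the rate each epoch after epoch 8.
    # base_lr=1.0 is an assumption for illustration.
    return base_lr * 0.5 ** max(0, epoch - start_halving)

corpus = [("a b c", "x y z")]                 # placeholder data (assumed)
train = [p for p in corpus if keep_pair(*p)]
for epoch in range(1, 13):
    random.shuffle(train)                     # reshuffle as training proceeds
    print(epoch, learning_rate(epoch))
```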
Results
  • For English to German translation, the authors achieve new state-of-the-art (SOTA) results for both WMT’14 and WMT’15, outperforming previous SOTA systems, backed by NMT models and n-gram LM rerankers, by more than 1.0 BLEU.
Conclusion
  • The authors propose two simple and effective attentional mechanisms for neural machine translation: a global approach that always looks at all source positions and a local one that only attends to a subset of source positions at a time.
  • The authors test the effectiveness of the models on the WMT translation tasks between English and German in both directions.
  • For the English to German translation direction, the ensemble model has established new state-of-the-art results for both WMT'14 and WMT'15.
  • German-English translation sample, src: In einem Interview sagte Bloom jedoch, dass er und Kerr sich noch immer lieben.

    Footnotes: 16. We concatenate the 508 sentence pairs with 1M sentence pairs from WMT and run the Berkeley aligner. 17. http://arxiv.org/abs/1508.04025 18. The reference uses a fancier translation of "incompatible", namely "im Widerspruch zu etwas stehen".
Tables
  • Table1: WMT'14 English-German results – shown are the perplexities (ppl) and the tokenized BLEU scores of various systems on newstest2014. We highlight the best system in bold and give progressive improvements in italic between consecutive systems. local-p refers to the local attention with predictive alignments (sketched after this list). We indicate for each attention model the alignment score function used in parentheses
  • Table2: WMT’15 English-German results – NIST BLEU scores of the existing WMT’15 SOTA system and our best one on newstest2015
  • Table3: WMT’15 German-English results – performances of various systems (similar to Table 1). The base system already includes source reversing on which we add global attention, dropout, input feeding, and unk replacement
  • Table4: Attentional Architectures – performances of different attentional models. We trained two local-m (dot) models; both have ppl > 7.0
  • Table5: Sample translations – for each example, we show the source (src), the human translation (ref), the translation from our best model (best), and the translation of a non-attentional model (base). We italicize some correct translation segments and highlight a few wrong ones in bold
  • Table6: AER scores – results of various models on the RWTH English-German alignment data
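Tables 1 and 4 compare attentional architectures, including local-p, the local attention with predictive alignments. Below is a minimal NumPy sketch of that variant following the paper's description; the parameter shapes are assumptions, and D = 10 matches the window half-width the paper reports using.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_p_attention(h_t, h_s, W_p, v_p, D=10):
    """local-p: local attention with a *predicted* alignment point,
    following the paper's description: p_t = S * sigmoid(v_p . tanh(W_p h_t));
    dot-score weights inside the window [p_t - D, p_t + D] are re-weighted
    by a Gaussian centered at p_t with sigma = D / 2.

    h_t: (d,) target state; h_s: (S, d) source states;
    W_p: (k, d) and v_p: (k,) are learned parameters (assumed shapes).
    """
    S = len(h_s)
    p_t = S / (1.0 + np.exp(-(v_p @ np.tanh(W_p @ h_t))))   # predicted center
    s = np.arange(S)
    in_window = np.abs(s - p_t) <= D
    scores = np.where(in_window, h_s @ h_t, -np.inf)        # dot score in window
    a_t = softmax(scores) * np.exp(-((s - p_t) ** 2) / (2 * (D / 2) ** 2))
    c_t = a_t @ h_s                                         # context vector
    return a_t, c_t
```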
Funding
  • We gratefully acknowledge support from a gift from Bloomberg L.P. and the support of NVIDIA Corporation with the donation of Tesla K40 GPUs

Reference
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Christian Buck, Kenneth Heafield, and Bas van Ooyen. 2014. N-gram counts and language models from the common crawl. In LREC.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.
  • Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. End-to-end continuous speech recognition using attention-based recurrent NN: first results. CoRR, abs/1412.1602.
  • Alexander Fraser and Daniel Marcu. 2007. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33(3):293–303.
  • Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A recurrent neural network for image generation. In ICML.
  • Sebastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In ACL.
  • Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP.
  • Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In NAACL.
  • Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In NAACL.
  • Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. 2015. Addressing the rare word problem in neural machine translation. In ACL.
  • Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent models of visual attention. In NIPS.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML.
  • Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2015. Recurrent neural network regularization. In ICLR.