OpenNMT: Open-Source Toolkit for Neural Machine Translation

ACL (System Demonstrations), pp. 67-72, 2017.

Cited by: 886 | DOI: https://doi.org/10.18653/v1/P17-4012
Other links: dblp.uni-trier.de | academic.microsoft.com | arxiv.org

Abstract:

We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques.

Introduction
  • Neural machine translation (NMT) is a new methodology for machine translation that has led to remarkable improvements, in terms of human evaluation, compared to rule-based and statistical machine translation (SMT) systems (Wu et al., 2016; Crego et al., 2016).
  • The source words are first mapped to word vectors and fed into a recurrent neural network (RNN).
  • At each target time step, attention is applied over the source RNN states and combined with the current hidden state to produce a prediction of the next word.
  • This prediction is fed back into the target RNN.
  • The target decoder combines an RNN hidden representation of previously generated words (w_1, ..., w_{t-1}) with source hidden vectors to predict scores for each possible next word (a minimal sketch of one such decoding step follows this list).
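
    The bullets above describe the standard attention-based encoder-decoder. As a minimal, illustrative sketch of one decoding step (generic PyTorch with made-up names such as decoder_step; this is not OpenNMT's actual Lua/Torch code):

    import torch
    import torch.nn.functional as F

    def decoder_step(src_hidden, tgt_hidden, W_out):
        # src_hidden: (src_len, dim) encoder RNN states for the source words
        # tgt_hidden: (dim,) current target RNN hidden state
        # W_out: (vocab_size, 2*dim) output projection to word scores
        scores = src_hidden @ tgt_hidden           # attention score per source position
        weights = F.softmax(scores, dim=0)         # attention distribution over the source
        context = weights @ src_hidden             # weighted summary of source states
        combined = torch.cat([context, tgt_hidden])
        word_scores = W_out @ combined             # unnormalized scores for every target word
        return F.log_softmax(word_scores, dim=0), weights

    # Toy usage: 5 source tokens, hidden size 8, vocabulary of 100 words.
    log_probs, attn = decoder_step(torch.randn(5, 8), torch.randn(8), torch.randn(100, 16))
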
Highlights
  • Neural machine translation (NMT) is a new methodology for machine translation that has led to remarkable improvements, in terms of human evaluation, compared to rule-based and statistical machine translation (SMT) systems (Wu et al., 2016; Crego et al., 2016)
  • The source words are first mapped to word vectors and fed into a recurrent neural network (RNN)
  • In the development of this project, we aimed to build upon the strengths of this system, while providing additional documentation and functionality to provide a useful open-source neural machine translation framework for the NLP community in academia and industry. With these goals in mind, we introduce OpenNMT, an open-source framework for neural machine translation
  • We introduce OpenNMT, a research toolkit for neural machine translation that prioritizes efficiency and modularity
Methods
  • Design Goals

    As the low-level details of NMT have been covered previously (see, for instance, Neubig, 2017), the authors focus this report on the design goals of OpenNMT: system efficiency, code modularity, and model extensibility.

    4.1 System Efficiency

    As NMT systems can take from days to weeks to train, training efficiency is a paramount concern.
  • Memory sharing: when training GPU-based NMT models, memory size restrictions are the most common limiter of batch size and directly impact training time.
  • Neural network toolkits, such as Torch, are often designed to trade extra memory allocation for speed and declarative simplicity.
  • Aggressive memory reuse in OpenNMT saves 70% of GPU memory with the default model size (a rough sketch of the buffer-reuse idea follows this list).
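
    As a rough illustration of the buffer-reuse idea (a generic PyTorch sketch under assumed shapes; it does not attempt to reproduce OpenNMT's actual Lua/Torch memory sharing, which operates on module-internal tensors during the forward and backward passes):

    import torch

    batch, dim, steps = 64, 500, 50
    w = torch.randn(dim, dim)

    # Pre-allocate two buffers and reuse them on every step instead of
    # allocating fresh activation tensors each iteration.
    h = torch.randn(batch, dim)
    tmp = torch.empty(batch, dim)

    for _ in range(steps):
        torch.mm(h, w, out=tmp)   # write the matmul result into the shared buffer
        torch.tanh(tmp, out=h)    # overwrite h in place with the new activation
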
Conclusion
  • The authors introduce OpenNMT, a research toolkit for NMT that prioritizes efficiency and modularity.
  • The authors hope to further develop OpenNMT to maintain strong MT results at the research frontier, providing a stable framework for production use.
  • Footnote 1: http://statmt.org/wmt15
  • Footnote 2: https://github.com/rsennrich/nematus (comparison with OpenNMT/Nematus GitHub revisions 907824/75c6ab1)
  • Footnote 3: http://opus.lingfil.uu.se
Tables
  • Table1: Translation speed in source tokens per second for the Torch CPU/GPU implementations and for the multithreaded CPU C implementation. (Run with Intel i7/GTX 1080)
  • Table3: Performance results for EN→DE on WMT15, tested on newstest2014. Both systems use a 2x500 RNN, embedding size 300, 13 epochs, batch size 64, and beam size 5. We compare a 50k vocabulary and a 32k BPE setting. OpenNMT shows improvements in speed and accuracy compared to Nematus (a rough summary of this configuration follows the table list).
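
    For orientation, the benchmark setting described for Table 3 corresponds roughly to the hyperparameters below (an illustrative plain-Python summary with assumed key names; it is not the toolkit's actual configuration format or command-line options):

    # Approximate hyperparameters of the EN->DE benchmark in Table 3 (illustrative only).
    benchmark_config = {
        "encoder_layers": 2,
        "decoder_layers": 2,
        "rnn_size": 500,          # "2x500 RNN"
        "embedding_size": 300,
        "epochs": 13,
        "batch_size": 64,
        "beam_size": 5,
        "vocabulary": ["50k word-level", "32k BPE"],  # the two compared settings
    }
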
Reference
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR, pages 1–15.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016), Berlin, Germany, pages 10–21. http://aclweb.org/anthology/K/K16/K16-1002.pdf.
  • William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. 2015. Listen, attend and spell. CoRR abs/1508.01211. http://arxiv.org/abs/1508.01211.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proc. of EMNLP.
  • Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of NAACL-HLT 2016, pages 93–98.
  • Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Josep Crego, Jungi Kim, and Jean Senellart. 2016. Systran's pure neural machine translation system. arXiv preprint arXiv:1602.06023.
  • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231.
  • Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush. 2016. What you get is what you see: A visual markup decompiler. CoRR abs/1609.04938. http://arxiv.org/abs/1609.04938.
  • Chris Dyer, Jonathan Weese, Hendra Setiawan, Adam Lopez, Ferhan Ture, Vladimir Eidelman, Juri Ganitkevitch, Phil Blunsom, and Philip Resnik. 2010. cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proc. ACL, pages 7–12.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's multilingual neural machine translation system: Enabling zero-shot translation.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. ACL, pages 177–180.
  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proc. of EMNLP.
  • Andre F. T. Martins and Ramon Fernandez Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. arXiv preprint arXiv:1602.02068.
  • Graham Neubig. 2013. Travatar: A forest-to-string machine translation engine based on tree transducers. In Proc. ACL, Sofia, Bulgaria.
  • Graham Neubig. 2017. Neural Machine Translation and Sequence-to-Sequence Models: A Tutorial. ArXiv e-prints.
  • Rico Sennrich and Barry Haddow. 2016. Linguistic input features improve neural machine translation. arXiv preprint arXiv:1606.02892.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. http://arxiv.org/abs/1508.07909.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In NIPS. http://arxiv.org/abs/1409.3215.
  • Oriol Vinyals and Quoc Le. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.
  • Jason Weston, Sumit Chopra, and Antoine Bordes. 2014. Memory networks. CoRR abs/1410.3916. http://arxiv.org/abs/1410.3916.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. CoRR abs/1502.03044. http://arxiv.org/abs/1502.03044.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proc. ACL.