Neural Machine Translation in Linear Time

arXiv: Computation and Language (cs.CL), abs/1610.10099, 2016.

Cited by: 344
Other Links: dblp.uni-trier.de|academic.microsoft.com|arxiv.org

Abstract:

We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences.
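To make the stacked encoder-decoder idea concrete, here is a minimal sketch of such a network in PyTorch. This is not the authors' implementation: the module names, channel sizes and dilation schedule are illustrative, the sketch assumes source and target have been padded to the same length (the paper instead handles differing lengths with dynamic unfolding), and the multiplicative units and layer normalisation used in the paper are omitted.

```python
# A minimal sketch (not the authors' implementation) of the ByteNet idea:
# a stack of dilated 1D convolutions encodes the source, and a causal
# (masked) stack of dilated 1D convolutions sits on top of it to decode
# the target, so the temporal resolution is preserved throughout.
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """Dilated 1D convolution that only sees current and past positions."""

    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation           # left padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):                                  # x: (B, C, T)
        x = nn.functional.pad(x, (self.pad, 0))            # causal padding
        return self.conv(x)                                # -> (B, C, T)


class ByteNetSketch(nn.Module):
    def __init__(self, vocab, channels=256, kernel_size=3,
                 dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        self.src_embed = nn.Embedding(vocab, channels)
        self.tgt_embed = nn.Embedding(vocab, channels)
        # Encoder: non-causal dilated convolutions over the source
        # (odd kernel_size assumed so the sequence length is preserved).
        self.encoder = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size, dilation=d,
                      padding=d * (kernel_size - 1) // 2)
            for d in dilations)
        # Decoder: causal dilated convolutions stacked on the encoder output.
        self.decoder = nn.ModuleList(
            CausalConv1d(channels, kernel_size, d) for d in dilations)
        self.out = nn.Linear(channels, vocab)

    def forward(self, src, tgt):                           # (B, T) token ids;
        s = self.src_embed(src).transpose(1, 2)            # tgt shifted right
        for conv in self.encoder:
            s = torch.relu(conv(s)) + s                    # residual block
        # Stack the decoder on the encoder: the source representation is
        # added position-wise to the target embeddings.
        t = self.tgt_embed(tgt).transpose(1, 2) + s
        for conv in self.decoder:
            t = torch.relu(conv(t)) + t
        return self.out(t.transpose(1, 2))                 # (B, T, vocab)
```

Under these assumptions, ByteNetSketch(vocab=256)(src_ids, tgt_ids) returns per-position logits for the next target character; the forward pass is a fixed number of convolutions over the sequences, hence the linear running time of the title.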

Introduction
  • A neural network estimates a distribution over sequences of words or characters that belong to a given language (Bengio et al., 2003).
  • The network estimates a distribution over sequences in the target language conditioned on a given sequence in the source language.
  • The decoder uses the representation computed by the source encoder to generate the target sequence (Kalchbrenner & Blunsom, 2013); the factorizations below state both distributions explicitly.
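For concreteness, the two estimates above are the usual chain-rule factorizations over target tokens; the following restatement is standard and is added here for clarity rather than quoted from the paper:

```latex
% Unconditional language model over a target sequence t = t_1, ..., t_N
p(t) = \prod_{i=1}^{N} p\bigl(t_i \mid t_1, \ldots, t_{i-1}\bigr)

% Translation model, additionally conditioned on the source sequence s
p(t \mid s) = \prod_{i=1}^{N} p\bigl(t_i \mid t_1, \ldots, t_{i-1}, s\bigr)
```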
Highlights
  • A neural network estimates a distribution over sequences of words or characters that belong to a given language (Bengio et al., 2003)
  • The network estimates a distribution over sequences in the target language conditioned on a given sequence in the source language
  • We have introduced the ByteNet, a neural translation model that has linear running time, decouples translation from memorization and has short signal propagation paths for tokens in sequences
  • We have shown that the ByteNet decoder is a state-of-the-art character-level language model based on a convolutional neural network that outperforms recurrent neural language models
  • We have shown that the ByteNet generalizes the RNN encoder-decoder architecture and achieves state-of-the-art results for character-to-character machine translation and excellent results in general, while maintaining linear running time complexity
Conclusion
  • The authors have introduced the ByteNet, a neural translation model that has linear running time, decouples translation from memorization and has short signal propagation paths for tokens in sequences (see the sketch after this list).
  • The authors have shown that the ByteNet decoder is a state-of-the-art character-level language model based on a convolutional neural network that outperforms recurrent neural language models.
  • The authors have shown that the ByteNet generalizes the RNN encoder-decoder architecture and achieves state-of-the-art results for character-to-character machine translation and excellent results in general, while maintaining linear running time complexity.
  • The authors have revealed the latent structure learnt by the ByteNet and found it to mirror the expected alignment between the tokens in the sentences.
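A small back-of-the-envelope illustration of the two complexity claims, assuming a decoder whose dilation doubles from layer to layer (the paper uses a bounded, repeated doubling schedule, so the constants here are illustrative and the function names are not from the authors' code):

```python
# A stack of dilated convolutions with kernel size k and dilations
# 1, 2, 4, ..., 2**(L-1) has a receptive field of 1 + (k - 1) * (2**L - 1)
# positions, so the number of layers L grows only logarithmically with the
# sequence length n, total work is O(L * n) ~ linear in n, and any two
# tokens are connected by a forward path of at most L layers.  A
# single-layer RNN instead needs |i - j| sequential steps to connect
# positions i and j.

def dilated_stack_stats(n, kernel_size=3):
    """Layers, work and worst-case token-to-token path for a doubling-dilation stack."""
    layers = 1
    while 1 + (kernel_size - 1) * (2 ** layers - 1) < n:
        layers += 1
    return {"layers": layers,                  # O(log n)
            "work": layers * n,                # O(n) convolution applications
            "max_path_between_tokens": layers}


def rnn_stats(n):
    """Sequential steps and worst-case token-to-token path for a single-layer RNN."""
    return {"sequential_steps": n, "max_path_between_tokens": n - 1}


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, dilated_stack_stats(n), rnn_stats(n))
```

With kernel size 3, for example, 13 layers already cover a 10,000-token sequence, each layer being a linear-time convolution, whereas an RNN needs up to 9,999 sequential steps to connect the first and last token.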
Tables
  • Table1: Properties of various neural translation models
  • Table2: BLEU scores on En-De WMT NewsTest 2014 and 2015 test sets
  • Table3: Negative log-likelihood results in bits/byte on the Hutter Prize Wikipedia benchmark (the metric is defined after this list)
  • Table4: Bits/character, with the corresponding BLEU score, achieved by the ByteNet translation model on the English-to-German WMT translation task
  • Table5: Raw output translations generated from the ByteNet that highlight interesting reordering and transliteration phenomena. For each group, the first row is the English source, the second row is the ground truth German target, and the third row is the ByteNet translation
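For reference, the bits/byte figure in Table 3 and the bits/character figure in Table 4 are the model's average negative log-probability of the data in base 2, averaged per byte or per character respectively; for the translation model the probability is additionally conditioned on the source sequence. This is the standard definition of the metric, stated here for convenience rather than quoted from the paper:

```latex
\text{bits/character} = -\frac{1}{N} \sum_{i=1}^{N} \log_2 p\bigl(t_i \mid t_1, \ldots, t_{i-1}\bigr)
```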
Reference
  • Ba, Lei Jimmy, Kiros, Ryan, and Hinton, Geoffrey E. Layer normalization. CoRR, abs/1607.06450, 2016.
  • Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
  • Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal, and Jauvin, Christian. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.
  • Chen, Liang-Chieh, Papandreou, George, Kokkinos, Iasonas, Murphy, Kevin, and Yuille, Alan L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. CoRR, abs/1412.7062, 2014.
  • He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Identity mappings in deep residual networks. CoRR, abs/1603.05027, 2016.
  • Hochreiter, Sepp and Schmidhuber, Jürgen. Long short-term memory. Neural Computation, 1997.
  • Hochreiter, Sepp, Bengio, Yoshua, and Frasconi, Paolo. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In Kolen, J. and Kremer, S. (eds.), Field Guide to Dynamical Recurrent Networks. IEEE Press, 2001.
  • Hutter, Marcus. The human knowledge compression contest. http://prize.hutter1.net/, 2012.
  • Kaiser, Łukasz and Bengio, Samy. Can active memory replace attention? Advances in Neural Information Processing Systems, 2016.
  • Kalchbrenner, Nal and Blunsom, Phil. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.
  • Kalchbrenner, Nal, Danihelka, Ivo, and Graves, Alex. Grid long short-term memory. International Conference on Learning Representations, 2016a.
  • Kalchbrenner, Nal, van den Oord, Aaron, Simonyan, Karen, Danihelka, Ivo, Vinyals, Oriol, Graves, Alex, and Kavukcuoglu, Koray. Video pixel networks. CoRR, abs/1610.00527, 2016b.
  • Kingma, Diederik P. and Ba, Jimmy. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  • Luong, Minh-Thang and Manning, Christopher D. Achieving open vocabulary neural machine translation with hybrid word-character models. In ACL, 2016.
  • Luong, Minh-Thang, Pham, Hieu, and Manning, Christopher D. Effective approaches to attention-based neural machine translation. In EMNLP, September 2015.
  • Mikolov, Tomas, Karafiat, Martin, Burget, Lukas, Cernocky, Jan, and Khudanpur, Sanjeev. Recurrent neural network based language model. In INTERSPEECH 2010, pp. 1045–1048, 2010.
  • Rocki, Kamil. Recurrent memory array structures. CoRR, abs/1607.03085, 2016.
  • Simonyan, Karen, Vedaldi, Andrea, and Zisserman, Andrew. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.
  • Srivastava, Rupesh Kumar, Greff, Klaus, and Schmidhuber, Jürgen. Highway networks. CoRR, abs/1505.00387, 2015.
  • Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112, 2014.
  • van den Oord, Aaron, Dieleman, Sander, Zen, Heiga, Simonyan, Karen, Vinyals, Oriol, Graves, Alex, Kalchbrenner, Nal, Senior, Andrew, and Kavukcuoglu, Koray. WaveNet: A generative model for raw audio. CoRR, abs/1609.03499, 2016a.
  • van den Oord, Aaron, Kalchbrenner, Nal, and Kavukcuoglu, Koray. Pixel recurrent neural networks. In ICML, volume 48, pp. 1747–1756, 2016b.
  • Williams, Philip, Sennrich, Rico, Nadejde, Maria, Huck, Matthias, and Koehn, Philipp. Edinburgh's syntax-based systems at WMT 2015. In Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015.
  • Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, Macherey, Wolfgang, Krikun, Maxim, Cao, Yuan, Gao, Qin, Macherey, Klaus, Klingner, Jeff, Shah, Apurva, Johnson, Melvin, Liu, Xiaobing, Kaiser, Łukasz, Gouws, Stephan, Kato, Yoshikiyo, Kudo, Taku, Kazawa, Hideto, Stevens, Keith, Kurian, George, Patil, Nishant, Wang, Wei, Young, Cliff, Smith, Jason, Riesa, Jason, Rudnick, Alex, Vinyals, Oriol, Corrado, Greg, Hughes, Macduff, and Dean, Jeffrey. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016a.
  • Wu, Yuhuai, Zhang, Saizheng, Zhang, Ying, Bengio, Yoshua, and Salakhutdinov, Ruslan. On multiplicative integration with recurrent neural networks. CoRR, abs/1606.06630, 2016b.
  • Yu, Fisher and Koltun, Vladlen. Multi-scale context aggregation by dilated convolutions. CoRR, abs/1511.07122, 2015.
  • Zhou, Jie, Cao, Ying, Wang, Xuguang, Li, Peng, and Xu, Wei. Deep recurrent models with fast-forward connections for neural machine translation. CoRR, abs/1606.04199, 2016.