We propose a two-step pipeline for building a rapid unsupervised neural machine translation system for any language.
Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation.
In Proc. ACL, pp. 3057–3062, 2019.
Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then `translating' the resulting pseudo-translation, or `Translationese', into a fully fluent translation.
- The quality of machine translation, especially neural MT, depends heavily on the amount of available parallel data.
- The quality of translation rapidly deteriorates as the amount of parallel data decreases (Koehn and Knowles, 2017).
- Many languages have close to zero parallel texts.
- Translating texts from these languages requires new techniques.
- The authors provide the following two-step solution to unsupervised neural machine translation: 1. Use a bilingual dictionary to gloss the input into a pseudo-translation, or 'Translationese'. 2. Translate the Translationese into fluent target text with a Translationese-to-target model trained in advance. (Step 1 is sketched just below.)
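A minimal sketch of Step 1, assuming the dictionary is already loaded as a plain Python mapping. The `gloss` helper and its copy-through fallback for out-of-dictionary tokens are illustrative choices, not the authors' exact implementation:

```python
def gloss(sentence: str, dictionary: dict[str, str]) -> str:
    """Word-by-word gloss of a source sentence into 'Translationese'.

    Tokens missing from the dictionary are copied through unchanged
    (a common fallback; the paper's exact handling may differ).
    """
    return " ".join(dictionary.get(tok, tok) for tok in sentence.split())

# Hypothetical toy example (Spanish -> English gloss):
es_en = {"la": "the", "casa": "house", "es": "is", "grande": "big"}
print(gloss("la casa es grande", es_en))  # -> "the house is big"
```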
- This work investigates the following question: can we replace the human in the loop with more technology?
- In this paper we propose using parallel data from high-resource languages to learn 'how to translate' and to apply the trained system to low-resource settings.
- We introduce the following contributions in this paper:
- Following Hermjakob et al. (2018), we propose a two-step pipeline for building a rapid neural MT system for many languages.
- The authors introduce a fully unsupervised method for converting the source into Translationese, and they show how to train a Translationese-to-target system in advance and apply it to new source languages (a data-preparation sketch follows).
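A hedged sketch of how such a system can be prepared in advance: gloss the source side of existing high-resource parallel corpora into Translationese, then train any standard seq2seq model on the resulting (Translationese, English) pairs. The file names and tab-separated dictionary format below are assumptions for illustration; `gloss` is the helper defined earlier:

```python
def load_dictionary(path: str) -> dict[str, str]:
    """Load a 'source<TAB>target' dictionary file (format assumed)."""
    with open(path, encoding="utf-8") as f:
        return dict(line.rstrip("\n").split("\t")[:2] for line in f)

# Hypothetical high-resource corpora: (source file, target file, dictionary).
corpora = [
    ("fr-en.fr", "fr-en.en", "fr-en.dict"),
    ("de-en.de", "de-en.en", "de-en.dict"),
]

with open("train.translationese", "w", encoding="utf-8") as src_out, \
     open("train.en", "w", encoding="utf-8") as tgt_out:
    for src_path, tgt_path, dict_path in corpora:
        d = load_dictionary(dict_path)
        with open(src_path, encoding="utf-8") as src, \
             open(tgt_path, encoding="utf-8") as tgt:
            for s, t in zip(src, tgt):
                src_out.write(gloss(s.strip(), d) + "\n")
                tgt_out.write(t.strip() + "\n")

# Any off-the-shelf NMT toolkit can then train a single
# Translationese -> English model on train.translationese / train.en.
```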
- The first step of the proposed pipeline is a word-by-word translation of the source texts.
- This requires a source/target dictionary.
- In order to have a comprehensive, word-to-word, inflected bilingual dictionary, the authors look for automatically built ones.
- Once the initial mapping matrix W is trained, a number of refinement steps are performed to improve performance on less frequent words by changing the metric of the space (sketched below).
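The refinement referenced here follows the Procrustes procedure of Lample et al. (2018b): alternate between inducing a seed dictionary of mutual nearest neighbors under the current mapping W and re-fitting W on that dictionary, where the optimal orthogonal W for paired embedding columns X, Y is W = U Vᵀ with U Σ Vᵀ the SVD of Y Xᵀ. A minimal NumPy sketch; for brevity it uses plain cosine nearest neighbors where the full method changes the metric to CSLS to mitigate hubness:

```python
import numpy as np

def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||W X - Y||_F over paired columns of X, Y."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

def refine(src_emb: np.ndarray, tgt_emb: np.ndarray,
           W: np.ndarray, steps: int = 5) -> np.ndarray:
    """Iteratively improve an initial mapping W between embedding spaces.

    src_emb, tgt_emb: (n_words, dim) monolingual word embeddings.
    Each step induces a dictionary of mutual nearest neighbors under
    the current W, then re-fits W on those pairs.
    """
    src_n = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt_n = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    for _ in range(steps):
        sims = (src_n @ W.T) @ tgt_n.T        # cosine similarity matrix
        fwd = sims.argmax(axis=1)             # source -> target neighbors
        bwd = sims.argmax(axis=0)             # target -> source neighbors
        pairs = [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
        X = src_n[[i for i, _ in pairs]].T    # (dim, n_pairs)
        Y = tgt_n[[j for _, j in pairs]].T
        W = procrustes(X, Y)
    return W
```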
- The authors propose a two-step pipeline for building a rapid unsupervised neural machine translation system for any language.
- The pipeline does not require retraining the neural translation model when adapting to new source languages, which makes its application to new languages extremely fast and easy (see the adaptation sketch below).
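Under that design, adapting to an unseen source language reduces to gluing the pieces above together; no gradient updates touch the shared decoder. The `model` object and `induce_dictionary` wrapper below are hypothetical stand-ins (the wrapper would combine the unsupervised embedding alignment with the `gloss` helper sketched earlier):

```python
def translate_new_language(sentences, src_emb, tgt_emb, model):
    """Translate an unseen source language without retraining `model`.

    1. Induce a bilingual dictionary from monolingual embeddings alone
       (e.g. adversarial initialization plus the Procrustes refinement above).
    2. Gloss each sentence into Translationese.
    3. Decode with the pre-trained Translationese -> target model.
    `model` is assumed to expose a translate(str) -> str method.
    """
    dictionary = induce_dictionary(src_emb, tgt_emb)  # hypothetical wrapper
    return [model.translate(gloss(s, dictionary)) for s in sentences]
```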
- The authors show how to obtain such a dictionary using off-the-shelf tools.
- The authors use this system to translate test texts from 14 languages into English.
- The authors obtain better or comparable translation quality on high-resource languages than previously published unsupervised MT methods.
- Table 1: Comparing translation results on newstest2014 for French, and newstest2016 for Russian, German, and Romanian with previous unsupervised NMT methods. Kim et al. (2018) is the method closest to our work. We report the quality of Translationese as well as the scores for our full model.
- Table 2: Translation results on ten new languages: Czech, Spanish, Finnish, Dutch, Bulgarian, Danish, Indonesian, Polish, Portuguese, and Catalan.
- This research is based upon work performed at the Information Sciences Institute (ISI), supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via AFRL Contract FA8650-17-C-9116, and by the Defense Advanced Research Projects Agency (DARPA) via contract HR0011-15-C-0115.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. Unsupervised neural machine translation. In Proc. ICLR.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
- Yun Chen, Yang Liu, Yong Cheng, and Victor O.K. Li. 2017. A teacher-student framework for zero-resource neural machine translation. In Proc. ACL.
- Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, and Wei Xu. 2016. Neural machine translation with pivot languages. arXiv preprint arXiv:1611.04928.
- Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016a. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proc. NAACL.
- Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016b. Zero-resource translation with multi-lingual neural machine translation. In Proc. EMNLP.
- Pascale Fung. 1995. Compiling bilingual lexicon entries from a non-parallel English-Chinese corpus. In Workshop on Very Large Corpora.
- Thanh-Le Ha, Jan Niehues, and Alexander Waibel. 2016. Toward multilingual neural machine translation with universal encoder and decoder. arXiv preprint arXiv:1611.04798.
- Thanh-Le Ha, Jan Niehues, and Alexander Waibel. 2017. Effective strategies in zero-shot neural machine translation. arXiv preprint arXiv:1711.07893.
- Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proc. ACL.
- Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, et al. 2018. Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567.
- Ulf Hermjakob, Jonathan May, Michael Pust, and Kevin Knight. 2018. Translating a language you don’t know in the Chinese room. In Proc. ACL, System Demonstrations.
- Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viegas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.
- Yunsu Kim, Jiahui Geng, and Hermann Ney. 2018. Improving unsupervised word-by-word translation with language model and denoising autoencoder. In Proc. EMNLP.
- Kevin Knight and Ishwar Chander. 1994. Automated postediting of documents. In Proc. AAAI.
- Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proc. MT Summit.
- Philipp Koehn and Kevin Knight. 2002. Learning a translation lexicon from monolingual corpora. In Proc. ACL workshop on Unsupervised lexical acquisition.
- Philipp Koehn and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proc. ACL Workshop on Neural Machine Translation.
- Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In Proc. ICLR.
- Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018b. Word translation without parallel data. In Proc. ICLR.
- Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018c. Phrase-based & neural unsupervised machine translation. In Proc. EMNLP.
- Malte Nuhn, Julian Schamper, and Hermann Ney. 2013. Beam search for solving substitution ciphers. In Proc. ACL.
- Victor Oswald. 1952. Word-by-word translation. In Proc. intervention à la Conférence du MIT.
- Nima Pourdamghani and Kevin Knight. 2017. Deciphering related languages. In Proc. EMNLP.
- Sujith Ravi and Kevin Knight. 2011. Deciphering foreign language. In Proc. ACL.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proc. ACL.
- Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Proc. LREC.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. NIPS.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
- Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. Unsupervised neural machine translation with weight sharing. In Proc. ACL.
- Victor H. Yngve. 1955. Sentence-for-sentence translation. Mechanical Translation, 2(2):29–37.
- Hao Zheng, Yong Cheng, and Yang Liu. 2017. Maximum expected likelihood estimation for zero-resource neural machine translation. In Proc. IJCAI.