TL;DR

We propose a two-step pipeline for building a rapid unsupervised neural machine translation system for any language

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation.

Meeting of the Association for Computational Linguistics (2019)

Cited by: 0 | Views: 121 | EI

Abstract

Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary, and then 'translating' the resulting pseudo-translation, or 'Translationese', into a fully fluent translation.

Introduction
  • Quality of machine translation, especially neural MT, depends heavily on the amount of available parallel data.
  • The quality of translation rapidly deteriorates as the amount of parallel data decreases (Koehn and Knowles, 2017).
  • Many languages have close to zero parallel texts.
  • Translating texts from these languages requires new techniques.
  • The authors provide the following two-step solution to unsupervised neural machine translation: (1) use a bilingual dictionary to gloss the input into a pseudo-translation, or 'Translationese'; (2) translate the Translationese into fluent target text with a model trained in advance on high-resource language pairs (a minimal sketch follows this list).
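A minimal Python sketch of this pipeline follows. The `gloss` helper, the `translate` wrapper, and the `fluency_model` callable are hypothetical stand-ins, not the paper's released code: step 1 is a per-token dictionary lookup, and step 2 is assumed to be a pretrained Translationese-to-English model that is reused unchanged for every new source language.

```python
from typing import Callable, Dict

def gloss(sentence: str, dictionary: Dict[str, str]) -> str:
    """Step 1: word-by-word gloss of the source into 'Translationese'.

    Tokens missing from the dictionary are copied through unchanged
    (an assumption of this sketch, not a detail taken from the paper).
    """
    return " ".join(dictionary.get(token, token) for token in sentence.split())

def translate(sentence: str,
              dictionary: Dict[str, str],
              fluency_model: Callable[[str], str]) -> str:
    """Step 2: map the gloss to fluent target text with a model trained in
    advance on high-resource language pairs, so adapting to a new source
    language needs only a new dictionary, not retraining."""
    return fluency_model(gloss(sentence, dictionary))

# Toy usage: a four-entry Spanish-English dictionary and an identity
# function standing in for the pretrained Translationese-to-English model.
es_en = {"la": "the", "casa": "house", "es": "is", "grande": "big"}
print(translate("la casa es grande", es_en, lambda text: text))  # -> the house is big
```

The design point is that only step 1 depends on the source language; the step-2 model never sees the source language directly, which is why the pipeline transfers to new languages without retraining.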
Highlights
  • Quality of machine translation, especially neural MT, depends heavily on the amount of available parallel data
  • This work investigates the following question: can we replace the human in the loop with more technology? We provide a two-step solution to unsupervised neural machine translation: gloss the input with a bilingual dictionary, then translate the resulting Translationese with a pretrained model
  • In this paper we propose using parallel data from high-resource languages to learn ‘how to translate’ and applying the trained system to low-resource settings
  • We introduce the following contributions in this paper:
  • Following Hermjakob et al. (2018), we propose a two-step pipeline for building a rapid neural MT system for many languages
  • We propose a two-step pipeline for building a rapid unsupervised neural machine translation system for any language
Methods
  • The authors introduce a two-step pipeline for unsupervised machine translation.
  • The authors introduce a fully unsupervised method for converting the source into Translationese, and they show how to train a Translationese-to-target system in advance and apply it to new source languages.
  • The first step of the proposed pipeline is a word-by-word translation of the source texts.
  • This requires a source/target dictionary.
  • In order to have a comprehensive, word-to-word, inflected bilingual dictionary, the authors look for automatically built ones (see the sketch below).
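As a concrete illustration of this first step, the sketch below loads such a dictionary; it is hypothetical rather than the paper's code. It assumes a plain-text file with one whitespace-separated `source target` pair per line, and when a source word has several candidate translations it keeps the first one, whereas the paper's selection criterion may differ.

```python
from typing import Dict

def load_dictionary(path: str) -> Dict[str, str]:
    """Load a word-to-word bilingual dictionary from a plain-text file
    with one whitespace-separated `source target` pair per line."""
    dictionary: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip malformed or multi-word entries
            src, tgt = parts
            # First translation wins; a frequency-based choice would be
            # a natural alternative for picking among candidates.
            dictionary.setdefault(src, tgt)
    return dictionary

# Glossing is then a single lookup per token, as in the `gloss`
# helper sketched in the Introduction above.
```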
Conclusion
  • The authors propose a two-step pipeline for building a rapid unsupervised neural machine translation system for any language.
  • The pipeline does not require retraining the neural translation model when adapting to new source languages, which makes its application to new languages extremely fast and easy.
  • The authors show how to obtain such a dictionary using off-the-shelf tools.
  • The authors use this system to translate test texts from 14 languages into English.
  • The authors obtain better or comparable translation quality on high-resource languages compared to previously published unsupervised MT methods.
Tables
  • Table 1: Comparing translation results on newstest2014 for French, and newstest2016 for Russian, German, and Romanian, with previous unsupervised NMT methods. Kim et al. (2018) is the method closest to our work. We report the quality of Translationese as well as the scores for our full model
  • Table 2: Translation results on ten new languages: Czech, Spanish, Finnish, Dutch, Bulgarian, Danish, Indonesian, Polish, Portuguese, and Catalan
Funding
  • The research is based upon work that took place at the Information Sciences Institute (ISI), which was supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via AFRL Contract FA8650-17-C-9116, and by the Defense Advanced Research Projects Agency (DARPA) via contract HR0011-15-C-0115
References
  • Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. Unsupervised neural machine translation. In Proc. ICLR.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
  • Yun Chen, Yang Liu, Yong Cheng, and Victor O.K. Li. 2017. A teacher-student framework for zero-resource neural machine translation. In Proc. ACL.
  • Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, and Wei Xu. 2016. Neural machine translation with pivot languages. arXiv preprint arXiv:1611.04928.
  • Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016a. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proc. NAACL.
  • Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016b. Zero-resource translation with multi-lingual neural machine translation. In Proc. EMNLP.
  • Pascale Fung. 1995. Compiling bilingual lexicon entries from a non-parallel English-Chinese corpus. In Workshop on Very Large Corpora.
  • Thanh-Le Ha, Jan Niehues, and Alexander Waibel. 2016. Toward multilingual neural machine translation with universal encoder and decoder. arXiv preprint arXiv:1611.04798.
  • Yunsu Kim, Jiahui Geng, and Hermann Ney. 2018. Improving unsupervised word-by-word translation with language model and denoising autoencoder. In Proc. EMNLP.
  • Kevin Knight and Ishwar Chander. 1994. Automated postediting of documents. In Proc. AAAI.
  • Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proc. MT Summit.
  • Philipp Koehn and Kevin Knight. 2002. Learning a translation lexicon from monolingual corpora. In Proc. ACL Workshop on Unsupervised Lexical Acquisition.
  • Philipp Koehn and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proc. ACL Workshop on Neural Machine Translation.
  • Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In Proc. ICLR.
  • Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018b. Word translation without parallel data. In Proc. ICLR.
  • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018c. Phrase-based & neural unsupervised machine translation. In Proc. EMNLP.
  • Malte Nuhn, Julian Schamper, and Hermann Ney. 2013. Beam search for solving substitution ciphers. In Proc. ACL.
  • Victor Oswald. 1952. Word-by-word translation. In Proc. Intervention à la Conférence du MIT.
  • Nima Pourdamghani and Kevin Knight. 2017. Deciphering related languages. In Proc. EMNLP.
  • Sujith Ravi and Kevin Knight. 2011. Deciphering foreign language. In Proc. ACL.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proc. ACL.
  • Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Proc. LREC.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. NIPS.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. Unsupervised neural machine translation with weight sharing. In Proc. ACL.
  • Victor H. Yngve. 1955. Sentence-for-sentence translation. Mechanical Translation, 2(2):29–37.
  • Hao Zheng, Yong Cheng, and Yang Liu. 2017. Maximum expected likelihood estimation for zero-resource neural machine translation. In Proc. IJCAI.