Improving Machine Translation of Educational Content via Crowdsourcing.

Maximiliana Behnke, Antonio Valerio Miceli Barone, Rico Sennrich, Vilelmini Sosoni, Thanasis Naskos, Eirini Takoulidou, Maria Stasimioti, Menno van Zaanen, Sheila Castilho, Federico Gaspari, Panayota Georgakopoulou, Valia Kordoni, Markus Egg, Katia Lida Kermanidis

LREC (2018)

Abstract

The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data having in mind that the translations can be of a lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over general-domain baseline systems, and systems fine-tuned with pre-existing in-domain corpora.

Keywords

MOOCs, neural machine translation, crowdsourcing
