Improving Machine Translation of Educational Content via Crowdsourcing.

LREC (2018)

Abstract
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data, having in mind that the translations can be of a lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over general-domain baseline systems, and systems fine-tuned with pre-existing in-domain corpora.
Keywords
MOOCs, neural machine translation, crowdsourcing