Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages.

AMTA(2016)

引用 3|浏览86
暂无评分
摘要
Abstract Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms. When translating into English, rich source inflections have a high chance of being poorly estimated or out-of-vocabulary (OOV). We present a source language-agnostic system for automatically constructing phrase pairs from foreign-language inflections and their morphological analyses using manually constructed datasets, including Wiktionary. We then demonstrate the utility of these phrase tables in improving translation into English from Finnish, Czech, and Turkish in simulated low-resource settings, finding substantial gains in translation quality. We report up to+ 2.58 BLEU in a simulated low-resource setting and+ 1.65 BLEU in a moderateresource setting. We release our morphologically-motivated translation models, with tens of thousands of inflections in each of 8 languages.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要