Using Source-Language Transformations to Address Register Mismatches in SMT.

AMTA(2020)

Abstract
Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
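
As a rough illustration of the two uses of such mappings, the following Python sketch applies a few regex-based surface rules to French text. The rules, the function names (`formalize`, `informalize`, `augment_parallel`), and the example sentences are hypothetical and are not taken from the paper; they cover only dropped "ne" in negation and the elided "t'as"/"t'es" forms, and they assume lower-cased input.

```python
import re

# Hypothetical informal -> formal surface rules (illustrative only, not the paper's rule set).
# Order matters: expand the elision first, then restore the dropped "ne".
INFORMAL_TO_FORMAL = [
    # "t'as" / "t'es" -> "tu as" / "tu es"
    (re.compile(r"\bt'(as|es)\b"), r"tu \1"),
    # dropped "ne" before a vowel-initial verb: "tu as pas" -> "tu n'as pas"
    (re.compile(r"\b(je|tu|il|elle|on) ([aeiouhéê]\w*) pas\b"), r"\1 n'\2 pas"),
    # dropped "ne" before any other verb: "je sais pas" -> "je ne sais pas"
    (re.compile(r"\b(je|tu|il|elle|on) (\w+) pas\b"), r"\1 ne \2 pas"),
]

# Hypothetical formal -> informal rules: the reverse direction, dropping "ne" / "n'".
FORMAL_TO_INFORMAL = [
    (re.compile(r"\b(je|tu|il|elle|on) ne (\w+) pas\b"), r"\1 \2 pas"),
    (re.compile(r"\b(je|tu|il|elle|on) n'(\w+) pas\b"), r"\1 \2 pas"),
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) rule in sequence."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def formalize(text):
    """Run-time pre-processing: rewrite informal source text before translation."""
    return apply_rules(text, INFORMAL_TO_FORMAL)


def informalize(text):
    """Rewrite formal source text into an informal variant."""
    return apply_rules(text, FORMAL_TO_INFORMAL)


def augment_parallel(pairs):
    """Artificial training data: keep the original (source, target) pairs and
    append a pair with an informalized source whenever a rule changed it."""
    synthetic = []
    for source, target in pairs:
        informal = informalize(source)
        if informal != source:
            synthetic.append((informal, target))
    return list(pairs) + synthetic


if __name__ == "__main__":
    print(formalize("t'as pas vu le film"))
    print(augment_parallel([("je ne sais pas", "I do not know")]))
```

In this sketch, `formalize` corresponds to the run-time pre-processing route, and `augment_parallel` to the artificial-training-data route, where only the source side of each sentence pair is rewritten and the target translation is reused unchanged. A realistic rule set would need to handle capitalization, object pronouns, and many more constructions.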