Using Source-Language Transformations to Address Register Mismatches in SMT

Manny Rayner, Pierrette Bouillon, Barry Haddow

AMTA (2020)

Citations: 28 | Views: 6

Abstract

Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
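To make the two uses of the mappings concrete, here is a minimal Python sketch. It assumes lowercased French input with ASCII apostrophes, and its handful of regex rules (dropping or restoring the negative particle "ne", contracting "tu as" and "tu es" to "t'as" and "t'es") are hypothetical illustrations, not the transformations defined in the paper. The names apply_rules, augment, INFORMAL_TO_FORMAL and FORMAL_TO_INFORMAL are likewise illustrative.

```python
import re
from typing import List, Tuple


def _restore_ne(m: re.Match) -> str:
    """Reinsert the negative particle 'ne', elided to "n'" before a vowel or h."""
    pronoun, verb = m.group(1), m.group(2)
    particle = "n'" if verb[0] in "aeiouyhàâéèêîôû" else "ne "
    return f"{pronoun} {particle}{verb} pas"


# Hypothetical informal -> formal rules (illustrative, not the paper's rule set).
# Applied in order to lowercased source text.
INFORMAL_TO_FORMAL = [
    (re.compile(r"\bt'as\b"), "tu as"),
    (re.compile(r"\bt'es\b"), "tu es"),
    # "je sais pas" -> "je ne sais pas"; "il aime pas" -> "il n'aime pas"
    (re.compile(r"\b(je|tu|il|elle|on|nous|vous|ils|elles)\s+(\w+)\s+pas\b"),
     _restore_ne),
]

# Hypothetical formal -> informal rules: the inverse direction.
FORMAL_TO_INFORMAL = [
    (re.compile(r"\bne\s+(\w+)\s+pas\b"), r"\1 pas"),  # drop "ne"
    (re.compile(r"\bn'(\w+)\s+pas\b"), r"\1 pas"),     # drop elided "n'"
    (re.compile(r"\btu as\b"), "t'as"),
    (re.compile(r"\btu es\b"), "t'es"),
]


def apply_rules(sentence: str, rules) -> str:
    """Apply each (pattern, replacement) rule in order to the whole sentence."""
    for pattern, repl in rules:
        sentence = pattern.sub(repl, sentence)
    return sentence


def augment(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Use 1: add informal variants of formal source sentences, keeping the target."""
    out = list(pairs)
    for src, tgt in pairs:
        informal = apply_rules(src, FORMAL_TO_INFORMAL)
        if informal != src:
            out.append((informal, tgt))
    return out


if __name__ == "__main__":
    # Use 2: normalize informal source text before passing it to the decoder.
    print(apply_rules("je sais pas si t'as faim", INFORMAL_TO_FORMAL))
    # -> je ne sais pas si tu as faim
    print(augment([("tu n'as pas faim", "you are not hungry")]))
```

Each direction is kept as an ordered list because later rules can depend on earlier ones: in the formal-to-informal direction, "tu n'as pas" only becomes "t'as pas" once the "n'" has been dropped. A real system would need far broader coverage and some protection against false matches.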
