Improving Neural Machine Translation with Linear Interpolation of a Short-Path Unit

ACM Transactions on Asian and Low-Resource Language Information Processing (2020)

Cited 8 | Views 56
Abstract
In neural machine translation (NMT), the source and target words sit at the two ends of a large deep neural network, normally mediated by a series of non-linear activations. The problem with such consecutive non-linear activations is that they significantly decrease the magnitude of the gradient in a deep network, and thus gradually weaken the interaction between source words and their translations. As a result, a source word may be incorrectly translated into a target word outside its set of translational equivalents. In this article, we propose short-path units (SPUs) to strengthen the association between source and target words by allowing information to flow effectively across adjacent layers via linear interpolation. In particular, we enrich three critical NMT components with SPUs: (1) an enriched encoding model with SPU, which linearly interpolates source word embeddings into source annotations; (2) an enriched decoding model with SPU, which enables the source context to flow linearly into target-side hidden states; and (3) an enriched output model with SPU, which further allows linear interpolation of target-side hidden states into output states. Experiments on Chinese-to-English, English-to-German, and low-resource Tibetan-to-Chinese translation tasks demonstrate that the linear interpolation of SPUs significantly improves overall translation quality by 1.88, 1.43, and 3.75 BLEU, respectively. Moreover, detailed analysis shows that our approaches substantially strengthen the association between source and target words. Together, these results show that the proposed model is effective in both rich- and low-resource scenarios.
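The abstract does not spell out how the interpolation is parameterized; a minimal PyTorch sketch of the general idea, assuming the interpolation weight is a learned element-wise gate so the output is a convex combination of the short-path input and the layer output (the names ShortPathUnit and gate are illustrative, not from the paper):

import torch
import torch.nn as nn

class ShortPathUnit(nn.Module):
    """Linearly interpolates a lower-layer input (e.g., a source word
    embedding) into a higher-layer state (e.g., a source annotation),
    giving gradients a linear short path across adjacent layers."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from both inputs; the sigmoid keeps it in (0, 1),
        # so the output is a convex combination (an assumed design choice).
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, shortcut: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([shortcut, hidden], dim=-1)))
        return g * shortcut + (1.0 - g) * hidden

# Usage: interpolate source embeddings into encoder annotations,
# mirroring the paper's enriched encoding model.
spu = ShortPathUnit(dim=512)
embedding = torch.randn(8, 20, 512)    # (batch, source length, dim)
annotation = torch.randn(8, 20, 512)   # e.g., bidirectional RNN output
enriched = spu(embedding, annotation)  # same shape, with a linear short path

Because the short path is linear, backpropagated gradients reach the embeddings without passing through a stack of non-linear activations, which is the mechanism the paper credits for strengthening source-target word associations; the same unit could in principle be reused for the decoding and output components described above.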
Keywords
Neural machine translation, low resource, neural networks, short-path units