Deep Transformer modeling via grouping skip connection for neural machine translation

Knowledge-Based Systems (2021)

Abstract
Most deep neural machine translation (NMT) models follow a bottom-up feedforward design, in which representations in lower layers construct or modulate the representations of higher layers. We conjecture that this unidirectional encoding fashion could be a potential issue in building a deep NMT model. In this paper, we propose to build a deeper Transformer encoder by organizing the encoder layers into multiple groups connected via a grouping skip connection mechanism, in which the output of each group is appropriately fed into subsequent groups. In this way, we successfully build a deep Transformer encoder with up to 48 layers. Moreover, we can share parameters among groups to extend the encoder's (virtual) depth without introducing additional parameters. Detailed experimentation on the large-scale WMT (Workshop on Machine Translation) 2014 English-to-German and English-to-French, WMT 2016 English-to-German, and WMT 2017 Chinese-to-English translation tasks demonstrates that our proposed deep Transformer model significantly outperforms the strong Transformer baseline. Furthermore, we carry out linguistic probing tasks to analyze the problems in the original Transformer model and explain how our deep Transformer encoder improves translation quality. One particularly nice property of our approach is that it is remarkably easy to implement. We make our code available on GitHub: https://github.com/liyc7711/deep-nmt.
Keywords
Neural machine translation, Grouping skip connection, Deep NMT, Transformer