Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs

Ren Xiangyuan
Sun Zijun
Li Xiaoya
Yuan Arianna

Abstract:

In this paper, we investigate the problem of training neural machine translation (NMT) systems with a dataset of more than 40 billion bilingual sentence pairs, which is larger than the largest dataset to date by orders of magnitude. Unprecedented challenges emerge in this situation compared to previous NMT work, including severe noise i...
