Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
EACL(2021)
摘要
There has been recent success in pre-training on monolingual data and fine-tuning on Machine Translation (MT), but it remains unclear how to best leverage a pre-trained model for a given MT task. This paper investigates the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on MT. We focus on 1) Fine-tuning a model trained only on English monolingual data, BART. 2) Fine-tuning a model trained on monolingual data from 25 languages, mBART. For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings. For mBART we match the performance of naive fine-tuning for most language pairs, and outperform it for Nepali to English (0.5 BLEU) and Czech to English (0.6 BLEU), all with a lower memory cost at training time. When constraining ourselves to an out-of-domain training set for Vietnamese to English we outperform the fine-tuning baseline by 0.9 BLEU.
更多查看译文
关键词
machine translation,multilingual models,monolingual,pre-trained
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络