bert2BERT: Towards Reusable Pretrained Language Models

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Volume 1: Long Papers (2022)

Abstract
In recent years, researchers have tended to pre-train ever-larger language models to explore the upper limit of deep models. However, pre-training large language models costs intensive computational resources, and most of the models are trained from scratch without reusing existing pre-trained models, which is wasteful. In this paper, we propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model through parameter initialization and significantly improve the pre-training efficiency of the large model. Specifically, we extend the previous function-preserving method (Chen et al., 2016), originally proposed in computer vision, to the Transformer-based language model, and further improve it by proposing a novel method that provides advanced knowledge for the large model's initialization. In addition, a two-stage learning method is proposed to further accelerate pre-training. We conduct extensive experiments on representative PLMs (e.g., BERT and GPT) and demonstrate that (1) our method can save a significant amount of training cost compared with baselines including learning from scratch, StackBERT (Gong et al., 2019), and MSLT (Yang et al., 2020); (2) our method is generic and applicable to different types of pre-trained models. In particular, bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE by reusing models of almost half their sizes.
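The function-preserving expansion that the abstract refers to is the Net2Net idea (Chen et al., 2016): a wider layer is initialized by duplicating units of the smaller trained layer and rescaling their outgoing weights so that the larger model initially computes exactly the same function as the smaller one. The following is a minimal NumPy sketch of such a width expansion for a two-layer feed-forward block, given only as an illustration under assumed names (widen_ffn, the ReLU feed-forward setup); it is not the authors' released implementation of bert2BERT.

```python
# Minimal sketch of Net2Net-style function-preserving width expansion:
# duplicate hidden units of a trained feed-forward block and rescale their
# outgoing weights so the wider block computes the same function.
import numpy as np

def widen_ffn(W1, b1, W2, new_hidden, seed=None):
    """Widen y = relu(x @ W1 + b1) @ W2 from hidden size h = W1.shape[1]
    to `new_hidden` (>= h) while preserving the block's outputs."""
    rng = np.random.default_rng(seed)
    d_in, h = W1.shape
    assert new_hidden >= h
    # Mapping g: each new hidden unit copies some original unit.
    # The first h units map to themselves; extra units are sampled uniformly.
    g = np.concatenate([np.arange(h), rng.integers(0, h, new_hidden - h)])
    counts = np.bincount(g, minlength=h)       # replication count of each original unit

    W1_new = W1[:, g]                          # duplicate incoming weights and biases
    b1_new = b1[g]
    W2_new = W2[g, :] / counts[g][:, None]     # rescale outgoing weights so the
                                               # duplicated units sum back to the original
    return W1_new, b1_new, W2_new

# Quick check that the widened block computes the same outputs.
relu = lambda z: np.maximum(z, 0)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W1 = rng.standard_normal((8, 16))
b1 = rng.standard_normal(16)
W2 = rng.standard_normal((16, 8))
W1n, b1n, W2n = widen_ffn(W1, b1, W2, new_hidden=24, seed=1)
assert np.allclose(relu(x @ W1 + b1) @ W2, relu(x @ W1n + b1n) @ W2n)
```

Because the nonlinearity is applied elementwise, a duplicated hidden unit produces a duplicated activation, and dividing its outgoing weights by the replication count keeps the layer's output unchanged; this is the sense in which the initialization is "function-preserving" before further pre-training.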
Keywords
reusable pretrained language models, language models