CPM-2: Large-scale cost-effective pre-trained language models

AI Open (2021)

Cited by 72 | Views 745
Abstract
In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, the efficiency issues of these large-scale PLMs limit their use in real-world scenarios. We present a suite of cost-effective techniques that address the efficiency issues of pre-training, fine-tuning, and inference with PLMs. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely infmoe, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of infmoe when performing inference with large-scale models that have tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.
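
To make the parameter savings of prompt tuning concrete, the following is a minimal PyTorch-style sketch, not taken from the CPM-2 codebase: the class name, toy backbone, and hyper-parameters are illustrative assumptions. The idea it shows is the general one: freeze the pre-trained model and train only a small matrix of soft prompt embeddings that is prepended to the input, so the task-specific parameters shrink from billions to a few thousand.

import torch
import torch.nn as nn

class PromptTunedModel(nn.Module):
    """Prepend trainable soft prompts to a frozen pre-trained backbone (illustrative sketch)."""
    def __init__(self, backbone: nn.Module, embed: nn.Embedding, prompt_len: int = 20):
        super().__init__()
        self.backbone = backbone  # stand-in for a large PLM body (frozen)
        self.embed = embed        # pre-trained token embeddings (frozen)
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.embed.parameters():
            p.requires_grad = False
        d_model = embed.embedding_dim
        # The only task-specific parameters: prompt_len * d_model values.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(input_ids)                                   # (B, T, d)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        x = torch.cat([prompt, tok], dim=1)                           # prepend soft prompt
        return self.backbone(x)                                       # (B, prompt_len + T, vocab)

# Toy stand-in backbone (random weights, for illustration only).
vocab, d_model = 1000, 64
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
backbone = nn.Sequential(encoder, nn.Linear(d_model, vocab))

model = PromptTunedModel(backbone, embed, prompt_len=20)
# Only the soft prompt is optimized; the backbone's weights stay fixed.
optimizer = torch.optim.Adam([model.soft_prompt], lr=1e-3)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 20 * 64 = 1280 in this toy setup

In a real setting the frozen backbone would be the full pre-trained encoder-decoder, and only the soft prompt (and optionally a small task head) would be stored per downstream task.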
Keywords
Pre-trained language models, Model efficiency