Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

user-5f8cf7e04c775ec6fa691c92(2019)

Abstract

In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model \textit{forgets} important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a set of detailed behavior analyses from the perspectives of context sensitivity, knowledge transfer, and function space projection. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, largely alleviating the forgetting problem. Finally, we discuss interesting behaviors of the resulting dialogue model and their implications.
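The abstract describes "mix-review" only at a high level: pre-training data is mixed back into the fine-tuning set to regularize training. A minimal sketch of one plausible data-mixing schedule is below; the function name, the per-epoch decaying mix ratio, and all parameter values are assumptions for illustration, not details taken from the paper.

```python
import random

def mix_review_epoch(finetune_data, pretrain_data, epoch,
                     base_ratio=4.0, decay=2.0, seed=0):
    """Build one epoch of training data under a hypothetical
    "mix-review" schedule: mix a random sample of pre-training
    examples into the fine-tuning set, with the mix ratio
    (pretrain examples per fine-tune example) decaying each epoch,
    so early epochs "review" pre-training data heavily and later
    epochs focus on the fine-tuning task.
    """
    ratio = base_ratio / (decay ** epoch)  # assumed decay schedule
    k = min(len(pretrain_data), int(round(ratio * len(finetune_data))))
    rng = random.Random(seed + epoch)      # reproducible per-epoch sample
    mixed = list(finetune_data) + rng.sample(list(pretrain_data), k)
    rng.shuffle(mixed)
    return mixed
```

For example, with 10 fine-tuning examples and `base_ratio=4.0`, epoch 0 trains on 10 + 40 mixed examples, while epoch 2 trains on 10 + 10, gradually shifting weight toward the target dialogue task.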
Keywords
Forgetting, Transformer (machine learning model), Knowledge transfer, Artificial intelligence, Phenomenon, Computer science, Function space, Context sensitivity, Response generation