Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping


Abstract:

Fine-tuning pretrained contextual word embedding models on supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with datasets from the GLUE benchmark, fine-tuning BERT many times while varying only the random seed. We examine two factors controlled by the seed, weight initialization and training data order, find that both contribute comparably to the variance in performance, and offer best practices for early stopping based on validation performance.
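The core of the experimental design is to separate the two sources of randomness that a single "random seed" usually controls. Below is a minimal sketch of that decoupling in PyTorch; it is not the authors' released code, and the helper names and hyperparameters are illustrative assumptions.

```python
# Sketch: decouple the two sources of fine-tuning randomness studied in the
# paper -- weight initialization of the task head and training data order.
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_classifier_head(init_seed: int, hidden: int = 768, labels: int = 2) -> torch.nn.Linear:
    # Seed only the initialization of the newly added task head; the pretrained
    # encoder weights come from the checkpoint and are unaffected by this seed.
    torch.manual_seed(init_seed)
    return torch.nn.Linear(hidden, labels)

def make_loader(dataset, order_seed: int, batch_size: int = 32) -> DataLoader:
    # Seed only the shuffling generator, so the order in which training
    # examples are visited varies independently of the weight initialization.
    g = torch.Generator()
    g.manual_seed(order_seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=g)

# Crossing the two seeds yields a grid of fine-tuning runs; comparing the
# variance along each axis estimates how much each factor contributes to the
# spread in out-of-sample performance.
dataset = TensorDataset(torch.randn(128, 768), torch.randint(0, 2, (128,)))
for init_seed in range(3):
    for order_seed in range(3):
        head = make_classifier_head(init_seed)
        loader = make_loader(dataset, order_seed)
        # ... fine-tune here and record the validation score for the pair
        #     (init_seed, order_seed)
```

In a full experiment the inner loop would fine-tune the encoder plus head and log validation scores per epoch, which is also the signal the paper's early-stopping recommendations are based on.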
