Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

Other Links: arxiv.org

Abstract:

Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime. Why can we use relatively vanilla gradient descent algorithms (e.g., without strong regularization) to tune a model with hundreds of millions of parameters on datasets with only hundreds or thousands of labeled examples?
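
The paper's central idea is that fine-tuning can succeed inside a very low-dimensional subspace: rather than updating all D pretrained parameters, one trains only d << D parameters and maps them back into the full space through a fixed random projection, theta = theta_0 + P @ theta_d. Below is a minimal PyTorch sketch of that reparameterization on a toy linear layer, not the authors' implementation; the class and variable names (SubspaceLinearModel, theta_d, P) and the choice of d = 200 for the demo are illustrative assumptions.

```python
# Minimal sketch of subspace fine-tuning: only the d-dimensional theta_d is trained,
# and a fixed random projection P maps it into the full D-dimensional parameter space,
# so the effective weights are theta_0 + P @ theta_d (theta_0 stays frozen).

import torch
import torch.nn as nn
import torch.nn.functional as F


class SubspaceLinearModel(nn.Module):
    """Toy model whose full parameters are frozen; all updates live in a d-dim subspace."""

    def __init__(self, base: nn.Linear, d: int):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights theta_0

        D = sum(p.numel() for p in self.base.parameters())
        # Fixed random projection from the d-dim subspace to the full D-dim space.
        self.register_buffer("P", torch.randn(D, d) / d ** 0.5)
        # The only trainable parameters: the d-dimensional offset theta_d.
        self.theta_d = nn.Parameter(torch.zeros(d))

    def forward(self, x):
        # Project the low-dimensional offset into the full parameter space on the fly
        # and add it to the frozen base weights.
        offset = self.P @ self.theta_d
        w_off, b_off = offset.split([self.base.weight.numel(), self.base.bias.numel()])
        weight = self.base.weight + w_off.view_as(self.base.weight)
        bias = self.base.bias + b_off.view_as(self.base.bias)
        return F.linear(x, weight, bias)


# Usage: only theta_d (here d = 200) receives gradients; the base layer stays frozen.
model = SubspaceLinearModel(nn.Linear(768, 2), d=200)
opt = torch.optim.Adam([model.theta_d], lr=1e-3)
x, y = torch.randn(8, 768), torch.randint(0, 2, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Under this framing, the intrinsic dimension of a task is roughly the smallest d at which such a subspace reaches a target fraction of full fine-tuning performance.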
