Contrastive Distillation on Intermediate Representations for Language Model Compression

EMNLP 2020, pp. 498-508.

Highlights:
Extensive experiments demonstrate that Contrastive Distillation on Intermediate Representations (CoDIR) is highly effective in both the finetuning and pre-training stages, and achieves state-of-the-art performance on the General Language Understanding Evaluation (GLUE) benchmark compared to existing model compression methods.

Abstract:

Existing language model compression methods mostly use a simple L_2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one. Although widely used, this objective by design assumes that all the dimensions of hidden representations are independent, failing to capture important structural knowledge in the teacher network.
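
The difference between the plain L_2 objective and a contrastive objective over intermediate representations can be illustrated with a minimal PyTorch sketch. This is an illustrative InfoNCE-style formulation with in-batch negatives and pooled per-layer hidden states of matching dimension; it does not reproduce the paper's exact CoDIR loss (e.g., its choice of negative sampling or layer aggregation).

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_hidden, teacher_hidden, temperature=0.1):
    """InfoNCE-style contrastive loss between student and teacher
    intermediate representations (illustrative sketch, not the exact
    CoDIR objective).

    student_hidden, teacher_hidden: (batch, dim) pooled hidden states
    from one intermediate layer. For each example, the teacher
    representation of the same example is the positive; teacher
    representations of the other examples in the batch serve as negatives.
    """
    s = F.normalize(student_hidden, dim=-1)           # (B, D)
    t = F.normalize(teacher_hidden, dim=-1)           # (B, D)
    logits = s @ t.t() / temperature                  # (B, B) similarity matrix
    labels = torch.arange(s.size(0), device=s.device) # positives on the diagonal
    return F.cross_entropy(logits, labels)

def l2_distillation_loss(student_hidden, teacher_hidden):
    """Plain L_2 (MSE) baseline used by most prior compression methods,
    which treats each hidden dimension independently."""
    return F.mse_loss(student_hidden, teacher_hidden)
```

Unlike the per-dimension MSE, the contrastive loss only requires the student representation to be closer to its own teacher representation than to those of other examples, which lets it capture relational (structural) information across samples.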
