Contrastive Distillation on Intermediate Representations for Language Model Compression
EMNLP 2020, pp. 498-508.
Existing language model compression methods mostly use a simple L_2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one. Although widely used, this objective by design assumes that all the dimensions of hidden representations are independent, failing to capture important structural knowledge. Extensive experiments demonstrate that Contrastive Distillation on Intermediate Representations is highly effective in both finetuning and pre-training stages, and achieves state-of-the-art performance on the General Language Understanding Evaluation (GLUE) benchmark compared to existing models.
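The contrast between the two objectives can be sketched in a few lines: an L_2 loss matches teacher and student hidden states dimension by dimension, while a contrastive (InfoNCE-style) loss only asks that a student representation be closer to the teacher representation of the same example than to those of other examples in the batch. This is a minimal NumPy illustration, not the paper's actual implementation; the function names, the in-batch-negatives scheme, and the temperature value are assumptions for the sketch.

```python
import numpy as np

def l2_distill_loss(h_student, h_teacher):
    """Per-dimension MSE between student and teacher hidden states.
    Treats every dimension independently, ignoring cross-dimension structure."""
    return float(np.mean((h_student - h_teacher) ** 2))

def contrastive_distill_loss(h_student, h_teacher, temperature=0.1):
    """InfoNCE-style loss with in-batch negatives: each student representation
    should be more similar to the teacher representation of the same example
    than to the teacher representations of other examples in the batch."""
    # Normalize to unit length so the dot product is cosine similarity.
    s = h_student / np.linalg.norm(h_student, axis=1, keepdims=True)
    t = h_teacher / np.linalg.norm(h_teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive (same-example) pairs lie on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 768))                   # teacher hidden states
student = teacher + 0.1 * rng.normal(size=(8, 768))   # student roughly matches
print(l2_distill_loss(student, teacher))
print(contrastive_distill_loss(student, teacher))
```

When the student tracks the teacher closely, the contrastive loss approaches zero even though individual dimensions still differ, since it scores whole representations against each other rather than coordinates in isolation.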