Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity
CoRR (2024)
Abstract
While BERT produces high-quality sentence embeddings, its pre-training
computational cost is a significant drawback. In contrast, ELECTRA delivers a
cost-effective pre-training objective and downstream task performance
improvements, but noticeably weaker sentence embeddings. The community tacitly
stopped utilizing ELECTRA's sentence embeddings for semantic textual similarity
(STS). We notice a significant drop in performance when using the ELECTRA
discriminator's last layer in comparison to earlier layers. We explore this
drop and devise a way to repair ELECTRA's embeddings, proposing a novel
truncated model fine-tuning (TMFT) method. TMFT improves the Spearman
correlation coefficient by over 8 points while increasing parameter efficiency
on the STS benchmark dataset. We extend our analysis to various model sizes and
languages. Further, we discover the surprising efficacy of ELECTRA's generator
model, which performs on par with BERT, using significantly fewer parameters
and a substantially smaller embedding size. Finally, we observe further boosts
by combining TMFT with a word similarity task or domain-adaptive pre-training.
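
The abstract only names the method; as a rough illustration of the idea, the sketch below (Python, Hugging Face Transformers) shows one way to truncate the ELECTRA discriminator at an earlier layer and mean-pool token states into sentence embeddings. The truncation point k, the mean pooling, and the fine-tuning loss mentioned in the comments are assumptions for illustration, not details taken from the paper.

```python
from transformers import AutoModel, AutoTokenizer

# Illustrative sketch of truncated model fine-tuning (TMFT): keep only the
# first k transformer layers of the ELECTRA discriminator and mean-pool the
# remaining token representations into a sentence embedding. The checkpoint
# is a real one, but k and the pooling choice are assumptions, not the
# paper's exact configuration.
model_name = "google/electra-base-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

k = 9  # assumed truncation point; the paper reports earlier layers work better
model.encoder.layer = model.encoder.layer[:k]  # drop layers k..11
model.config.num_hidden_layers = k

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    hidden = model(**batch).last_hidden_state      # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)   # (batch, tokens, 1)
    return (hidden * mask).sum(1) / mask.sum(1)    # mean over real tokens

# The general shape of the approach: fine-tune this truncated encoder on
# STS pairs (e.g., with a cosine-similarity regression loss, as in
# sentence-transformers), then use `embed` to produce sentence embeddings.
```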