BERTEPro: A New Sentence Embedding Framework for the Education and Professional Training Domain

Guillaume Lefebvre, Haytham Elghazel, Theodore Guillet, Alexandre Aussem, Matthieu Sonnati

38th Annual ACM Symposium on Applied Computing, SAC 2023 (2023)

Abstract
FlauBERT and CamemBERT have established a new state-of-the-art performance for French language understanding. Recently, SBERT has transformed the use of BERT, in order to reduce the computational effort of sentence similarity, while maintaining the accuracy of BERT. However, these models have been trained on non-specific texts of the French language, which does not allow for a fine-grained representation of texts from specific domains, such as the Education and professional training domain. In this paper, we present BERTEPro, a sentence embedding framework based on FlauBERT, whose pre-training using MLM (Masked Language Modeling) has been extended on education and professional training texts, before being fine-tuned on NLI (Natural Language Inference) and STS (Semantic Textual Similarity) tasks. The performance evaluation of BERTEPro on STS tasks, as well as on classification tasks, confirmed that the proposed methodology has significant advantages over other state-of-the-art methods.
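The SBERT-style approach the abstract builds on reduces sentence similarity to comparing fixed-size embeddings, typically obtained by mean-pooling the encoder's token vectors and scoring pairs with cosine similarity. A minimal sketch of those two steps, using random arrays in place of FlauBERT token outputs (all shapes and values here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (SBERT-style pooling)."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = mask.sum(axis=1).clip(min=1e-9)         # guard against empty masks
    return summed / counts

def cosine_similarity(a, b):
    """Cosine similarity between two sentence embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in for encoder output: 2 sentences, 4 tokens each, 8-dim vectors.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # first sentence has one padding token
                 [1, 1, 1, 1]])

embeddings = mean_pool(tokens, mask)               # shape (2, 8)
score = cosine_similarity(embeddings[0], embeddings[1])
```

Because each sentence is encoded once into a single vector, comparing n sentences costs n encoder passes plus cheap vector operations, rather than the n(n-1)/2 cross-encoder passes plain BERT would need; this is the computational saving the abstract attributes to SBERT.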
Keywords
NLP, Transformers, Sentence Similarity, Sentence Embedding, Education and professional training domain