Hierarchical and Pairwise Document Embedding for Plagiarism Detection

Ruitong Zhang, Lianzhong Liu, Jiaofu Zhang, Zihang Huang, Caiwei Yang, Liangxuan Zhao, Tongge Xu

ADMA (2020)

Abstract

The rapid development of the Internet, especially the spread of search engines and machine translation, has made it easier to copy text. Most existing plagiarism detection methods cannot cope with the growing number of plagiarism sources or with increasingly ambiguous plagiarized text. In this paper, we focus on large-scale text deduplication and propose a multi-level distributed text computing model that speeds up checking through multi-level latent semantic analysis (LSA) and uses BERT to identify plagiarized text more accurately. To further validate the model, we also constructed a three-level dataset using recent fuzzy plagiarism techniques. Experimental results show that our model performs well as both the volume of plagiarism data and the ambiguity of plagiarized text increase.
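The abstract describes a two-stage design: latent semantic analysis narrows the set of candidate sources quickly, and BERT then makes the finer plagiarism judgment. The Python sketch below illustrates only a generic first stage, as a single-level LSA filter. It is not the authors' multi-level model, and the BERT stage is not shown. The TF-IDF weighting with smoothed IDF, the rank `k`, cosine similarity as the score, and the function names `build_lsa_index`, `embed`, and `lsa_candidates` are all hypothetical choices made for illustration.

```python
import re

import numpy as np


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_lsa_index(corpus, k=2):
    """Build a rank-k LSA index over `corpus` (a list of strings).

    Hypothetical helper: returns the vocabulary, the IDF weights, the
    projection matrix U_k, and one latent vector per document.
    """
    docs = [tokenize(d) for d in corpus]
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    # Term-document count matrix A (terms x documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            A[vocab[t], j] += 1
    # Smoothed IDF weighting (an assumption, not taken from the paper).
    df = np.count_nonzero(A, axis=1)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    A *= idf[:, None]
    # Truncated SVD: keep only the top-k left singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, : min(k, U.shape[1])]
    doc_vecs = A.T @ U_k  # equals V_k S_k, one row per document
    return vocab, idf, U_k, doc_vecs


def embed(text, vocab, idf, U_k):
    """Fold a new text into the latent space: q_hat = q^T U_k."""
    q = np.zeros(len(vocab))
    for t in tokenize(text):
        if t in vocab:  # out-of-vocabulary terms are ignored
            q[vocab[t]] += 1
    return (q * idf) @ U_k


def lsa_candidates(query, index, top_n=2):
    """Return the top_n document indices by cosine similarity, plus all scores."""
    vocab, idf, U_k, doc_vecs = index
    q = embed(query, vocab, idf, U_k)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    sims = doc_vecs @ q / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)[:top_n]
    return [int(i) for i in order], sims


# Toy usage: two "copying" documents and two unrelated baking documents.
corpus = [
    "copy text search engine",
    "copy text machine translation",
    "flour sugar butter cake",
    "bake cake oven",
]
index = build_lsa_index(corpus, k=2)
candidates, sims = lsa_candidates("copy text from a search engine", index)
# candidates holds documents 0 and 1 (order may vary); the baking ones score ~0
```

Because the two "copy text" documents share weighted terms, rank-2 LSA maps them onto the same latent direction, so both surface as candidates even though only one matches the query closely. In a pipeline like the one the abstract outlines, a finer-grained model would then separate them. This is the trade-off the cheap LSA stage is meant to make: it recalls broadly and leaves precision to the later stage.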

Keywords

Plagiarism detection, BERT, LSA