Taxonomy of Mathematical Plagiarism
CoRR (2024)
Abstract
Plagiarism is a pressing concern, even more so with the availability of large
language models. Existing plagiarism detection systems reliably find copied and
moderately reworded text but fail for idea plagiarism, especially in
mathematical science, which heavily uses formal mathematical notation. We make
two contributions. First, we establish a taxonomy of mathematical content reuse
by annotating 122 potentially plagiarised scientific document pairs. Second, we
analyze the best-performing approaches to detect plagiarism and mathematical
content similarity on the newly established taxonomy. We found that the
best-performing methods for plagiarism and math content similarity achieve an
overall detection score (PlagDet) of 0.06 and 0.16, respectively. The
best-performing methods failed to detect most cases from all seven newly
established math similarity types. The outlined contributions will benefit research
in plagiarism detection systems, recommender systems, question-answering
systems, and search engines. We make our experiment's code and annotated
dataset available to the community:
https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism
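
For readers unfamiliar with the PlagDet score cited above, the following is a minimal sketch of how it is commonly defined in the plagiarism detection evaluation literature: the F1 score of precision and recall, divided by log2(1 + granularity), where granularity (at least 1) penalizes reporting one plagiarised passage as several fragments. The function name and its parameters are illustrative assumptions, not taken from the paper's code, and the paper's exact evaluation setup may differ.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Illustrative PlagDet: F1 divided by log2(1 + granularity).

    Assumes precision and recall lie in [0, 1] and granularity >= 1.
    This is a hypothetical helper, not the authors' implementation.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    if precision + recall == 0:
        return 0.0  # F1 is defined as 0 when nothing is detected correctly
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


if __name__ == "__main__":
    # A detector with perfect granularity keeps its F1 score unchanged.
    print(plagdet(0.5, 0.5, 1.0))
    # Fragmented detections (granularity 3) halve the same F1 score.
    print(plagdet(0.5, 0.5, 3.0))
```

Under this definition, a score of 1 requires perfect precision, perfect recall, and one detection per plagiarised passage, so the reported values of 0.06 and 0.16 indicate that most cases were missed.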