Vandalism Detection in Wikidata

ACM International Conference on Information and Knowledge Management(2016)

引用 110|浏览909
暂无评分
摘要
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation which can be edited by anyone. Its knowledge is increasingly used within Wikipedia as well as in all kinds of information systems, which imposes high demands on its integrity. Nevertheless, Wikidata frequently gets vandalized, exposing all its users to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We engineer 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and achieves an area under curve of the receiver operating characteristic (ROC-AUC) of 0.991, thereby significantly outperforming the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROC-AUC) and a prototypical vandalism detector recently introduced by Wikimedia (0.868 ROC-AUC).
更多
查看译文
关键词
Data Quality,Knowledge Base,Vandalism
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要