Probabilistic Approaches to Controversy Detection

ACM International Conference on Information and Knowledge Management (2016)

Abstract
Recently, the problem of automated controversy detection has attracted considerable interest in the information retrieval community. Existing approaches to this problem have set forth a number of detection algorithms, but there has been little effort to model controversy directly. In this paper, we propose a probabilistic framework to detect controversy on the web, and investigate two models. We first recast a state-of-the-art controversy detection algorithm into a model in our framework. Based on insights from social science research, we also introduce a language modeling approach to this problem. We extensively evaluate different methods of creating controversy language models based on a diverse set of public datasets including Wikipedia, Web and News corpora. Our automatically derived language models show a significant relative improvement of 18% in AUC over prior work, and 23% over two manually curated lexicons.
Keywords
controversy detection, language model, information retrieval
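
The abstract describes a language modeling approach to controversy detection without giving its formulation. As a rough illustration only, the sketch below scores a document by the log-likelihood ratio between a unigram language model built from controversial text and one built from general text. The scoring rule, the Jelinek-Mercer smoothing, the function names (`unigram_counts`, `log_prob`, `controversy_score`), the parameter `lam`, and the toy documents are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_counts(docs):
    """Count lowercased, whitespace-separated tokens across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_prob(tokens, counts, background, lam=0.5):
    """Log-likelihood of tokens under a unigram model, linearly interpolated
    (Jelinek-Mercer) with a background model so unseen words keep nonzero mass."""
    total = sum(counts.values())
    bg_total = sum(background.values())
    vocab = len(background)
    score = 0.0
    for tok in tokens:
        p_model = counts[tok] / total if total else 0.0
        # Add-one smoothing on the background keeps every probability positive.
        p_bg = (background[tok] + 1) / (bg_total + vocab + 1)
        score += math.log(lam * p_model + (1 - lam) * p_bg)
    return score


def controversy_score(text, controversial, general, lam=0.5):
    """Log-likelihood ratio: positive values favor the controversial model."""
    tokens = text.lower().split()
    background = controversial + general  # shared smoothing distribution
    return (log_prob(tokens, controversial, background, lam)
            - log_prob(tokens, general, background, lam))


# Toy corpora, invented for illustration (hypothetical data).
controversial_lm = unigram_counts([
    "abortion debate sparks protest and dispute",
    "critics dispute the claim in a heated debate",
])
general_lm = unigram_counts([
    "the recipe needs flour sugar and butter",
    "the train departs from the station at noon",
])

print(controversy_score("heated debate over the claim", controversial_lm, general_lm))
print(controversy_score("butter and sugar recipe", controversial_lm, general_lm))
```

In this sketch a document would be flagged as controversial when its score exceeds a threshold. Using zero as that threshold is also an assumption; in practice it would be tuned on labeled data, for example against the AUC measure the abstract reports.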