Shame To Be Sham: Addressing Content-Based Grey Hat Search Engine Optimization

SIGIR '13: The 36th International ACM SIGIR conference on research and development in Information Retrieval Dublin Ireland July, 2013(2013)

引用 6|浏览35
暂无评分
摘要
We present an initial study identifying a form of content-based grey hat search engine optimization, in which a Web page contains both potentially relevant content and manipulated content: we call such pages sham documents, because they lie in the grey area between 'ham' (clearly normal) and 'spam' (clearly fake). Sham documents are often ranked artificially high in response to certain queries, but also may contain some useful information and cannot be considered as absolute spam. We report a novel annotation effort performed with the ClueWeb09 benchmark where pages were labeled as being spam, sham, or legitimate content. Significant inter-annotator agreement rates support the claim that there are sham documents that are highly ranked by a very effective retrieval approach, yet are not spam. We also present an initial study of predictors that may indicate whether a query is the target of shamming.
更多
查看译文
关键词
sham,spam,search engine optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要