Bayesian Browsing Model: Exact Inference of Document Relevance from Petabyte-Scale Data

TKDD (2010)

Abstract
A fundamental challenge in utilizing Web search click data is to infer user-perceived relevance from the search log. Not only is this inference a difficult problem that requires statistical reasoning, but the sheer and ever-growing size of the log data also imposes extra scalability requirements. In this paper, we propose the Bayesian Browsing Model (BBM), which performs exact inference of document relevance, requires only a single pass over the data (i.e., optimal scalability), and is shown to be effective. We present two sets of experiments to evaluate the model's effectiveness and scalability. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click log set, spanning a quarter of a petabyte, we showcase the scalability of BBM: implemented on a commercial MapReduce cluster, it took only 3 hours to compute the relevance of 1.15 billion distinct query-URL pairs.
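To illustrate why a single pass can suffice, here is a minimal Python sketch. It assumes the user-browsing-style examination model that BBM builds on: a result at rank r, whose nearest preceding click is d positions above it (d = r when nothing above was clicked), is examined with a known probability γ(r, d). Under a uniform prior on a query-URL pair's relevance R, the posterior is then proportional to R^c · ∏ (1 − γ(r, d)·R)^n(r, d), where c counts clicks on the pair and n(r, d) counts its non-clicked impressions at each (r, d). Those counts are the only statistics needed, so one scan of the log collects everything. The function names, the toy γ table `toy_gamma`, and the grid-based numerical integration (used here in place of exact expansion of the posterior polynomial) are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict


def accumulate(sessions, max_rank=10):
    """One pass over (query, urls, clicks) sessions, collecting per-(query, url)
    sufficient statistics: a click count and non-click counts keyed by (r, d)."""
    stats = defaultdict(lambda: {"clicks": 0, "skips": defaultdict(int)})
    for query, urls, clicks in sessions:
        prev = 0  # rank of the most recent click above; 0 means no click yet
        for r, (url, clicked) in enumerate(zip(urls[:max_rank], clicks[:max_rank]), start=1):
            s = stats[(query, url)]
            if clicked:
                s["clicks"] += 1
                prev = r
            else:
                s["skips"][(r, r - prev)] += 1  # d = r - prev is always >= 1
    return stats


def toy_gamma(r, d):
    """Hypothetical examination probabilities in (0, 1]; real values would be estimated."""
    return 1.0 / (1.0 + 0.2 * (r - 1) + 0.1 * (d - 1))


def posterior_mean(s, gamma, grid=1000):
    """Posterior mean of R under a uniform prior, integrating the unnormalized
    log posterior c*log(R) + sum n*log(1 - gamma*R) on a midpoint grid over (0, 1)."""
    rs = [(i + 0.5) / grid for i in range(grid)]
    logs = []
    for R in rs:
        lp = s["clicks"] * math.log(R)
        for (r, d), n in s["skips"].items():
            lp += n * math.log(1.0 - gamma(r, d) * R)
        logs.append(lp)
    top = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    w = [math.exp(lp - top) for lp in logs]
    return sum(R * wi for R, wi in zip(rs, w)) / sum(w)


if __name__ == "__main__":
    sessions = [("q", ["a", "b"], [True, False])] * 5 + [("q", ["b", "a"], [False, True])] * 5
    stats = accumulate(sessions)
    for key, s in stats.items():
        print(key, round(posterior_mean(s, toy_gamma), 3))
```

In a MapReduce setting, the body of `accumulate` would roughly correspond to a mapper emitting partial counts keyed by (query, URL), with a reducer summing them before the per-pair posterior is evaluated; this mapping is an assumption about how such a job could be organized.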
Keywords
click log analysis, petabyte-scale data, optimal scalability, exact inference, click log set, utilizing web search click, log data, document relevance, billion distinct query-url pair, bayesian models, search log, user-perceived relevance, million search instance, web search, bayesian browsing model, bayesian model