BBM: Bayesian Browsing Model from Petabyte-Scale Data

Chao Liu, Fan Guo, Christos Faloutsos

KDD '09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 2009

Abstract

Given a quarter of a petabyte of click-log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with the following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective. We present two sets of experiments to test model effectiveness and efficiency. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of a petabyte of data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.

Keywords

Bayesian models, click log analysis, web search
