Overcoming browser cookie churn with clustering.

WSDM(2012)

引用 15|浏览129
暂无评分
摘要
ABSTRACTMany large Internet websites are accessed by users anonymously, without requiring registration or logging-in. However, to provide personalized service these sites build anonymous, yet persistent, user models based on repeated user visits. Cookies, issued when a web browser first visits a site, are typically employed to anonymously associate a website visit with a distinct user (web browser). However, users may reset cookies, making such association short-lived and noisy. In this paper we propose a solution to the cookie churn problem: a novel algorithm for grouping similar cookies into clusters that are more persistent than individual cookies. Such clustering could potentially allow more robust estimation of the number of unique visitors of the site over a certain long time period, and also better user modeling which is key to plenty of web applications such as advertising and recommender systems. We present a novel method to cluster browser cookies into groups that are likely to belong to the same browser based on a statistical model of browser visitation patterns. We address each step of the clustering as a binary classification problem estimating the probability that two different subsets of cookies belong to the same browser. We observe that our clustering problem is a generalized interval graph coloring problem, and propose a greedy heuristic algorithm for solving it. The scalability of this method allows us to cluster hundreds of millions of browser cookies and provides significant improvements over baselines such as constrained K-means.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要