Estimating Sizes Of Social Networks Via Biased Sampling

WWW(2014)

Abstract
This article presents algorithms for estimating the number of users in online social networks. Although such networks sometimes publish these statistics, there are good reasons to validate their reports. The proposed schemes can also estimate the cardinality of network subpopulations. Because this information is seldom voluntarily divulged, such algorithms must operate only by interacting with the social networks' public Application Programming Interfaces (APIs). No other external information can be assumed. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed to produce good size estimates.

We adopt the standard abstraction of social networks as undirected graphs and perform random walk-based node sampling. By counting the number of collisions, that is, nonunique nodes in the sample, we produce a size estimate. We then show analytically that the estimation error vanishes with high probability for fewer samples than those required by prior-art algorithms. Moreover, although provably correct for any graph, our algorithms excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic graphs and on real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborate the theoretical results and demonstrate the effectiveness of the algorithms.
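The collision-counting idea can be illustrated with a minimal Python sketch. The abstract does not state the estimator itself, so the formula below is an assumption: a birthday-paradox style estimate that reweights each sample by its degree, since a random walk on an undirected graph visits nodes with probability proportional to degree. The function names, the toy graph generator, and the `gap` thinning parameter are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def count_collisions(samples):
    """Count unordered pairs of sample positions that hold the same node."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())


def estimate_size(samples, degree):
    """Degree-weighted collision estimate of the number of nodes.

    Assumes samples are drawn with probability proportional to degree
    (the stationary distribution of a simple random walk).
    Returns None when the sample contains no collisions.
    """
    collisions = count_collisions(samples)
    if collisions == 0:
        return None
    sum_deg = sum(degree[v] for v in samples)
    sum_inv = sum(1.0 / degree[v] for v in samples)
    # sum over ordered pairs i != j of degree[i] / degree[j]
    pair_sum = sum_deg * sum_inv - len(samples)
    return pair_sum / (2.0 * collisions)


def random_walk_sample(neighbors, start, num_samples, gap, rng):
    """Record every `gap`-th node of a simple random walk (thinning is an
    assumption used here to reduce correlation between samples)."""
    node, samples = start, []
    for _ in range(num_samples):
        for _ in range(gap):
            node = rng.choice(neighbors[node])
        samples.append(node)
    return samples


def ring_with_chords(n, chords, rng):
    """Toy connected undirected graph: a ring plus random chords."""
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for _ in range(chords):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return {v: sorted(nb) for v, nb in adj.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    graph = ring_with_chords(2000, 6000, rng)
    degree = {v: len(nb) for v, nb in graph.items()}
    samples = random_walk_sample(graph, 0, 400, 20, rng)
    print("true size:", len(graph))
    print("estimate:", estimate_size(samples, degree))
```

In this sketch each recorded sample stands in for one API interaction that returns a user and that user's friend count. When all degrees are equal, the estimate reduces to the familiar r(r-1)/(2C) for r samples and C collisions.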