Implementing quasi-parallel breadth-first search in MapReduce for large-scale social network mining

CASoN(2013)

Abstract
Online social networks such as Weibo and Twitter consist of billions of users and connections, and traditional approaches based on serial algorithms that use only a single node, or even a single core, can no longer cope with data at that scale. We propose a new distributed quasi-parallel scheme for breadth-first search, a common graph traversal algorithm, based on the MapReduce framework. It outperforms Pegasus, the state-of-the-art graph mining library, in both computational complexity and I/O load, with up to one order of magnitude less time complexity for single-source cases and even better results for multiple-source cases. We apply our algorithms to the Weibo dataset, crawled from its website, which contains 135 million users and 10.2 billion directed connections among them and occupies up to 400 gigabytes. The dataset is by far the largest online social network dataset used in research. Based on the Weibo dataset, whose degree distribution is extremely skewed, we give an empirical analysis of the time complexity and I/O load in each iteration of our proposed methods. We also ran experiments on a 20-node Hadoop cluster to validate our analysis, and the results conform to our empirical predictions.
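For readers unfamiliar with how breadth-first search maps onto MapReduce, the following is a minimal sketch of the generic level-synchronous formulation, simulated in memory in plain Python. It is not the quasi-parallel scheme proposed in the paper, nor the Pegasus implementation; the function names (bfs_map, bfs_reduce, run_iteration, bfs) and the record layout are assumptions chosen for illustration only.

```python
from collections import defaultdict

INF = float("inf")  # distance of nodes not yet reached


def bfs_map(node, record):
    """Map step (hypothetical layout): record is (distance, out_neighbors)."""
    dist, neighbors = record
    # Pass the graph structure and the current distance through unchanged.
    yield node, ("graph", neighbors)
    yield node, ("dist", dist)
    # A reached node proposes distance + 1 to each of its out-neighbors.
    if dist != INF:
        for nbr in neighbors:
            yield nbr, ("dist", dist + 1)


def bfs_reduce(node, values):
    """Reduce step: keep the adjacency list and the smallest proposed distance."""
    neighbors = []
    best = INF
    for kind, payload in values:
        if kind == "graph":
            neighbors = payload
        else:
            best = min(best, payload)
    return node, (best, neighbors)


def run_iteration(state):
    """One simulated MapReduce job: map, group by key (shuffle), reduce."""
    shuffled = defaultdict(list)
    for node, record in state.items():
        for key, value in bfs_map(node, record):
            shuffled[key].append(value)
    return dict(bfs_reduce(key, values) for key, values in shuffled.items())


def bfs(adjacency, source):
    """Repeat jobs until no distance changes; returns node -> hop count."""
    state = {
        node: (0 if node == source else INF, list(nbrs))
        for node, nbrs in adjacency.items()
    }
    while True:
        new_state = run_iteration(state)
        if new_state == state:
            return {node: dist for node, (dist, _) in new_state.items()}
        state = new_state


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
    print(bfs(graph, "a"))
```

In this naive form every iteration reads and rewrites the whole graph, so the number of jobs grows with the graph diameter and each job carries the full I/O cost. That per-iteration cost is the kind of overhead the abstract's complexity and I/O load analysis is concerned with.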
Keywords
mapreduce framework, mapreduce, i-o load analysis, weibo dataset, tree searching, twitter, empirical time complexity, pegasus, serial algorithms, graph mining library, computational complexity, parallel algorithms, graph traversal algorithm, libraries, online social networks, graph mining, data mining, hadoop cluster, graph theory, website, breadth-first search, social networking (online), skewed degree distribution, distributed quasi-parallel breadth-first search scheme, large-scale social network mining