Opt: A New Framework For Overlapped And Parallel Triangulation In Large-Scale Graphs

MOD(2014)

引用 54|浏览39
暂无评分
摘要
Graph triangulation, which finds all triangles in a graph, has been actively studied due to its wide range of applications in the network analysis and data mining. With the rapid growth of graph data size, disk-based triangulation methods are in demand but little researched. To handle a large-scale graph which does not fit in memory, we must iteratively load small parts of the graph. In the existing literature, achieving the ideal cost has been considered to be impossible for billion-scale graphs due to the memory size constraint. In this paper, we propose an overlapped and parallel disk-based triangulation framework for billion-scale graphs, OPT, which achieves the ideal cost by (1) full overlap of the CPU and :1/0 operations and (2) full parallelism of multi-core CPU and FlashSSD I/O. In OPT, triangles in memory are called the internal triangles while triangles constituting vertices in memory and vertices in external memory are called the external triangles. At the macro level, OPT overlaps the internal triangulation and the external triangulation, while it overlaps the CPU and I/O operations at the micro level. Thereby, the cost of OPT is close to the ideal cost. Moreover, OPT instantiates both vertex-iterator and edge-iterator models and benefits from multi-thread parallelism on both types of triangulation. Extensive experiments conducted on large-scale datasets showed that (1) OPT achieved the elapsed time close to that of the ideal method with less than 7% of overhead under the limited memory budget, (2) OPT achieved linear speed-up with an increasing number of CPU cores, (3) OPT outperforms the state-ofthe-art parallel method by up to an order of magnitude with 6 CPU cores, and (4) for the first time in the literature, the triangulation results are reported for a billion-vertex scale real-world graph.
更多
查看译文
关键词
Triangulation,Big data,Parallel processing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要