MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)(2021)

Abstract
Graph analytics are at the heart of a broad range of applications such as drug discovery, page ranking, transportation systems, and recommendation models. When graph size exceeds the available memory size in a computing node, out-of-core graph processing is needed. In the widely used out-of-core graph processing systems, the graphs are stored on and accessed from long-latency SSD storage, which becomes a significant performance bottleneck. To tackle this long latency, this work exploits the key insight that nearly all graph algorithms have a dynamically varying number of active vertices that must be processed in each iteration. However, existing graph processing frameworks, such as GraphChi, load the entire graph in each iteration even if only a small fraction of the graph is active. This limitation is due to the structure of the graph storage used by these systems. In this work, we propose to use a compressed sparse row (CSR) based graph storage that is more amenable to selectively loading only a few active vertices in each iteration. However, CSR-based graph processing suffers from random update propagation to many target vertices. To solve this challenge, we propose to use a multi-log update mechanism that logs updates separately, rather than directly updating the active edges and vertices in a graph. The multi-log system maintains a separate log for each vertex interval (a group of vertices). This separation enables efficient processing of all updates bound to each vertex interval by just loading the corresponding log. Further, by logging all the updates associated with a vertex interval in one contiguous log, this approach reduces read amplification, since all the pages in the log will be processed in the next iteration without wasted page reads.
Compared with the current state-of-the-art out-of-core graph processing framework, our evaluation results show that the MultiLogVC framework improves performance by up to 17.84x, 1.19x, 1.65x, 1.38x, 3.15x, and 6.00x for the widely used breadth-first search, PageRank, community detection, graph coloring, maximal independent set, and random-walk applications, respectively.
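The abstract's two central ideas, CSR storage that lets only active vertices be read and a separate update log per vertex interval, can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names (`build_csr`, `bfs_multilog`), the `interval_size` parameter, and the use of Python lists in place of SSD-resident logs are all assumptions made for illustration, with breadth-first search chosen as the example algorithm.

```python
from collections import defaultdict


def build_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) from a list of (src, dst) edges."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns per-vertex out-degrees into row offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]
    col_idx = [0] * len(edges)
    cursor = list(row_ptr[:-1])  # next free slot in each vertex's row
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def bfs_multilog(num_vertices, edges, source, interval_size):
    """Level-synchronous BFS that routes updates through per-interval logs.

    Hypothetical sketch: each log is an in-memory list standing in for a
    contiguous on-disk log belonging to one vertex interval.
    """
    row_ptr, col_idx = build_csr(num_vertices, edges)
    dist = [-1] * num_vertices

    # One log per vertex interval; an update is a (target, distance) pair.
    logs = defaultdict(list)
    logs[source // interval_size].append((source, 0))

    while logs:
        next_logs = defaultdict(list)
        # Process one interval at a time, reading only that interval's log.
        for interval in sorted(logs):
            for v, d in logs[interval]:
                if dist[v] != -1:
                    continue  # already reached in an earlier iteration
                dist[v] = d
                # CSR lets us touch only the edges of this active vertex.
                for e in range(row_ptr[v], row_ptr[v + 1]):
                    dst = col_idx[e]
                    # Log the update instead of writing to dst directly.
                    next_logs[dst // interval_size].append((dst, d + 1))
        logs = next_logs
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_multilog(6, example_edges, source=0, interval_size=2))
```

In this sketch an iteration reads only the logs that received updates and only the CSR rows of vertices that become active, which mirrors the selective-loading argument in the abstract. Details such as log placement on flash pages and eviction are outside what the abstract specifies and are not modeled here.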
Keywords
out-of-core graph processing, graph analytics, SSD storage systems, log storage