Efficient Distributed Graph Analytics using Triply Compressed Sparse Format

2019 IEEE International Conference on Cluster Computing (CLUSTER)(2019)

引用 9|浏览28
暂无评分
摘要
This paper presents Triply Compressed Sparse Column (TCSC), a novel compression technique designed specifically for matrix-vector operations where the matrix as well as the input and output vectors are sparse. We refer to these operations as SpMSpV 2 . TCSC compresses the nonzero columns and rows of a highly sparse matrix representing a large real-world graph. During this compression, it encodes the sparsity patterns of the input and output vectors within the compressed representation of the sparse matrix itself. Consequently, it aligns the compressed indices of the input and output vectors with those of the compressed matrix columns and rows, thus eliminating the need for extra indirections when SpMSpV 2 operations access the vectors. This results in fewer cache misses, greater space efficiency and faster execution times. We evaluate TCSC's performance and show that it is more space and time efficient compared to CSC and DCSC, with up to 11× speedup. We integrate TCSC into GraphTap, our suggested linear algebra-based distributed graph analytics system. We compare GraphTap against GraphPad and LA3, two state-of-the-art linear algebra-based distributed graph analytics systems, using different dataset scales and numbers of processes. GraphTap is up to 7× faster than these systems due to TCSC and the resulting communication efficiency.
更多
查看译文
关键词
Triply compressed sparse column,TCSC,sparse matrix,sparse vector,SpMV,SpMSpV 2 ,distributed graph analytics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要