A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent

Wei Wanjing, Wang Yangzihao, Gao Pin, Sun Shijie, Yu Donghai

arXiv (2020)

Abstract
Scaling node embedding systems to efficiently process real-world networks, which often contain hundreds of billions of edges with high-dimensional node features, remains a challenging problem. In this paper we present a high-performance multi-GPU node embedding system that uses hybrid model-data parallel training. We propose a hierarchical data partitioning strategy and an embedding training pipeline to optimize both communication and memory usage on a GPU cluster. With the decoupled design of our random walk engine and embedding training engine, we can run random walk and embedding training with high flexibility, fully utilizing all computing resources on a GPU cluster. We evaluate the system on real-world and synthesized networks with various node embedding tasks. Using 40 NVIDIA V100 GPUs on a network with over two hundred billion edges and one billion nodes, our implementation requires only 200 seconds to finish one training epoch. We also achieve a 5.9x-14.4x average speedup over the current state-of-the-art multi-GPU single-node embedding system, with competitive or better accuracy on open datasets.
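The hierarchical partitioning mentioned above can be illustrated with a minimal sketch. The code below is a hypothetical two-level scheme (the paper does not specify its exact assignment rule): each node id is first assigned to a machine, then to a GPU within that machine, so that embedding shards are spread evenly across the cluster. The function name and the modulo-based rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def hierarchical_partition(node_ids, num_machines, gpus_per_machine):
    """Hypothetical two-level partition: level 1 picks the machine,
    level 2 picks the GPU within that machine. Returns a mapping
    node_id -> (machine, gpu)."""
    assignment = {}
    for nid in node_ids:
        machine = nid % num_machines                      # level 1: machine
        gpu = (nid // num_machines) % gpus_per_machine    # level 2: GPU
        assignment[nid] = (machine, gpu)
    return assignment

# 16 nodes over 4 machines x 2 GPUs -> 8 slots, 2 nodes per slot
parts = hierarchical_partition(range(16), num_machines=4, gpus_per_machine=2)
loads = Counter(parts.values())
```

A scheme like this keeps per-GPU shard sizes balanced without a global lookup table, at the cost of ignoring graph locality; the paper's strategy additionally optimizes communication, which a pure hash split does not capture.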
Keywords
multi-GPU, large-scale