CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs

Qingxiao Sun, Yi Liu, Hailong Yang, Ruizhe Zhang, Ming Dun, Mingzhen Li, Xiaoyan Liu, Wencong Xiao, Yong Li, Zhongzhi Luan, Depei Qian

SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (2022)

Abstract

Graph neural networks (GNNs) suffer from low GPU utilization due to frequent memory accesses. Existing concurrent training mechanisms cannot be directly adapted to GNNs because they fail to consider the impact of input irregularity. Handling this irregularity requires pre-profiling the memory footprint of concurrent tasks based on their input dimensions to ensure successful co-location on a GPU. Moreover, the large number of training tasks generated in scenarios such as hyper-parameter tuning requires flexible scheduling strategies. To address these problems, we propose CoGNN, which enables efficient management of GNN training tasks on GPUs. Specifically, CoGNN organizes the tasks in a queue and estimates the memory consumption of each task using per-operator cost functions. In addition, CoGNN implements scheduling policies to generate task groups, which are iteratively submitted for execution. Experimental results show that CoGNN achieves shorter completion and queuing times for training tasks from diverse GNN models.
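The queue, per-operator memory estimate, and task-group steps can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give CoGNN's cost functions or scheduling policies, so the formulas in `COST_FUNCTIONS`, the `Task` fields, and the next-fit grouping in `make_groups` are hypothetical stand-ins, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

BYTES_PER_FLOAT = 4

# Hypothetical per-operator cost functions (bytes) over the input dimensions.
# These formulas are illustrative assumptions, not taken from the paper.
def linear_cost(num_nodes: int, num_edges: int, feat_dim: int) -> int:
    return num_nodes * feat_dim * BYTES_PER_FLOAT

def aggregate_cost(num_nodes: int, num_edges: int, feat_dim: int) -> int:
    return num_edges * feat_dim * BYTES_PER_FLOAT

COST_FUNCTIONS: Dict[str, Callable[[int, int, int], int]] = {
    "linear": linear_cost,
    "aggregate": aggregate_cost,
}

@dataclass
class Task:
    name: str
    operators: List[str]  # operator names, in execution order
    num_nodes: int
    num_edges: int
    feat_dim: int

def estimate_memory(task: Task) -> int:
    """Sum the assumed cost of every operator in the task."""
    return sum(
        COST_FUNCTIONS[op](task.num_nodes, task.num_edges, task.feat_dim)
        for op in task.operators
    )

def make_groups(queue: List[Task], capacity: int) -> List[List[Task]]:
    """Next-fit packing in queue order: close the current group as soon as
    the next task's estimate would exceed the GPU memory capacity."""
    groups: List[List[Task]] = []
    current: List[Task] = []
    used = 0
    for task in queue:
        need = estimate_memory(task)
        if need > capacity:
            raise ValueError(f"{task.name} does not fit on the GPU by itself")
        if current and used + need > capacity:
            groups.append(current)
            current, used = [], 0
        current.append(task)
        used += need
    if current:
        groups.append(current)
    return groups
```

Each returned group is a set of tasks whose estimated footprints fit together within `capacity`, so the groups could be submitted one after another. Next-fit is only one simple policy choice; it keeps queue order but may leave memory unused in earlier groups.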

Keywords

Graph Neural Networks, GPU, Concurrent Training, Task Scheduling, Estimation Model
