HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization

2022 IEEE High Performance Extreme Computing Conference (HPEC)(2022)

Abstract
Graph convolutional networks (GCNs) have succeeded significantly in numerous fields, but the need for higher-performance and more energy-efficient GCN training on larger graphs continues unabated. Because reconfigurable accelerators allow fine-grained customization of compute modules and data movement, FPGAs can address problems such as the irregular memory accesses of GCN computation. Furthermore, to scale GCN computation, the use of heterogeneous FPGAs is inevitable given the constant iteration of new FPGA generations. In this paper, we propose a novel framework, HuGraph, which automatically maps GCN training onto heterogeneous FPGA clusters. With HuGraph, FPGAs work in synchronous data parallelism using a simple 1D ring topology that suits most off-the-shelf FPGA clusters. HuGraph uses three approaches to improve performance and energy efficiency. First, it applies full-process quantization to neighbor-sampling-based data-parallel training, thereby reducing computation and memory consumption. Second, a novel balanced sampler balances workloads among heterogeneous FPGAs so that FPGAs with fewer resources do not become bottlenecks in the cluster. Third, HuGraph schedules the execution order of GCN training to minimize time overhead. We implement a prototype on a single FPGA and evaluate cluster-level performance with a cycle-accurate simulator. Experiments show that HuGraph achieves up to 102.3×, 4.62×, and 11.1× speedup compared with state-of-the-art works on CPU, GPU, and FPGA platforms, respectively, with negligible accuracy loss.
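The abstract's combination of ring-topology synchronous data parallelism with quantized communication can be illustrated with a minimal sketch. All helper names below are hypothetical, and this is not the paper's actual implementation (HuGraph's full-process quantization also covers sampling and on-chip compute, which are omitted here); it only shows int8-quantized gradient averaging over a logical 1D ring:

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric quantization of a float vector to int8 plus a scale.
    (Illustrative only; the paper's quantization scheme is not specified here.)"""
    scale = max(np.max(np.abs(x)) / 127.0, 1e-12)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float vector from an int8 message and its scale."""
    return q.astype(np.float32) * scale

def ring_allreduce_quantized(grads):
    """Average per-worker gradients over a logical ring.

    Each worker starts by accumulating its own (quantized) gradient, then in
    n-1 steps receives its ring neighbour's quantized message, accumulates it,
    and forwards it one hop. Messages stay int8 on the wire, so bandwidth is
    roughly a quarter of float32 all-reduce.
    """
    n = len(grads)
    # Local contribution, passed through quantization for consistency.
    acc = [dequantize(*quantize_int8(g)) for g in grads]
    # buf[i] is the quantized message currently held by worker i.
    buf = [quantize_int8(g) for g in grads]
    for _ in range(n - 1):
        # Shift every message one hop along the ring (worker i receives from i-1).
        buf = [buf[(i - 1) % n] for i in range(n)]
        acc = [a + dequantize(*b) for a, b in zip(acc, buf)]
    # After n-1 hops every worker has seen all n contributions.
    return [a / n for a in acc]
```

For two workers holding gradients of all-ones and all-threes, every worker ends up near the true average of 2.0, with error bounded by the int8 rounding step.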
Keywords
Graph Convolutional Network Training, Reconfigurable Computing, Parallel and Distributed Systems