Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators

PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022)

Abstract
Recently, Graph Convolutional Networks (GCNs) have shown powerful learning capabilities in graph processing tasks. Computing GCNs on conventional von Neumann architectures usually suffers from limited memory bandwidth due to irregular memory accesses. Recent work has proposed Processing-In-Memory (PIM) architectures to overcome the bandwidth bottleneck in Convolutional Neural Networks (CNNs) by performing in-situ matrix-vector multiplication. However, the performance improvement and computation parallelism of existing CNN-oriented PIM architectures are hindered when performing GCNs because of the large scale and sparsity of graphs. To tackle these problems, this paper presents a parallelism enhancement framework for PIM-based GCN architectures. At the software level, we propose a fixed-point quantization method for GCNs, which reduces the PIM computation overhead with little accuracy loss. We also apply a vertex-clustering algorithm to the graph, minimizing inter-cluster links and enabling cluster-level parallel computing on multi-core systems. At the hardware level, we design a Resistive Random Access Memory (RRAM) based multi-core PIM architecture for GCNs that supports cluster-level parallelism. In addition, we propose a coarse-grained pipeline dataflow to hide the RRAM write costs and improve GCN computation throughput. At the software/hardware interface level, we propose a PIM-aware GCN mapping strategy to achieve the optimal tradeoff between resource utilization and computation performance. We also propose edge-dropping methods to reduce inter-core communication with little accuracy loss. We evaluate our framework on typical datasets with multiple widely-used GCN models. Experimental results show that the proposed framework achieves 698x, 89x, and 41x speedup with 7108x, 255x, and 31x energy efficiency enhancement compared with CPUs, GPUs, and ASICs, respectively.
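The abstract does not spell out the quantization scheme, so the sketch below shows one plausible realization only: symmetric per-tensor uniform quantization in NumPy. The function names `quantize_fixed_point` and `dequantize` are illustrative, not from the paper.

```python
import numpy as np

def quantize_fixed_point(x, n_bits=8):
    """Symmetric uniform fixed-point quantization of a tensor.

    Returns integer codes plus the scale needed to dequantize.
    """
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax             # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Quantize a random GCN weight matrix and check the reconstruction error.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16)).astype(np.float32)
q, s = quantize_fixed_point(W)
print("mean abs error:", np.abs(W - dequantize(q, s)).mean())
```

In a PIM setting, the integer codes would be programmed into the RRAM crossbar cells; the low bit width is what keeps the analog matrix-vector multiplication cheap.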
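The vertex-clustering algorithm itself is likewise not given in the abstract. As a stand-in, the sketch below uses a naive greedy BFS partitioner (a real system would more likely use a dedicated graph partitioner such as METIS); it only illustrates the stated objective of keeping neighboring vertices in the same cluster so that few edges cross cluster boundaries.

```python
from collections import deque
import numpy as np

def bfs_cluster(adj, n_clusters):
    """Greedy BFS partitioning: grow each cluster from an unvisited seed
    until it reaches the target size, so neighboring vertices tend to
    land in the same cluster and most edges stay intra-cluster."""
    n = len(adj)
    target = -(-n // n_clusters)                 # ceil(n / n_clusters)
    label = np.full(n, -1, dtype=int)
    cluster = 0
    for seed in range(n):
        if label[seed] != -1:
            continue
        queue, size = deque([seed]), 0
        while queue and size < target:
            v = queue.popleft()
            if label[v] != -1:
                continue
            label[v] = cluster
            size += 1
            queue.extend(u for u in adj[v] if label[u] == -1)
        cluster = min(cluster + 1, n_clusters - 1)
    return label

def count_cut_edges(adj, label):
    """Inter-cluster edges, counting each undirected edge once."""
    return sum(1 for v in range(len(adj)) for u in adj[v]
               if u > v and label[u] != label[v])

# Example: a 12-vertex path graph split into 3 clusters of 4 vertices,
# leaving only 2 cut edges between clusters.
adj = [[i - 1, i + 1] for i in range(12)]
adj[0], adj[11] = [1], [10]
label = bfs_cluster(adj, 3)
print(label.tolist(), "cut edges:", count_cut_edges(adj, label))
```

Each cluster then maps onto one PIM core, and the cut-edge count bounds the inter-core traffic during aggregation.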
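The edge-dropping idea can be sketched the same way: only edges that cross cluster boundaries are candidates for removal, since those are the ones that generate inter-core communication. The helper below is hypothetical and uses uniform random dropping; the paper's actual selection criterion may differ.

```python
import numpy as np

def drop_inter_cluster_edges(edges, label, drop_ratio, seed=0):
    """Randomly remove a fraction of inter-cluster edges.

    edges: (E, 2) integer array of vertex pairs.
    label: per-vertex cluster assignment.
    Intra-cluster edges are never touched, which limits accuracy loss
    while removing the communication the dropped edges would cause.
    """
    rng = np.random.default_rng(seed)
    cross = label[edges[:, 0]] != label[edges[:, 1]]
    drop = cross & (rng.random(len(edges)) < drop_ratio)
    return edges[~drop]

# Example: drop every cross-cluster edge (here only (2, 3) crosses).
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
label = np.array([0, 0, 0, 1, 1, 1])
print(drop_inter_cluster_edges(edges, label, drop_ratio=1.0))
```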
Keywords
accuracy loss,vertex clustering algorithm,intercluster links,multicore systems,hardware level,GCN computation throughput,PIM-aware GCN mapping strategy,resource utilization,computation performance,GCN models,vertex-clustering,graph convolutional networks,graph processing tasks,memory bandwidth,irregular memory access,convolutional neural networks,matrix-vector multiplication,performance improvement,parallelism enhancement framework,PIM-based GCN architectures,software level,fixed-point quantization method,PIM computation overhead,resistive random access memory based multicore PIM architecture,cluster-level parallel computing,intercore communications,processing-in-memory-based GCN accelerators