
Optimize DGL Operations on X86-64 Multi-Core Processors

HP3C (2022)

Abstract
Modern x86-64 processors deliver strong performance thanks to their wide vector units, which are widely exploited for inference of CNN-like neural network models. However, GNN inference performs poorly on these processors, and as GNN models and graph datasets continue to grow, this performance gap becomes a serious challenge. In this paper, we study why DGL-based GAT models are poorly optimized on the x86-64 platform and analyze the main performance bottlenecks. To optimize DGL's performance on the two dominant x86-64 CPU families, Intel and AMD, we implement a simple and effective task allocator that balances the load across multiple cores and use vector instructions to optimize DGL's core operators. We also propose corresponding optimizations for NUMA architectures. Experimental results show that our optimizations improve the performance of the baseline DGL version by up to 3.12x on Intel and 2.6x on AMD platforms.
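The abstract names two concrete techniques: an edge-balanced task allocator across cores and vectorization of DGL's core aggregation operators. The sketch below is a minimal illustration of both ideas, not the paper's implementation: it partitions CSR rows so each OpenMP thread handles roughly the same number of edges, and aggregates neighbor features with AVX2 intrinsics (the SpMM-like step at the heart of GAT message passing). All function names, the CSR layout, and the assumption that the feature dimension is a multiple of 8 are illustrative.

```cpp
// Sketch of edge-balanced multi-core scheduling plus AVX2 neighbor aggregation.
// Compile with: g++ -O2 -mavx2 -fopenmp sketch.cpp
#include <immintrin.h>
#include <omp.h>
#include <vector>
#include <cstdio>

// Split rows [0, n_rows) into n_parts chunks with roughly equal edge counts,
// using the CSR row-pointer array as a prefix sum of edges per row.
static std::vector<int> balance_by_edges(const std::vector<int>& indptr,
                                         int n_rows, int n_parts) {
    std::vector<int> bounds(n_parts + 1, n_rows);
    bounds[0] = 0;
    long long total = indptr[n_rows];
    int row = 0;
    for (int p = 1; p < n_parts; ++p) {
        long long target = total * p / n_parts;
        while (row < n_rows && indptr[row] < target) ++row;
        bounds[p] = row;
    }
    return bounds;
}

// dst[v] = sum of src[u] over in-neighbors u of v; dim must be a multiple of 8.
static void aggregate_sum(const std::vector<int>& indptr,
                          const std::vector<int>& indices,
                          const float* src, float* dst,
                          int n_rows, int dim) {
    int n_threads = omp_get_max_threads();
    std::vector<int> bounds = balance_by_edges(indptr, n_rows, n_threads);
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        // Each thread owns a contiguous row range with a similar edge count.
        for (int v = bounds[tid]; v < bounds[tid + 1]; ++v) {
            for (int d = 0; d < dim; d += 8) {
                __m256 acc = _mm256_setzero_ps();
                for (int e = indptr[v]; e < indptr[v + 1]; ++e) {
                    int u = indices[e];
                    acc = _mm256_add_ps(acc,
                            _mm256_loadu_ps(src + (long)u * dim + d));
                }
                _mm256_storeu_ps(dst + (long)v * dim + d, acc);
            }
        }
    }
}

int main() {
    // Tiny 3-node graph in CSR form: node 0 <- {1, 2}, node 1 <- {2}, node 2 <- {}.
    std::vector<int> indptr  = {0, 2, 3, 3};
    std::vector<int> indices = {1, 2, 2};
    int dim = 8;
    std::vector<float> src(3 * dim, 1.0f), dst(3 * dim, 0.0f);
    aggregate_sum(indptr, indices, src.data(), dst.data(), 3, dim);
    std::printf("dst[0][0] = %.1f (expected 2.0)\n", dst[0]);
    return 0;
}
```

A NUMA-aware variant of this scheme would additionally pin each thread chunk to the socket that holds the corresponding rows of the feature matrix, so that most loads stay local; the paper's concrete NUMA strategy is not spelled out in the abstract.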