
Optimize DGL Operations on X86-64 Multi-Core Processors

HP3C (2022)

Abstract
Modern x86-64 processors deliver strong performance thanks to their wide vector units, which are widely exploited for inference of CNN-like neural network models. However, GNN inference performs poorly on these processors, and as GNN models and graph datasets continue to grow, this performance gap becomes a serious challenge. In this paper, we study why DGL-based GAT models are poorly optimized on the x86-64 platform and analyze the main performance bottlenecks. To optimize DGL's performance on the two dominant x86-64 CPU families, Intel and AMD, we implement a simple and effective task allocator that balances the load across multiple cores and use vector instructions to optimize DGL's core operators. We also propose corresponding optimizations for NUMA architectures. Experimental results show that our optimizations improve the performance of the baseline DGL version by up to 3.12x on Intel and 2.6x on AMD platforms.
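The abstract names two concrete techniques: an edge-balanced task allocator across cores and vectorization of DGL's core aggregation operators. The sketch below is a minimal illustration of both ideas, not the paper's implementation: it partitions CSR rows so each OpenMP thread handles roughly the same number of edges, and aggregates neighbor features with AVX2 intrinsics (the SpMM-like step at the heart of GAT message passing). All function names, the CSR layout, and the assumption that the feature dimension is a multiple of 8 are illustrative.

```cpp
// Sketch of edge-balanced multi-core scheduling plus AVX2 neighbor aggregation.
// Compile with: g++ -O2 -mavx2 -fopenmp sketch.cpp
#include <immintrin.h>
#include <omp.h>
#include <vector>
#include <cstdio>

// Split rows [0, n_rows) into n_parts chunks with roughly equal edge counts,
// using the CSR row-pointer array as a prefix sum of edges per row.
static std::vector<int> balance_by_edges(const std::vector<int>& indptr,
                                         int n_rows, int n_parts) {
    std::vector<int> bounds(n_parts + 1, n_rows);
    bounds[0] = 0;
    long long total = indptr[n_rows];
    int row = 0;
    for (int p = 1; p < n_parts; ++p) {
        long long target = total * p / n_parts;
        while (row < n_rows && indptr[row] < target) ++row;
        bounds[p] = row;
    }
    return bounds;
}

// dst[v] = sum of src[u] over in-neighbors u of v; dim must be a multiple of 8.
static void aggregate_sum(const std::vector<int>& indptr,
                          const std::vector<int>& indices,
                          const float* src, float* dst,
                          int n_rows, int dim) {
    int n_threads = omp_get_max_threads();
    std::vector<int> bounds = balance_by_edges(indptr, n_rows, n_threads);
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        // Each thread owns a contiguous row range with a similar edge count.
        for (int v = bounds[tid]; v < bounds[tid + 1]; ++v) {
            for (int d = 0; d < dim; d += 8) {
                __m256 acc = _mm256_setzero_ps();
                for (int e = indptr[v]; e < indptr[v + 1]; ++e) {
                    int u = indices[e];
                    acc = _mm256_add_ps(acc,
                            _mm256_loadu_ps(src + (long)u * dim + d));
                }
                _mm256_storeu_ps(dst + (long)v * dim + d, acc);
            }
        }
    }
}

int main() {
    // Tiny 3-node graph in CSR form: node 0 <- {1, 2}, node 1 <- {2}, node 2 <- {}.
    std::vector<int> indptr  = {0, 2, 3, 3};
    std::vector<int> indices = {1, 2, 2};
    int dim = 8;
    std::vector<float> src(3 * dim, 1.0f), dst(3 * dim, 0.0f);
    aggregate_sum(indptr, indices, src.data(), dst.data(), 3, dim);
    std::printf("dst[0][0] = %.1f (expected 2.0)\n", dst[0]);
    return 0;
}
```

A NUMA-aware variant of this scheme would additionally pin each thread chunk to the socket that holds the corresponding rows of the feature matrix, so that most loads stay local; the paper's concrete NUMA strategy is not spelled out in the abstract.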