Cache-locality Based Adaptive Warp Scheduling for Neural Network Acceleration on GPGPUs

Weiming Hu, Yi Zhou, Ying Quan, Yuanfeng Wang, Xin Lou

2022 IEEE 35th International System-on-Chip Conference (SOCC 2022)

Abstract
In many emerging applications such as convolutional neural networks (CNNs), general-purpose graphics processing units (GPGPUs) are widely used as computing devices. For GPGPU computing, the warp scheduling policy is crucial to overall performance. We find that no single warp scheduling policy provides optimal performance for all layers of a CNN model. In this paper, we analyze the workload of each layer in typical CNN models and observe that many layers exhibit a significant performance gap under different scheduling policies. We therefore propose a cache tag buffer that characterizes the workload according to its type of cache locality. Based on this characterization, the warp scheduler adaptively selects between the Loosely-Round-Robin (LRR) and Greedy-Then-Oldest (GTO) policies. Evaluation results show that the proposed mechanism selects the better of LRR and GTO at runtime, translating to superior performance for the computation of various CNN models on GPGPUs. We also show that the overhead introduced by the proposed scheduling method is small.
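The sketch below is a minimal, illustrative model of the idea summarized in the abstract, not the authors' implementation: a small tag buffer remembers which warp last touched each cache-line tag, classifies re-references as intra-warp or inter-warp locality, and the scheduler then picks GTO when intra-warp locality dominates and LRR otherwise. The names (CacheTagBuffer, choose_policy) and the bias threshold are hypothetical assumptions for illustration.

```cpp
// Illustrative sketch of locality-based policy selection (hypothetical names).
#include <cstdint>
#include <deque>
#include <unordered_map>

enum class Policy { LRR, GTO };

// Hypothetical tag buffer: records the last warp to touch each cache-line tag
// so that re-references can be classified as intra-warp or inter-warp locality.
class CacheTagBuffer {
public:
    explicit CacheTagBuffer(size_t capacity) : capacity_(capacity) {}

    // Record an access by warp `warp_id` to cache line `tag` and update counters.
    void access(uint64_t tag, int warp_id) {
        auto it = owner_.find(tag);
        if (it != owner_.end()) {
            if (it->second == warp_id) ++intra_warp_hits_;  // same warp re-used the line
            else                       ++inter_warp_hits_;  // a different warp re-used it
            it->second = warp_id;
            return;
        }
        if (fifo_.size() == capacity_) {  // evict the oldest tracked tag
            owner_.erase(fifo_.front());
            fifo_.pop_front();
        }
        fifo_.push_back(tag);
        owner_[tag] = warp_id;
    }

    uint64_t intra_warp_hits() const { return intra_warp_hits_; }
    uint64_t inter_warp_hits() const { return inter_warp_hits_; }

private:
    size_t capacity_;
    std::deque<uint64_t> fifo_;                  // FIFO of tracked tags
    std::unordered_map<uint64_t, int> owner_;    // tag -> last warp id
    uint64_t intra_warp_hits_ = 0;
    uint64_t inter_warp_hits_ = 0;
};

// GTO tends to help when intra-warp locality dominates (greedily running one
// warp keeps its working set in cache), while LRR suits inter-warp locality.
// The 1.2 bias factor is an arbitrary illustrative threshold.
Policy choose_policy(const CacheTagBuffer& buf) {
    if (buf.intra_warp_hits() > 1.2 * buf.inter_warp_hits())
        return Policy::GTO;
    return Policy::LRR;
}
```

In a hardware or simulator model, such counters would presumably be sampled once per kernel launch or scheduling interval and then reset, so that each CNN layer can receive its own policy decision.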
Keywords
general-purpose graphics processing unit (GPGPU), warp scheduling, convolutional neural network (CNN), cache locality