Minimizing GPU Kernel Launch Overhead in Deep Learning Inference on Mobile GPUs

Sumin Kim, Seunghwan Oh, Youngmin Yi

HotMobile 2021

Abstract
The need for on-device real-time deep learning inference is increasing as deep learning on edge devices such as smartphones and robots becomes popular. Although hardware acceleration on NPUs is attracting more attention, recent mobile GPUs are fast enough to offer the potential for real-time inference of many CNNs. In this paper, we first analyze the inference time of widely used CNNs on recent mobile GPUs and reveal that significant overhead exists in GPU kernel launches. We then identify the various factors that cause this kernel launch overhead, from which we formulate a performance model that predicts the optimal kernel flush period, i.e., the one that leads to minimal overhead. Our experimental results show speedups of up to 64% and 31% in the inference of various CNNs with TensorFlow Lite and the ARM Compute Library on the Adreno 650 GPU and the Mali G76 GPU, respectively.
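To make the flush-period idea concrete, here is a minimal sketch (not the authors' code) of OpenCL host logic on a mobile GPU: a chain of kernel launches is enqueued, and the command queue is flushed once every FLUSH_PERIOD enqueues. The kernel name dummy_layer, the kernel count, and the period value are illustrative assumptions; the paper's contribution is a model that predicts the best such period rather than a fixed constant.

```c
/* Sketch: periodic clFlush to batch kernel launches, amortizing
 * per-submission driver overhead. All names/values below are
 * illustrative, not taken from the paper. */
#include <CL/cl.h>

#define NUM_KERNELS  100   /* e.g., one enqueue per CNN layer op */
#define FLUSH_PERIOD 8     /* hypothetical flush period to be tuned */

int main(void) {
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q =
        clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    /* Trivial stand-in for a CNN layer kernel. */
    const char *src =
        "__kernel void dummy_layer(__global float *x) {"
        "  size_t i = get_global_id(0); x[i] = x[i] * 2.0f; }";
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "dummy_layer", NULL);

    size_t gsz = 1024;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                gsz * sizeof(float), NULL, NULL);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);

    for (int i = 0; i < NUM_KERNELS; i++) {
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gsz, NULL, 0, NULL, NULL);
        /* Flushing too often pays per-submission overhead; flushing
         * too rarely delays GPU work behind host-side queuing. */
        if ((i + 1) % FLUSH_PERIOD == 0)
            clFlush(q);
    }
    clFinish(q);  /* drain any remaining batched work */

    clReleaseMemObject(buf); clReleaseKernel(k);
    clReleaseProgram(prog); clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}
```

In practice one would sweep FLUSH_PERIOD (or apply a model like the paper's) per network and device, since the optimal batching point depends on per-kernel runtime and driver submission cost.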