Computation offloading for fast CNN inference in edge computing.

RACS 2019

Abstract
Convolutional Neural Networks (CNNs) are an important computation model for many popular mobile artificial intelligence applications. However, CNN inference, i.e., processing input data with well-trained CNN models, is computation-intensive and incurs heavy overhead on mobile devices with limited hardware resources. In this paper, we propose to offload a portion of the CNN inference computation of mobile devices to an edge computing site. We observe that batching tasks on a GPU can significantly reduce the average inference time. Based on this important observation, we design an algorithm that jointly considers the tasks on all mobile devices and the corresponding batching benefit at the edge site, unlike existing work on collaborative inference that lets each mobile device make offloading decisions independently. Furthermore, an online algorithm is proposed to handle the scenario in which CNN inference tasks arrive at different times; it significantly reduces the average inference time without knowledge of future task arrivals. Finally, extensive simulations are conducted to evaluate the performance of the proposed algorithms, and the results show that they outperform existing work under different settings.
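To make the batching observation concrete, the following is a minimal, hypothetical micro-benchmark (not from the paper) that times per-task CNN inference at several batch sizes. The model, input resolution, and task count are illustrative assumptions; on a GPU, such a measurement typically shows the average per-task latency falling as the batch size grows, which is the effect the proposed scheduling algorithms exploit.

```python
# Hypothetical sketch of the batching-benefit measurement: average per-task
# latency usually drops as requests are batched on an accelerator, because
# per-call overhead is amortized. Model and sizes are illustrative only.
import time
import torch
import torch.nn as nn

# Small stand-in CNN (an assumption; the paper does not fix a model here).
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

@torch.no_grad()
def avg_latency(batch_size: int, n_tasks: int = 64) -> float:
    """Average per-task inference time when n_tasks are served in batches."""
    x = torch.randn(batch_size, 3, 64, 64, device=device)
    model(x)  # warm-up pass to exclude one-time initialization costs
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_tasks // batch_size):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    return (time.perf_counter() - start) / n_tasks

for b in (1, 4, 16, 64):
    print(f"batch={b:3d}  avg per-task latency = {avg_latency(b) * 1e3:.2f} ms")
```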