Loading Cost-Aware Model Caching and Request Routing for Cooperative Edge Inference

ICC 2022 - IEEE International Conference on Communications (2022)

Abstract
Most existing works on edge service caching and request routing fail to consider the influence of service loading time. Meanwhile, the requests generated by end devices change dynamically, so the caching strategy should adapt accordingly. In this paper, we investigate loading cost-aware joint model caching and request routing with cooperative edge computing, considering both the service loading time and dynamic user requests. A system throughput maximization problem is formulated and proved to be NP-hard. Then, a randomized rounding-based online algorithm with an M/(M − 2 ln N) approximation ratio is proposed to solve it, where M and N are the numbers of end devices and deep neural network (DNN) models, respectively. Extensive experimental results demonstrate that our algorithm achieves a throughput gain of more than 42.7% over baseline algorithms.
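To make the randomized-rounding approach concrete, below is a minimal sketch of the general technique on a toy joint caching/routing instance: relax the binary caching (x) and routing (y) variables to an LP, solve it, then round the fractional solution. All names and parameters here (cache_cap, serve_cap, req, the specific constraints) are illustrative assumptions, not the paper's exact formulation, which additionally models loading time and batching.

```python
# Toy LP-relaxation + randomized rounding for joint model caching and
# request routing. Illustrative only; the paper's actual model also
# accounts for loading cost, inference time, and batching.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

N, S, M = 4, 2, 6                  # DNN models, edge servers, end devices
cache_cap = 2                      # models per server (assumed)
serve_cap = 3                      # requests per server (assumed)
req = rng.integers(0, N, size=M)   # model requested by each device

# Variables: x[n, s] (model n cached at server s), then y[m, s]
# (device m's request routed to server s), flattened into one vector.
def xi(n, s): return n * S + s
def yi(m, s): return N * S + m * S + s

nv = N * S + M * S
c = np.zeros(nv)
for m in range(M):
    for s in range(S):
        c[yi(m, s)] = -1.0         # maximize served requests => minimize -sum(y)

A, b = [], []
def add(row, rhs): A.append(row); b.append(rhs)

for m in range(M):                 # route only to servers caching the model
    for s in range(S):
        row = np.zeros(nv); row[yi(m, s)] = 1; row[xi(req[m], s)] = -1
        add(row, 0.0)
for m in range(M):                 # each request is routed at most once
    row = np.zeros(nv)
    for s in range(S): row[yi(m, s)] = 1
    add(row, 1.0)
for s in range(S):                 # cache capacity per server
    row = np.zeros(nv)
    for n in range(N): row[xi(n, s)] = 1
    add(row, cache_cap)
for s in range(S):                 # serving capacity per server
    row = np.zeros(nv)
    for m in range(M): row[yi(m, s)] = 1
    add(row, serve_cap)

res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0, 1))
frac = res.x

# Randomized rounding: cache model n at s with probability x[n, s]
# (truncated to capacity), then greedily route each request.
cached = []
for s in range(S):
    picks = [n for n in range(N) if rng.random() < frac[xi(n, s)]]
    cached.append(set(picks[:cache_cap]))

load, served = [0] * S, 0
for m in range(M):
    for s in range(S):
        if req[m] in cached[s] and load[s] < serve_cap:
            load[s] += 1; served += 1
            break

print(f"LP upper bound: {-res.fun:.2f}, rounded throughput: {served}/{M}")
```

The LP optimum upper-bounds the integral optimum, so comparing the rounded throughput against it gives an empirical feel for the approximation quality that the paper's M/(M − 2 ln N) ratio bounds in expectation.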
Keywords
model caching, request routing, batching, loading time, inference time, throughput