Multi-tenant mobile offloading systems for real-time computer vision applications.

ICDCN '19: Proceedings of the 2019 International Conference on Distributed Computing and Networking (2019)

Abstract
Offloading techniques enable many emerging computer vision applications on mobile platforms by executing compute-intensive tasks on resource-rich servers. Although significant research effort has been devoted to optimizing mobile offloading frameworks, most previous work is evaluated in a single-tenant setting, that is, with a server assigned to a single client. In the practical scenario where servers must handle tasks from many clients running diverse applications, however, contention on shared server resources may degrade application performance. In this work, we study scheduling techniques to improve serving performance in multi-tenant mobile offloading systems, for computer vision algorithms running on CPUs and deep neural networks (DNNs) running on GPUs. For CPU workloads, we present methods that mitigate resource contention and reduce delay using a Plan-Schedule approach: the planning phase predicts future workloads from all clients, estimates contention, and adjusts future task start times to remove or reduce contention; the scheduling phase dispatches each arriving offloaded task to the server that minimizes contention. For DNN workloads running on GPUs, we propose adaptive batching algorithms that use batch size, model complexity, and system load to achieve the best Quality of Service (QoS), measured by the accuracy and delay of DNN tasks. We demonstrate the improvement in serving performance using several real-world applications and different server deployments.
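The scheduling phase described above can be sketched as a contention-minimizing dispatcher. This is a minimal illustration, not the paper's implementation: the affine capacity model, the server fields, and the tie-breaking rule are all assumptions made for the example.

```python
# Hypothetical sketch of the Schedule phase: dispatch each arriving task
# to the server with the lowest estimated contention. The contention
# estimate (predicted demand beyond capacity) is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    planned_load: float = 0.0          # predicted CPU demand (cores) already assigned
    capacity: float = 4.0              # assumed per-server CPU capacity (cores)
    tasks: list = field(default_factory=list)

    def contention(self, task_load: float) -> float:
        # Contention if this task were added: demand exceeding capacity.
        return max(0.0, self.planned_load + task_load - self.capacity)

def dispatch(servers: list[Server], task_load: float) -> Server:
    """Assign a task to the server minimizing resulting contention,
    breaking ties by current planned load."""
    best = min(servers, key=lambda s: (s.contention(task_load), s.planned_load))
    best.planned_load += task_load
    best.tasks.append(task_load)
    return best

servers = [Server("s0"), Server("s1")]
for load in [3.0, 2.0, 2.5]:
    dispatch(servers, load)
# The first two tasks land on different servers; the third goes to the
# server where it causes the least contention.
```

In a full system, `planned_load` would come from the planning phase's workload predictions rather than a running sum.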
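The adaptive batching idea for GPU-served DNNs can be illustrated with a toy policy that picks the largest batch size whose estimated completion time still meets a delay budget. The affine latency model and all constants here are assumptions for illustration, not values from the paper.

```python
# Hypothetical sketch of adaptive batching for GPU DNN serving.
def batch_latency(batch_size: int, base_ms: float = 8.0, per_item_ms: float = 1.5) -> float:
    # Assumed affine GPU latency model: fixed launch cost plus per-item
    # cost, so larger batches amortize the fixed overhead.
    return base_ms + per_item_ms * batch_size

def choose_batch_size(queue_len: int, deadline_ms: float, max_batch: int = 32) -> int:
    """Largest batch size that fits the current queue and delay budget."""
    best = 1
    for b in range(1, min(queue_len, max_batch) + 1):
        if batch_latency(b) <= deadline_ms:
            best = b
        else:
            break
    return best

# A tight deadline forces small batches; a loose deadline (or long queue)
# allows larger batches and better GPU throughput.
choose_batch_size(queue_len=16, deadline_ms=20.0)  # -> 8 (8 + 1.5*8 = 20 ms)
choose_batch_size(queue_len=16, deadline_ms=40.0)  # -> 16
```

A production policy would also weigh model complexity and accuracy, per the abstract, e.g. by switching to a cheaper model when no batch size meets the deadline.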