DNN Surgery: Accelerating DNN Inference on the Edge Through Layer Partitioning

IEEE Transactions on Cloud Computing (2023)

Abstract
Recent advances in deep neural networks (DNNs) have substantially improved the accuracy and speed of various intelligent applications. Nevertheless, DNN inference imposes a heavy computational burden on end devices, while offloading inference tasks to the cloud incurs a large volume of data transmission. Motivated by the fact that the data size of some intermediate DNN layers is significantly smaller than that of the raw input data, we design DNN Surgery, which allows a partitioned DNN to be processed at both the edge and the cloud while limiting data transmission. The challenge is twofold: (1) network dynamics substantially influence the performance of DNN partitioning, and (2) state-of-the-art DNNs are characterized by a directed acyclic graph (DAG) rather than a chain, which makes partitioning considerably more complicated. To address these issues, we design a Dynamic Adaptive DNN Surgery (DADS) scheme, which optimally partitions the DNN under different network conditions. We also study the partitioning problem in a cost-constrained system, where the cloud resources available for inference are limited. We then implement a real-world prototype based on a self-driving car video dataset, showing that, compared with current approaches, DNN Surgery improves latency by up to 6.45x and throughput by up to 8.31x. We further evaluate DNN Surgery through two case studies, an indoor intrusion detection application and a campus traffic monitoring application, in which it consistently achieves high throughput and low latency.
Keywords
DNN inference, DNN surgery, edge, layer
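
The trade-off DNN Surgery exploits is easiest to see in the chain-structured special case: run the first k layers on the edge device, ship that layer's (smaller) output activation to the cloud, and run the remaining layers there. The sketch below is a minimal illustration of that idea only, not the paper's DADS algorithm (which solves a min-cut problem over the DNN's DAG); the function name, profiling arrays, and all numbers are hypothetical.

```python
def best_split(edge_ms, cloud_ms, out_bytes, input_bytes, bw_bytes_per_ms):
    """Pick the split point k (layers [0, k) on the edge, [k, n) in the cloud)
    that minimizes estimated end-to-end latency for a chain-structured DNN.

    edge_ms[i]   -- profiled latency of layer i on the edge device (ms)
    cloud_ms[i]  -- profiled latency of layer i in the cloud (ms)
    out_bytes[i] -- size of layer i's output activation (bytes)
    input_bytes  -- size of the raw input (bytes), transmitted if k == 0
    """
    n = len(edge_ms)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        edge_time = sum(edge_ms[:k])
        cloud_time = sum(cloud_ms[k:])
        if k == n:
            tx_bytes = 0  # fully on-edge; assume the final result is negligible
        elif k == 0:
            tx_bytes = input_bytes  # fully offloaded; raw input crosses the network
        else:
            tx_bytes = out_bytes[k - 1]  # intermediate activation crosses the network
        total = edge_time + tx_bytes / bw_bytes_per_ms + cloud_time
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


if __name__ == "__main__":
    # Hypothetical 5-layer profile: the edge is roughly 8x slower per layer,
    # and activations shrink after the early layers.
    edge_ms = [40.0, 35.0, 30.0, 25.0, 10.0]
    cloud_ms = [5.0, 4.5, 4.0, 3.0, 1.5]
    out_bytes = [800_000, 400_000, 100_000, 20_000, 4_000]
    input_bytes = 1_500_000
    for mbps in (1, 10, 100):
        bw = mbps * 1e6 / 8 / 1000  # Mbit/s -> bytes per millisecond
        k, lat = best_split(edge_ms, cloud_ms, out_bytes, input_bytes, bw)
        print(f"{mbps:>3} Mbit/s: split at layer {k}, est. latency {lat:.1f} ms")
```

Re-evaluating this choice whenever the measured bandwidth changes captures the "dynamic adaptive" aspect in miniature: under low bandwidth the optimal split moves deeper into the network (or fully onto the edge), where activations are smaller, while high bandwidth favors offloading more layers to the faster cloud.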