Deep compressive offloading: speeding up neural network inference by trading edge computation for network latency

SenSys '20: The 18th ACM Conference on Embedded Networked Sensor Systems, Virtual Event, Japan, November 2020

Cited by 115 | Views 224
Abstract
With recent advances, neural networks have become a crucial building block in intelligent IoT systems and sensing applications. However, their excessive computational demand remains a serious impediment to deployment on low-end IoT devices. With the emergence of edge computing, offloading has become a promising technique to circumvent end-device limitations. However, transferring data between local and edge devices takes up a large proportion of time in existing offloading frameworks, creating a bottleneck for low-latency intelligent services. In this work, we propose a general framework, called deep compressive offloading. By integrating compressive sensing theory and deep learning, our framework can encode data for offloading into tiny sizes with negligible overhead on local devices and decode the data on the edge server, while offering theoretical guarantees on perfect reconstruction and lossless inference. By trading edge computing resources for data transmission time, our design can significantly reduce offloading latency with almost no accuracy loss. We build a deep compressive offloading system to serve state-of-the-art computer vision and speech recognition services. With comprehensive evaluations, our system can consistently reduce end-to-end latency by 2X to 4X with 1% accuracy loss, compared to state-of-the-art neural network offloading systems. In conditions of limited network bandwidth or intensive background traffic, our system can further speed up neural network inference by up to 35X.
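To make the encode-on-device / decode-on-edge idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of compressive-sensing-style offloading: the local device compresses a sparse feature vector with a single random projection, transmits the much smaller measurement vector, and the edge server reconstructs the features before running the remaining layers. The names (`phi`, `compression_ratio`, the ISTA decoder standing in for the paper's learned decoder) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_measurement_matrix(n_features: int, compression_ratio: float) -> np.ndarray:
    """Gaussian random measurement matrix Phi with m << n rows (assumed setup)."""
    m = max(1, int(n_features * compression_ratio))
    return rng.standard_normal((m, n_features)) / np.sqrt(m)

def encode_on_device(features: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Local device side: y = Phi @ x, a single cheap matrix-vector product."""
    return phi @ features.ravel()

def decode_on_edge(y: np.ndarray, phi: np.ndarray, n_iters: int = 300,
                   step: float = 0.01, lam: float = 1e-3) -> np.ndarray:
    """Edge server side: recover a sparse estimate via iterative soft-thresholding
    (ISTA); the paper's learned decoder would replace this generic solver."""
    x = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        grad = phi.T @ (phi @ x - y)          # gradient of the least-squares term
        x = x - step * grad
        x = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # soft threshold
    return x

# Toy usage: a sparse length-1024 "feature vector" compressed 4x before transfer.
x_true = np.zeros(1024)
x_true[rng.choice(1024, 50, replace=False)] = rng.standard_normal(50)
phi = make_measurement_matrix(1024, compression_ratio=0.25)
y = encode_on_device(x_true, phi)     # only 256 values cross the network
x_hat = decode_on_edge(y, phi)        # reconstructed on the edge server
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The design intuition mirrors the abstract: the device-side cost is one matrix multiply (negligible), while the heavier reconstruction work is shifted to the better-provisioned edge server, trading edge computation for reduced transmission time.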