A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud

Shulin Zeng, Guohao Dai, Hanbo Sun, Jun Liu, Shiyao Li, Guangjun Ge, Kai Zhong, Kaiyuan Guo, Yu Wang, Huazhong Yang

ACM Transactions on Reconfigurable Technology and Systems (2022)

Abstract

INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while multi-tenancy for INFaaS has not yet been explored. As the demand for INFaaS keeps growing, simply increasing the number of FPGA-based DNN accelerators is not cost-effective, while merely sharing these single-task-optimized DNN accelerators through time-division multiplexing could lead to poor isolation and high performance loss for INFaaS. In addition, current cloud-based DNN accelerators have excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation costs for both offline deployment and online reconfiguration. Existing solutions are therefore far from providing efficient and flexible FPGA virtualization for public and private cloud scenarios. To solve these problems, we propose a unified virtualization framework for general-purpose deep neural networks in the cloud, enabling multi-tenant sharing for both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) accelerators on a single FPGA. Isolation is enabled by a two-level instruction dispatch module and a multi-core-based hardware resource pool. These designs provide isolated and runtime-programmable hardware resources, which in turn lead to performance isolation for multi-tenant sharing. To overcome the heavy re-compilation overhead, we propose a tiling-based instruction frame package design and a two-stage static-dynamic compilation flow. Only the lightweight runtime information is re-compiled, with about 1 ms of overhead, thus guaranteeing the private cloud's performance. Finally, extensive experimental results show that the proposed virtualized solutions achieve up to 3.12x and 6.18x higher throughput in the private cloud compared with the static CNN and RNN baseline designs, respectively.
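
The two-stage static-dynamic compilation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `InstructionFramePackage`, `static_compile`, and `dynamic_compile`, the placeholder instruction strings, and the round-robin assignment of tiles to cores are all assumptions made for illustration. The only point it shows is that the expensive per-tile work is done once offline, and that re-allocating cores only redistributes the precompiled packages.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InstructionFramePackage:
    """Hypothetical container: the instructions for one tile of one layer."""
    layer: str
    tile_index: int
    instructions: Tuple[str, ...]


def static_compile(layers: Dict[str, int]) -> List[InstructionFramePackage]:
    """Offline stage (assumed): emit one package per tile of each layer.

    `layers` maps a layer name to its number of tiles.
    """
    packages = []
    for layer, num_tiles in layers.items():
        for t in range(num_tiles):
            # Placeholder strings; a real compiler would emit accelerator instructions.
            instrs = (f"LOAD {layer}[{t}]", f"CALC {layer}[{t}]", f"SAVE {layer}[{t}]")
            packages.append(InstructionFramePackage(layer, t, instrs))
    return packages


def dynamic_compile(
    packages: List[InstructionFramePackage], num_cores: int
) -> List[List[InstructionFramePackage]]:
    """Online stage (assumed): assign precompiled packages to the allocated cores.

    Tiles are distributed round-robin by tile index; no instructions are regenerated.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    schedule: List[List[InstructionFramePackage]] = [[] for _ in range(num_cores)]
    for pkg in packages:
        schedule[pkg.tile_index % num_cores].append(pkg)
    return schedule


# Compile once offline, then re-schedule for different core allocations.
packages = static_compile({"conv1": 4, "conv2": 2})
two_core_schedule = dynamic_compile(packages, 2)
four_core_schedule = dynamic_compile(packages, 4)
```

In this sketch, moving a tenant from two cores to four calls only `dynamic_compile`, which mirrors the abstract's claim that only lightweight runtime information is re-compiled when resources are re-allocated.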

Keywords

Virtualization, neural networks, cloud computing, FPGA