A Lightweight Transformer Model using Neural ODE for FPGAs

IPDPS Workshops (2023)

Abstract
A transformer is an emerging neural network model built around an attention mechanism. It has been applied to a wide variety of tasks and achieves favorable accuracy compared to CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks). Although the attention mechanism is recognized as a general-purpose component, many transformer models require a significant number of parameters, making them ill-suited to low-cost edge devices. Recently, a resource-efficient hybrid model was proposed that uses ResNet as a backbone architecture and replaces some of its convolutional layers with an MHSA (Multi-Head Self-Attention) mechanism. In this paper, we significantly reduce the parameter size of this approach by using Neural ODE instead of ResNet as the backbone architecture for the MHSA mechanism. The proposed hybrid model reduces the parameter size by 97.3% compared to the original model without degrading accuracy. Because the model is so small, it can be implemented on a Xilinx ZCU104 FPGA (Field-Programmable Gate Array) board in a way that fully exploits on-chip BRAM/URAM resources. The FPGA implementation is evaluated in terms of resource utilization, accuracy, performance, and power consumption. The results demonstrate that it speeds up the model by up to 2.63 times compared to a software execution, with no accuracy degradation.
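The parameter reduction rests on a property of Neural ODE backbones: where a ResNet stacks a distinct weight set per residual block, a Neural ODE reuses one weight set across every integration step. The following minimal sketch (not the paper's implementation; the layer shapes and the `tanh` residual function are illustrative assumptions standing in for the conv/MHSA layers) shows how this sharing shrinks the parameter count while keeping the same depth of computation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, steps = 64, 4

def block(x, w):
    """One residual function f(x); a stand-in for a conv/MHSA layer."""
    return np.tanh(x @ w)

# ResNet-style: a separate weight matrix per stacked block.
resnet_weights = [rng.standard_normal((dim, dim)) for _ in range(steps)]

# Neural-ODE-style: ONE shared weight matrix, applied at every Euler step.
ode_weight = rng.standard_normal((dim, dim))

def resnet_forward(x):
    for w in resnet_weights:
        x = x + block(x, w)               # x_{t+1} = x_t + f(x_t; w_t)
    return x

def node_forward(x, h=1.0):
    for _ in range(steps):
        x = x + h * block(x, ode_weight)  # Euler step with shared weights
    return x

params_resnet = sum(w.size for w in resnet_weights)
params_node = ode_weight.size
print(params_resnet, params_node)  # the ODE block needs `steps` times fewer parameters
```

With 4 integration steps the shared-weight block holds a quarter of the ResNet stack's parameters; the paper's 97.3% reduction additionally reflects its specific network configuration, which this toy example does not model.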
Keywords
CNN, Transformer, Multi-Head Self-Attention, Zynq, Neural ODE, BoTNet