Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
arXiv (2024)
Abstract
Motivated by the huge success of Transformers in the field of natural
language processing (NLP), Vision Transformers (ViTs) have been rapidly
developed and achieved remarkable performance in various computer vision tasks.
However, their huge model sizes and intensive computations hinder ViTs'
deployment on embedded devices, calling for effective model compression
methods, such as quantization. Unfortunately, due to the existence of
hardware-unfriendly and quantization-sensitive non-linear operations,
particularly Softmax, it is non-trivial to completely quantize all operations
in ViTs, yielding either significant accuracy drops or non-negligible hardware
costs. In response to these challenges in standard ViTs, we turn our
attention to the quantization and acceleration of efficient ViTs, which not
only eliminate the troublesome Softmax but also integrate linear attention
with low computational complexity, and we propose Trio-ViT accordingly.
Specifically, at the algorithm level, we develop a
tailored post-training quantization engine taking the unique activation
distributions of Softmax-free efficient ViTs into full consideration, aiming to
boost quantization accuracy. Furthermore, at the hardware level, we build an
accelerator dedicated to the specific Convolution-Transformer hybrid
architecture of efficient ViTs, thereby enhancing hardware efficiency.
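For concreteness, the Softmax-free linear attention underlying such efficient ViTs can be sketched as follows. This is a minimal kernel-based formulation with a ReLU feature map, written in PyTorch purely for illustration; the exact attention variant used by Trio-ViT's backbone may differ.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Kernel-based linear attention: O(N*d^2) instead of Softmax's O(N^2*d).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    # A non-negative feature map stands in for Softmax (illustrative choice).
    q, k = torch.relu(q), torch.relu(k)
    # Summarize keys/values once per head: a (head_dim, head_dim) matrix.
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # Per-query normalizer, replacing the Softmax denominator.
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

# Example: 196 tokens (14x14 patches), 4 heads of dimension 64.
q = k = v = torch.randn(1, 4, 196, 64)
out = linear_attention(q, k, v)  # shape (1, 4, 196, 64)
```

Because the key-value summary is computed once and reused by every query, the cost grows linearly with the number of tokens, which is what makes removing Softmax attractive for both quantization and hardware acceleration.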
Extensive experimental results consistently prove the effectiveness of our
Trio-ViT framework. In particular, we gain up to ↑7.2× and ↑14.6× FPS under
comparable accuracy over state-of-the-art ViT accelerators, as well as ↑5.9×
and ↑2.0× DSP efficiency. Code will be released publicly upon acceptance.
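As a companion illustration, activation-aware calibration, the core idea behind distribution-tailored post-training quantization, can be sketched as below. The percentile-clipping strategy and both function names are assumptions made for this example, not the paper's actual engine.

```python
import torch

def calibrate_scale(activations, num_bits=8, percentile=99.9):
    """Pick a symmetric quantization scale from calibration activations.

    Clipping at a high percentile rather than the raw maximum keeps the
    scale robust to the long-tailed outliers typical of ViT activations.
    """
    flat = activations.detach().abs().flatten().float()
    clip = torch.quantile(flat, percentile / 100.0)
    qmax = 2 ** (num_bits - 1) - 1  # e.g., 127 for 8 bits
    return clip / qmax

def fake_quantize(x, scale, num_bits=8):
    """Quantize-dequantize: simulates integer inference in floating point."""
    qmax = 2 ** (num_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Example: calibrate on a batch of activations, then simulate quantization.
acts = torch.randn(64, 196, 384) * 3.0
scale = calibrate_scale(acts)
acts_q = fake_quantize(acts, scale)
```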