Post-Training Quantization for Vision Transformer in Transformed Domain
IEEE International Conference on Multimedia and Expo (2023)
Abstract
As a successor to convolutional neural networks (CNNs), transformer-based models have achieved great performance in computer vision tasks. Compressing vision transformers to low bit-widths brings a number of practical benefits, including higher inference speed, a reduced memory footprint, and lower energy consumption. Existing model compression methods, especially quantization techniques, ignore the joint statistics of weights, resulting in sub-optimal task performance at a given quantization bit rate. In this paper, we propose to apply a transform before quantization to decorrelate the weights of a vision transformer. The entire compression flow is optimized in a rate-distortion framework to minimize the network output error, rather than simply the quantization error or layer-wise output errors. Extensive experiments on a variety of vision transformers (e.g., Swin, ViT, and DeiT) demonstrate that our proposed method outperforms the state-of-the-art: it quantizes both the weights and activations of these models to 6 bits without a significant accuracy drop.
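To illustrate the core idea of transform-domain weight quantization, the following is a minimal sketch assuming a Karhunen-Loeve transform (KLT) as the decorrelating transform and plain uniform quantization of the coefficients; the paper's actual transform choice, bit allocation, and rate-distortion search over the network output error are not specified in this abstract, so the function name `klt_quantize` and all details below are illustrative assumptions.

```python
# Sketch: decorrelate a weight matrix with a KLT (eigenbasis of the
# coefficient covariance), quantize the transform coefficients uniformly,
# then invert the transform. Not the paper's method; an assumed baseline.
import numpy as np

def klt_quantize(W, n_bits=6):
    """Quantize weight matrix W in a decorrelated (KLT) domain."""
    # Center the weights and estimate the covariance across columns.
    mean = W.mean(axis=0, keepdims=True)
    cov = np.cov(W - mean, rowvar=False)
    # Eigenvectors of the covariance form an orthogonal KLT basis.
    _, basis = np.linalg.eigh(cov)
    # Transform-domain coefficients are (approximately) decorrelated.
    coeffs = (W - mean) @ basis
    # Uniform symmetric quantization of the decorrelated coefficients.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(coeffs).max() / qmax
    q = np.round(coeffs / scale).clip(-qmax - 1, qmax)
    # Dequantize and invert the transform to reconstruct the weights.
    return (q * scale) @ basis.T + mean

# Example: quantize a toy linear-layer weight to 6 bits.
W = np.random.randn(384, 384).astype(np.float32)
W_q = klt_quantize(W, n_bits=6)
print("reconstruction MSE:", np.mean((W - W_q) ** 2))
```

Because the KLT concentrates correlated weight energy into a few coefficients, quantizing in the transformed domain typically yields lower reconstruction error at the same bit rate than quantizing the raw weights directly, which is the intuition behind exploiting joint weight statistics.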