A Survey on Transformer Compression
arXiv (2024)
Abstract
Large models based on the Transformer architecture play increasingly vital
roles in artificial intelligence, particularly within the realms of natural
language processing (NLP) and computer vision (CV). Model compression methods
reduce their memory and computational cost, which is a necessary step toward
deploying Transformer models on practical devices. Given the unique
architecture of the Transformer, featuring alternating attention and
feedforward neural network (FFN) modules, specific compression techniques are
required. The efficiency of these compression methods is also paramount, as it
is usually impractical to retrain large models on the entire training dataset.
This survey provides a comprehensive review of recent compression methods, with
a specific focus on their application to Transformer models. The compression
methods are primarily categorized into pruning, quantization, knowledge
distillation, and efficient architecture design. In each category, we discuss
compression methods for both CV and NLP tasks, highlighting common underlying
principles. Finally, we examine the relationships among the various compression
methods and discuss future directions in this domain.