Efficient Pruning of Large Language Model with Adaptive Estimation Fusion
CoRR (2024)
Abstract
Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and a significant challenge: deploying them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of multiple decoder layers, general methods often employ common estimation approaches for pruning, and these approaches lead to a decline in accuracy on specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure. Meanwhile, it can adaptively fuse coarse-grained and fine-grained estimations based on the results from complex and multilayer structures. All aspects of our design integrate seamlessly into the end-to-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1% and 2.0%, respectively.
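The abstract describes the approach only at a high level and does not give the concrete fusion rule. As a rough, hypothetical illustration of the general idea, combining a coarse-grained (per-substructure) importance score with a fine-grained (per-weight) score into a single pruning ranking, consider the PyTorch sketch below. The function name, the Taylor-style fine-grained saliency, the L2-norm coarse score, and the fixed `alpha` coefficient are all assumptions for illustration, not the authors' method; the paper's contribution is precisely that the fusion is set adaptively rather than fixed.

```python
import torch

def fused_importance(weight: torch.Tensor, grad: torch.Tensor, alpha: float = 0.5):
    """Toy fusion of coarse- and fine-grained pruning scores (illustrative only).

    weight, grad: (num_groups, group_size), e.g. attention heads flattened per row.
    alpha:        assumed fixed fusion coefficient; an adaptive method would
                  estimate this per layer or per task instead.
    """
    # Fine-grained score: first-order Taylor saliency |w * dL/dw| per weight,
    # aggregated to one score per substructure by summation.
    fine = (weight * grad).abs().sum(dim=1)

    # Coarse-grained score: L2 norm of each whole substructure.
    coarse = weight.norm(p=2, dim=1)

    # Normalize both to [0, 1] so they are comparable before fusing.
    fine = fine / (fine.max() + 1e-8)
    coarse = coarse / (coarse.max() + 1e-8)

    # Convex combination of the two estimations.
    return alpha * coarse + (1 - alpha) * fine

# Example: rank 12 "heads" of 64 weights each and keep the top 8.
w = torch.randn(12, 64)
g = torch.randn(12, 64)           # stand-in for real gradients
scores = fused_importance(w, g)
keep = scores.topk(8).indices     # indices of substructures to retain
print(keep)
```

In an adaptive scheme like the one the abstract describes, the weighting between the two estimations would be learned from the multilayer structure's behavior rather than hard-coded as it is in this sketch.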