HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference.

Haoyu Wang,Yaqing Wang,Tianci Liu,Tuo Zhao,Jing Gao

EMNLP 2023（2023）

引用 0|浏览18

暂无评分

摘要

Pre-trained language models~(LMs) have brought remarkable performance on numerous NLP tasks. However, they require significant resources and entail high computational costs for inference, making them challenging to deploy in real-world and real-time systems. Existing early exiting methods aim to reduce computational complexity by selecting the layer at which to exit, but suffer from the limitation that they have to sequentially traverse through all layers prior to the selected exit layer, which lacks flexibility and degrades their performance. To solve this problem, we propose a \textbf{h}omotopic and \textbf{ad}aptive layer \textbf{skip}ping fine-tuning method named HadSkip. HadSkip adaptively selects the layers to skip based on a predefined budget. Specifically, we introduce a learnable gate before each layer of the LM to determine whether the current layer should be skipped. To tackle various challenges in training such as discrete gates and the budget constraint, we propose a fine-grained initialization strategy and homotopic optimization strategy. We conduct extensive experiments on the GLUE benchmark, and experimental results demonstrate the proposed HadSkip outperforms all state-of-the-art baselines significantly.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要