Post-Distillation via Neural Resuscitation

IEEE TRANSACTIONS ON MULTIMEDIA (2024)

Abstract
Knowledge distillation, a widely adopted model compression technique, distils knowledge from a large teacher model into a smaller student model, with the goal of reducing the computational resources required by the student model. However, most existing distillation approaches focus on the types of knowledge and how to distil them, neglecting the student model's neuronal responses to that knowledge. In this article, we demonstrate that the Kullback-Leibler loss inhibits neuronal responses in the opposite gradient direction, which impairs the student model's potential during distillation. To address this problem, we introduce a principled dual-stage distillation scheme that rejuvenates all inhibited neurons at the neuronal level. In the first stage, we monitor all neurons in the student model during standard distillation and divide them into two groups according to their responses. In the second stage, we propose three strategies that resuscitate the neurons in different ways, allowing us to exploit the full potential of the student model. Experiments covering various aspects of knowledge distillation verify that the proposed approach outperforms current state-of-the-art approaches. Our work provides a neuronal perspective for studying the response of the student model to the knowledge from the teacher model.
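For reference, the sketch below (not the paper's implementation) shows the temperature-scaled Kullback-Leibler distillation objective the abstract refers to, together with a hypothetical stage-1 partition of student neurons by the gradient direction the loss applies to their responses. The partition criterion and the three resuscitation strategies used in the paper are not specified in this abstract; the function names and the sign-based criterion here are illustrative assumptions.

```python
# Minimal sketch of the standard KD KL objective plus a hypothetical
# neuron-partition step; not the paper's method.
import torch
import torch.nn.functional as F


def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL divergence between teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # 'batchmean' matches the mathematical definition of KL divergence;
    # T*T rescales gradients as in standard knowledge distillation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def partition_neurons_by_response(activations, loss):
    """Hypothetical stage-1 step: split neurons into 'inhibited' vs. 'active'
    groups according to whether the distillation loss pushes their responses
    toward zero (an assumed criterion, not the paper's)."""
    grads, = torch.autograd.grad(loss, activations, retain_graph=True)
    # A positive gradient on a positive activation (or negative on negative)
    # means gradient descent shrinks the response: treat it as inhibited.
    inhibited = (grads * activations) > 0
    return inhibited, ~inhibited


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 32)
    w_student = torch.randn(32, 10, requires_grad=True)  # toy student layer
    w_teacher = torch.randn(32, 10)                       # toy teacher layer
    student_out = x @ w_student
    loss = kd_kl_loss(student_out, x @ w_teacher)
    inhibited, active = partition_neurons_by_response(student_out, loss)
    print(f"KD loss: {loss.item():.4f}, "
          f"inhibited fraction: {inhibited.float().mean():.2f}")
```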
Keywords
Neurons, Computational modeling, Standards, Optimization, Knowledge engineering, Task analysis, Probabilistic logic, Deep learning, knowledge distillation, model regularization, transfer learning