Knowledge distillation based on projector integration and classifier sharing

Complex & Intelligent Systems (2024)

Abstract
Knowledge distillation transfers knowledge from a pre-trained teacher model to a student model, thereby accomplishing model compression. Previous studies have carefully crafted knowledge representations, loss function designs, and distillation location choices, but little work has examined the role of the classifier in distillation. Prior experience shows that a model's final classifier plays an essential role in inference, so this paper attempts to narrow the performance gap between models by having the student model directly use the teacher's classifier for final inference. This requires an additional projector that matches the features of the student encoder to the teacher's classifier. However, a single projector cannot fully align the features, and integrating multiple projectors may yield better performance. Balancing projector size against performance, we determine suitable projector sizes for different network combinations through experiments and propose a simple method for projector integration. In this way, the student model projects its features and then uses the teacher's classifier for inference, achieving performance close to the teacher's. Extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets show that our approach applies simply and effectively to various teacher–student frameworks.
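The inference path described in the abstract (student encoder → integrated projectors → frozen teacher classifier) can be illustrated with a short PyTorch sketch. This is a minimal sketch under stated assumptions: the class name ProjectedStudent, the two-layer projector design, the number of projectors, and the mean fusion of projector outputs are illustrative choices, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class ProjectedStudent(nn.Module):
    """Student encoder whose features are projected into the teacher's
    feature space and classified by the teacher's (frozen) classifier."""

    def __init__(self, student_encoder: nn.Module, teacher_classifier: nn.Module,
                 student_dim: int, teacher_dim: int, num_projectors: int = 3):
        super().__init__()
        self.encoder = student_encoder
        # Several independent projectors map student features to the teacher's
        # feature dimension; integrating (here: averaging) their outputs is one
        # simple way to combine multiple projectors.
        self.projectors = nn.ModuleList([
            nn.Sequential(
                nn.Linear(student_dim, teacher_dim),
                nn.BatchNorm1d(teacher_dim),
                nn.ReLU(inplace=True),
                nn.Linear(teacher_dim, teacher_dim),
            )
            for _ in range(num_projectors)
        ])
        # The teacher's classifier is reused directly and kept frozen.
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(x)  # assumed shape (B, student_dim)
        projected = torch.stack([proj(feat) for proj in self.projectors], dim=0)
        fused = projected.mean(dim=0)        # integrate projector outputs
        return self.classifier(fused)        # logits from the teacher's head
```

During distillation, only the student encoder and the projectors would be trained; the teacher's classifier stays fixed so that the student's projected features must align with the space the teacher's classifier already expects.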
Keywords
Deep neural network, Model compression, Knowledge distillation, Feature projection