Trained teacher: Who is good at teaching

Xingzhu Liang, Feilong Bi, Wen Liu, Xinyun Yan, Chunjiong Zhang, Chenxing Xia

Displays (2023)

Abstract
Knowledge distillation is an emerging method for obtaining efficient, small-scale networks. The main idea is to transfer knowledge from a complex teacher model with high learning capacity to a simple student model. To this end, various knowledge distillation approaches have been proposed in recent years, focusing mainly on modifying how the student learns and much less on changing how the teacher teaches. We therefore propose a new approach to training the teacher for knowledge distillation: the teacher is adapted to the distillation setting in advance so as to minimize the gap between the student model and the teacher model. We introduce the idea of a "Trained Teacher": a specially trained teacher network that incorporates knowledge distillation constraints during its own training, adapting it to the teaching role in advance while performing nearly identically to a typical teacher network. This allows the student to absorb the teacher's knowledge more effectively, thereby improving its performance. In addition, current mainstream knowledge distillation methods remain fully applicable to our trained teachers. Extensive experiments on numerous datasets show that our technique improves accuracy by 2% over the original knowledge distillation under standard KD. Our code and pre-trained models can be found at https://github.com/JSJ515-Group/Trained_teacher.
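The abstract only outlines the idea, so below is a minimal sketch of how a distillation-aware teacher training step could look. It assumes the standard softened-softmax KD loss (Hinton et al.) and a hypothetical frozen proxy student standing in for the eventual student; the names train_teacher_step, proxy_student, and the hyperparameters alpha and T are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch, NOT the paper's exact method: the teacher is trained with an
# extra distillation-style constraint so it is already adapted to teaching
# before standard KD is run. The proxy student and loss weighting are assumptions.
import torch
import torch.nn.functional as F

def kd_loss(logits, target_logits, T=4.0):
    """Standard KD term: KL divergence between temperature-softened outputs,
    scaled by T^2 as in Hinton et al."""
    log_p = F.log_softmax(logits / T, dim=1)
    q = F.softmax(target_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)

def train_teacher_step(teacher, proxy_student, x, y, optimizer, alpha=0.5, T=4.0):
    """One 'trained teacher' step: cross-entropy on ground truth plus a
    hypothetical compatibility term pulling the teacher's softened outputs
    toward those of a small frozen proxy student."""
    teacher.train()
    optimizer.zero_grad()
    t_logits = teacher(x)
    with torch.no_grad():
        s_logits = proxy_student(x)  # proxy student is frozen here
    loss = F.cross_entropy(t_logits, y) + alpha * kd_loss(t_logits, s_logits, T)
    loss.backward()
    optimizer.step()
    return loss.item()

def train_student_step(student, trained_teacher, x, y, optimizer, alpha=0.5, T=4.0):
    """Standard KD afterwards: the student distills from the now-fixed trained teacher."""
    student.train()
    optimizer.zero_grad()
    s_logits = student(x)
    with torch.no_grad():
        t_logits = trained_teacher(x)
    loss = (1 - alpha) * F.cross_entropy(s_logits, y) + alpha * kd_loss(s_logits, t_logits, T)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the compatibility term in this sketch is just an extra regularizer on the teacher's loss, any existing KD variant can be applied on top of the resulting teacher, which is consistent with the abstract's claim that mainstream distillation methods remain applicable to trained teachers.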
Keywords
teaching, teacher