Trained Teacher: Who is Good at Teaching
Displays (2023)
Abstract
Knowledge distillation is an emerging method for obtaining efficient, small-scale networks. The main idea is to transfer knowledge from a complex teacher model with high learning capacity to a simple student model. To this end, various knowledge distillation approaches have been proposed in recent years, focusing mainly on modifying how the student learns and much less on changing how the teacher teaches. We therefore propose a new approach that trains the teacher specifically for knowledge distillation, adapting it to the distillation setting in advance in order to reduce the gap between the student and teacher models. We introduce the idea of a "Trained Teacher": a teacher network that incorporates knowledge distillation constraints during its own training, so that it is adapted to the teaching role beforehand while performing nearly identically to a conventionally trained teacher. This allows the student to absorb the teacher's knowledge more effectively, thereby improving its accuracy. In addition, mainstream knowledge distillation methods currently in use remain fully applicable to our trained teachers. Extensive experiments on several datasets show that our technique outperforms the original knowledge distillation, improving accuracy by 2% under the standard KD setting. Our code and pre-trained models are available at https://github.com/JSJ515-Group/Trained_teacher.
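For context, the "standard KD" baseline the abstract compares against is the usual soft-target distillation objective, in which the student matches the teacher's temperature-softened outputs in addition to the ground-truth labels. The sketch below illustrates that baseline loss only; the paper's specific teacher-training constraint is not detailed in this abstract, and the temperature `T` and weight `alpha` here are illustrative hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Standard soft-target knowledge distillation loss (illustrative sketch).

    T and alpha are hypothetical hyperparameters, not taken from the paper.
    """
    # Soft-target term: KL divergence between temperature-softened
    # student and teacher distributions, scaled by T^2 to keep gradient
    # magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

The trained-teacher idea described above would add a distillation-aware term to the teacher's own training objective so the teacher's outputs are easier for a student to match, while leaving a student-side loss of this form unchanged; the exact formulation of that constraint is given in the paper, not here.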
Keywords
Knowledge distillation, Trained teacher, Knowledge transfer, Teacher-student model