Scale Decoupled Distillation.

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract
Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowledge to the student and mislead its learning. To this end, we propose a simple but effective method, i.e., Scale Decoupled Distillation (SDD), for logit knowledge distillation. SDD decouples the global logit output into multiple local logit outputs and establishes distillation pipelines for them. This helps the student to mine and inherit fine-grained and unambiguous logit knowledge. Moreover, the decoupled knowledge can be further divided into consistent and complementary logit knowledge that transfers the semantic information and sample ambiguity, respectively. By increasing the weight of the complementary parts, SDD can guide the student to focus more on ambiguous samples, improving its discrimination ability. Extensive experiments on several benchmark datasets demonstrate the effectiveness of SDD for wide teacher-student pairs, especially in the fine-grained classification task. Code is available at: https://github.com/shicaiwei123/SDD-CVPR2024
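To make the decoupling idea concrete, below is a minimal PyTorch sketch of how logits pooled at several spatial scales could be distilled, with a larger weight on "complementary" cells whose teacher prediction disagrees with the global one. The function names (multi_scale_logits, sdd_loss), the scale set, and the weighting hyperparameter beta are illustrative assumptions for this sketch, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of scale-decoupled logit distillation (assumed interface:
# both networks expose a final feature map and a linear classifier head).
import torch
import torch.nn.functional as F

def multi_scale_logits(feature_map, classifier, scales=(1, 2, 4)):
    """Pool the feature map at several scales and classify each local cell.

    feature_map: (B, C, H, W) final convolutional features.
    classifier:  nn.Linear mapping C -> num_classes.
    Returns (B, N, num_classes), where N is the total number of local cells
    across all scales; the scale-1 cell recovers the usual global logit.
    """
    logits = []
    for s in scales:
        pooled = F.adaptive_avg_pool2d(feature_map, s)      # (B, C, s, s)
        cells = pooled.flatten(2).transpose(1, 2)           # (B, s*s, C)
        logits.append(classifier(cells))                    # (B, s*s, K)
    return torch.cat(logits, dim=1)                         # (B, N, K)

def sdd_loss(student_logits, teacher_logits, T=4.0, beta=2.0):
    """KL distillation over every local cell; cells whose teacher prediction
    disagrees with the global prediction ("complementary") get weight beta > 1.
    """
    global_pred = teacher_logits[:, 0].argmax(dim=-1, keepdim=True)  # (B, 1)
    local_pred = teacher_logits.argmax(dim=-1)                       # (B, N)
    weight = torch.full_like(local_pred, beta, dtype=torch.float)
    weight[local_pred == global_pred] = 1.0                          # consistent cells

    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (T * T)                                          # (B, N)
    return (weight * kl).mean()
```

As a usage sketch, one would compute multi_scale_logits for both teacher and student feature maps and add sdd_loss to the ordinary cross-entropy objective; beta > 1 is what biases the student toward the ambiguous samples described in the abstract.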
Keywords
Knowledge Distillation