Training Efficient Saliency Prediction Models with Knowledge Distillation

Proceedings of the 27th ACM International Conference on Multimedia (2019)

Abstract
Recently, deep learning-based saliency prediction methods have achieved significant accuracy improvements. However, they are hard to embed in practical multimedia applications because their complicated architectures incur large memory consumption and long running times. In addition, most methods are fine-tuned from models pre-trained for classification tasks, so the networks cannot be flexibly transferred to a new task. In this paper, a condensed and randomly initialized student network is employed to achieve higher efficiency by transferring knowledge from complicated, well-trained teacher networks. This is the first use of knowledge distillation for efficient pixel-wise saliency prediction. Instead of directly minimizing the Euclidean distance between feature maps, we propose two statistical representations of feature maps (i.e., first-order and second-order statistics) as the distilled knowledge. We conduct experiments on three kinds of teacher networks and four benchmark datasets to verify the effectiveness of the proposed method. Compared with the teacher networks, the student networks achieve acceleration ratios of 4.56-4.73. Compared with state-of-the-art approaches, the proposed model achieves competitive accuracy with faster running speed (up to 4.38 times) and a smaller model size (up to 93.27% reduction). We further embed the proposed saliency prediction model into a video captioning application. The saliency-embedded approaches improve video captioning on all test metrics at a small complexity cost. The student-model-embedded approach achieves a 25% time saving with performance similar to that of the teacher-embedded one.
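To make the statistical-knowledge idea concrete, below is a minimal PyTorch-style sketch of what matching first- and second-order feature-map statistics could look like, assuming channel-wise means as the first-order statistic and a normalized Gram matrix as the second-order statistic. The function names, the normalization, and the channel-adapter note are illustrative assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch: distillation losses on feature-map statistics instead of
# raw feature maps. Names and normalizations here are assumptions, not the
# paper's verbatim definitions.
import torch
import torch.nn.functional as F


def first_order_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Match channel-wise spatial means (first-order statistics).

    f_s, f_t: (batch, channels, height, width) student / teacher features.
    """
    mu_s = f_s.mean(dim=(2, 3))  # (B, C) mean over spatial locations
    mu_t = f_t.mean(dim=(2, 3))
    return F.mse_loss(mu_s, mu_t)


def second_order_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Match normalized Gram matrices (second-order statistics)."""
    def gram(f: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f.shape
        f = f.reshape(b, c, h * w)
        # (B, C, C) channel-correlation matrix, normalized by feature size
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)
    return F.mse_loss(gram(f_s), gram(f_t))


# Usage with dummy features; in practice, if the student's channel count
# differs from the teacher's, a 1x1 conv adapter on the student features
# would be needed before comparing statistics.
f_teacher = torch.randn(2, 64, 28, 28)
f_student = torch.randn(2, 64, 28, 28)
distill_loss = first_order_loss(f_student, f_teacher) \
    + second_order_loss(f_student, f_teacher)
```

Because both statistics collapse the spatial (mean) or sample-level (Gram) structure into compact summaries, the student is free to differ from the teacher at individual pixels while still inheriting its channel-level behavior, which is a plausible reason this is looser, and easier to satisfy, than a pixel-wise Euclidean loss.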
Keywords
first-order statistics, knowledge transfer, saliency prediction, second-order statistics, video caption