Social Data Assisted Multi-Modal Video Analysis For Saliency Detection

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

Abstract
Video saliency should be taken into account when optimizing the end-to-end video production, delivery, and consumption ecosystem, so as to improve user experience at lower cost. Although recent studies have significantly increased the accuracy of saliency prediction, existing approaches are mostly video-centric and do not consider any prior "bias" that viewers may have toward the video content. In this paper, we propose a novel learning-based multi-modal method for optimizing user-oriented video analysis. In particular, we generate a face-popularity mask using face recognition results and popularity information obtained from social media, and combine it with conventional content-only saliency analysis to produce multi-modal popularity-motion features. A convolutional long short-term memory (ConvLSTM) network then captures the temporal correlation of human attention across frames. Experiments show that our method outperforms state-of-the-art video saliency prediction approaches in representing human viewing preferences in real-world applications, and demonstrate both the necessity and the potential of integrating user-bias information into attention detection.
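The abstract only sketches the architecture. As a rough illustration of the fusion idea, the following is a minimal PyTorch sketch of a ConvLSTM that consumes, per frame, a content-only saliency map together with a face-popularity mask and emits a temporally-smoothed saliency map. All names (PopularitySaliencyNet, the two-channel input layout, the hidden size) are assumptions made for illustration; this is not the authors' implementation, and the construction of the face-popularity mask itself (face recognition plus social-media popularity scores) is left out.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates are computed by one convolution."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        # Gates are functions of the current input and the previous hidden state.
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

class PopularitySaliencyNet(nn.Module):
    """Hypothetical fusion: content saliency + face-popularity mask -> ConvLSTM -> saliency."""
    def __init__(self, hidden_channels=16):
        super().__init__()
        # Two input channels: content-only saliency map and face-popularity mask.
        self.cell = ConvLSTMCell(in_channels=2, hidden_channels=hidden_channels)
        self.readout = nn.Conv2d(hidden_channels, 1, kernel_size=1)

    def forward(self, content_saliency, popularity_mask):
        # Both inputs: (batch, time, H, W), values assumed in [0, 1].
        b, t, hgt, wid = content_saliency.shape
        h = content_saliency.new_zeros(b, self.cell.hidden_channels, hgt, wid)
        c = torch.zeros_like(h)
        outputs = []
        for step in range(t):
            x = torch.stack([content_saliency[:, step],
                             popularity_mask[:, step]], dim=1)  # (b, 2, H, W)
            h, c = self.cell(x, (h, c))
            outputs.append(torch.sigmoid(self.readout(h)))
        return torch.stack(outputs, dim=1)  # (batch, time, 1, H, W)

# Usage sketch with random stand-in inputs.
net = PopularitySaliencyNet()
sal = torch.rand(2, 8, 64, 64)   # per-frame content-only saliency
pop = torch.rand(2, 8, 64, 64)   # per-frame face-popularity mask
pred = net(sal, pop)             # fused, temporally-correlated saliency maps
```

The recurrence over frames is what the abstract attributes to the ConvLSTM: attention at one frame conditions the prediction at the next, so the fused saliency varies smoothly over time rather than being estimated frame by frame in isolation.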
Keywords
Multi-modal analysis, video saliency, popularity, eye tracking