Social Data Assisted Multi-Modal Video Analysis For Saliency Detection

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

Abstract
Video saliency should be taken into account when optimizing the end-to-end video production, delivery, and consumption ecosystem, so as to improve user experience at lower cost. Although recent studies have significantly increased the accuracy of saliency prediction, existing approaches are mostly video-centric and do not consider any prior "bias" that viewers may have toward the video content. In this paper, we propose a novel learning-based multi-modal method for optimizing user-oriented video analysis. In particular, we generate a face-popularity mask using face recognition results and popularity information obtained from social media, and combine it with conventional content-only saliency analysis to produce multi-modal popularity-motion features. A convolutional long short-term memory (ConvLSTM) network then captures the temporal correlation of human attention across frames. Experiments show that our method outperforms state-of-the-art video saliency prediction approaches in representing human viewing preferences in real-world applications, and demonstrate both the necessity and the potential of integrating user-bias information into attention detection.
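The abstract only sketches the architecture. As a rough illustration of the fusion idea, the following is a minimal PyTorch sketch of a ConvLSTM that consumes, per frame, a content-only saliency map together with a face-popularity mask and emits a temporally-smoothed saliency map. All names (PopularitySaliencyNet, the two-channel input layout, the hidden size) are assumptions made for illustration; this is not the authors' implementation, and the construction of the face-popularity mask itself (face recognition plus social-media popularity scores) is left out.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates are computed by one convolution."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        # Gates are functions of the current input and the previous hidden state.
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

class PopularitySaliencyNet(nn.Module):
    """Hypothetical fusion: content saliency + face-popularity mask -> ConvLSTM -> saliency."""
    def __init__(self, hidden_channels=16):
        super().__init__()
        # Two input channels: content-only saliency map and face-popularity mask.
        self.cell = ConvLSTMCell(in_channels=2, hidden_channels=hidden_channels)
        self.readout = nn.Conv2d(hidden_channels, 1, kernel_size=1)

    def forward(self, content_saliency, popularity_mask):
        # Both inputs: (batch, time, H, W), values assumed in [0, 1].
        b, t, hgt, wid = content_saliency.shape
        h = content_saliency.new_zeros(b, self.cell.hidden_channels, hgt, wid)
        c = torch.zeros_like(h)
        outputs = []
        for step in range(t):
            x = torch.stack([content_saliency[:, step],
                             popularity_mask[:, step]], dim=1)  # (b, 2, H, W)
            h, c = self.cell(x, (h, c))
            outputs.append(torch.sigmoid(self.readout(h)))
        return torch.stack(outputs, dim=1)  # (batch, time, 1, H, W)

# Usage sketch with random stand-in inputs.
net = PopularitySaliencyNet()
sal = torch.rand(2, 8, 64, 64)   # per-frame content-only saliency
pop = torch.rand(2, 8, 64, 64)   # per-frame face-popularity mask
pred = net(sal, pop)             # fused, temporally-correlated saliency maps
```

The recurrence over frames is what the abstract attributes to the ConvLSTM: attention at one frame conditions the prediction at the next, so the fused saliency varies smoothly over time rather than being estimated frame by frame in isolation.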
Keywords
Multi-modal analysis, video saliency, popularity, eye tracking