Triplet Spatiotemporal Aggregation Network for Video Saliency Detection

2023 IEEE International Conference on Multimedia and Expo (ICME 2023)

Abstract
Effectively aggregating spatiotemporal information to handle complex real-world scenes is a fundamental problem in video saliency detection. In this paper, we propose a Triplet Spatiotemporal Aggregation Network (TSAN) that addresses it through three kinds of aggregation: spatiotemporal interaction, spatiotemporal information distribution, and multi-level spatiotemporal features. First, we propose an interactive aggregation gate (IAG) module to model spatial and temporal global context information and perform inter-modal information transfer. Second, we employ an information distribution consistency (IDC) module to enhance the consistency of the spatiotemporal representation by maximizing the correlation of spatiotemporal high-level features. Finally, we design a multi-level spatiotemporal feature aggregation (MSF) framework to merge cross-level and cross-modal features. These three modules are combined into a unified framework that jointly optimizes spatiotemporal information for more precise results. Experimental results on five widely used datasets show that TSAN outperforms previous methods.
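
The idea of enhancing consistency by maximizing the correlation of spatial and temporal high-level features can be illustrated with a minimal sketch. This is not the IDC module itself: the abstract does not specify the correlation measure or the loss, so the Pearson correlation, the function names `pearson_correlation` and `consistency_loss`, and the flattened-list feature representation are all assumptions chosen for illustration.

```python
import math


def pearson_correlation(spatial, temporal):
    """Pearson correlation between two equally sized, flattened feature vectors."""
    if len(spatial) != len(temporal) or not spatial:
        raise ValueError("feature vectors must be non-empty and equally sized")
    n = len(spatial)
    mean_s = sum(spatial) / n
    mean_t = sum(temporal) / n
    # Unnormalized covariance and variances (the 1/n factors cancel out).
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(spatial, temporal))
    var_s = sum((s - mean_s) ** 2 for s in spatial)
    var_t = sum((t - mean_t) ** 2 for t in temporal)
    denom = math.sqrt(var_s * var_t)
    # A constant feature vector has no defined correlation; treat it as 0.
    return cov / denom if denom > 0 else 0.0


def consistency_loss(spatial, temporal):
    """Hypothetical loss (an assumption, not the paper's): minimizing it
    maximizes the correlation between the two feature vectors."""
    return 1.0 - pearson_correlation(spatial, temporal)
```

Under these assumptions the loss is 0 when the two feature vectors are perfectly correlated and grows toward 2 as they become anti-correlated, so a training procedure that minimizes it would push the spatial and temporal representations toward agreement.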
Keywords
video saliency detection, spatiotemporal aggregation, spatiotemporal interaction, information distribution, multi-level feature aggregation