Spatiotemporal contrastive modeling for video moment retrieval

Yi Wang, Kun Li, Guoliang Chen, Yan Zhang, Dan Guo, Meng Wang

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2022)

Abstract
With the rapid development of social networks, video data has grown explosively. As an important social medium, the spatiotemporal characteristics of videos have attracted considerable attention in recommendation systems and video understanding. In this paper, we study the video moment retrieval (VMR) task, which localizes moments in a video according to a textual query. Existing methods follow two pipelines: 1) proposal-free approaches mainly modify the multi-modal interaction strategy; 2) proposal-based methods are dedicated to designing advanced proposal generation paradigms. Recently, contrastive representation learning has been successfully applied to video understanding. From a new perspective, we propose a new VMR framework, named spatiotemporal contrastive network (STCNet), which learns discriminative boundary features for video grounding via contrastive learning. Specifically, we propose a boundary matching sampling module for dense negative-sample sampling. The contrastive learning refines feature representations during training without any additional cost at inference. On three public datasets, Charades-STA, ActivityNet Captions, and TACoS, our proposed method achieves competitive performance.
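The abstract describes contrasting a query-aligned (positive) moment feature against densely sampled negative moment features. The paper's own STCNet code is not given here; as a hedged illustration of the general contrastive objective family it invokes (an InfoNCE-style loss), the following NumPy sketch is an assumption, not the authors' implementation:

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch, not STCNet's exact loss).

    Pulls the query feature toward the positive moment feature and pushes it
    away from densely sampled negative moment features.
    query, positive: 1-D feature vectors; negatives: 2-D array, one per row.
    """
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Temperature-scaled similarity logits: positive first, then negatives.
    pos_logit = cosine(query, positive) / temperature
    neg_logits = np.array([cosine(query, n) for n in negatives]) / temperature
    logits = np.concatenate([[pos_logit], neg_logits])

    # Negative log-likelihood of the positive under a softmax over all logits,
    # computed with the max subtracted for numerical stability.
    m = logits.max()
    return -(logits[0] - m) + np.log(np.exp(logits - m).sum())

# A query aligned with its positive yields a near-zero loss; a misaligned
# query (matching a negative instead) yields a much larger loss.
q_good = np.array([1.0, 0.0])
q_bad = np.array([0.0, 1.0])
pos = np.array([1.0, 0.0])
negs = np.array([[0.0, 1.0], [-1.0, 0.0]])
loss_good = info_nce_loss(q_good, pos, negs)
loss_bad = info_nce_loss(q_bad, pos, negs)
```

Because the loss only shapes the learned features during training, it adds no cost at inference, which matches the abstract's claim.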
Keywords
Video moment retrieval, Spatiotemporal modeling, Contrastive learning, Language query, Temporal localization