Selective arguments representation with dual relation-aware network for video situation recognition

Wei Liu, Qing He, Chao Wang, Yan Peng, Shaorong Xie

Neural Computing and Applications (2024)

Abstract
Argument visual states are helpful for detecting structured components of events in videos, and existing methods tend to use object detectors to generate their candidates. However, directly using object features captured by bounding boxes overlooks both the relations among objects and the differences between detected objects and the actual arguments. In this work, we propose a novel framework to generate selective contextual representations of videos, thereby reducing the interference of useless or incorrect object features. First, we organize grid-based object features into graphs based on their internal grid connections and then use a graph convolutional network to aggregate features. Second, a weighted geometric attention module is designed to obtain the contextual representation of objects, which explicitly combines visual similarity and geometric correlation with different importance weights. Third, we propose a dual relation-aware selection module for further feature selection. Finally, we use labels as a ladder to bridge the gap between object features and semantic roles, while considering proximity in the semantic space. Experimental results and extensive ablation studies on the VidSitu dataset indicate that our method effectively obtains a deep understanding of events in videos and outperforms state-of-the-art models.
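The abstract does not give the exact formulation of the weighted geometric attention module, so the following is only a minimal sketch of the general idea: attention scores formed as a weighted mix of visual similarity and a geometric term derived from bounding boxes. The names `box_geometry`, `weighted_geometric_attention`, and the mixing weight `alpha` are hypothetical and do not come from the paper; the geometric term used here (negative normalized center distance) is an assumption chosen for simplicity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def box_geometry(boxes):
    """Pairwise geometric score for boxes given as (x1, y1, x2, y2).

    Assumption: closeness is measured by center distance normalized by
    the query box's width and height. Closer boxes get higher scores.
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    dx = np.abs(cx[:, None] - cx[None, :]) / w[:, None]
    dy = np.abs(cy[:, None] - cy[None, :]) / h[:, None]
    return -(dx + dy)

def weighted_geometric_attention(feats, boxes, alpha=0.7):
    """Mix visual similarity and geometric correlation, then attend.

    feats: (N, D) object features; boxes: (N, 4) bounding boxes.
    alpha is a hypothetical weight on the visual term; (1 - alpha)
    weights the geometric term.
    """
    d = feats.shape[1]
    visual = feats @ feats.T / np.sqrt(d)    # scaled dot-product similarity
    geometric = box_geometry(boxes)          # pairwise spatial closeness
    scores = alpha * visual + (1 - alpha) * geometric
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ feats, attn

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
boxes = np.array([[0, 0, 10, 10], [5, 5, 15, 15],
                  [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
context, attn = weighted_geometric_attention(feats, boxes)
```

Here `context` holds one contextual representation per object. A learned version would replace the fixed `alpha` and the hand-written geometric score with trainable parameters.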
Keywords
Attention mechanism, Argument visual states, Graph convolutional network, Geometric correlation, Semantic space