Using Spatial Temporal Graph Convolutional Network Dynamic Scene Graph for Video Captioning of Pedestrians Intention.

NLPIR (2020)

Abstract
Video captioning helps to understand video content at a semantic level. The scene graph method realizes a structured representation of image semantics and can therefore provide effective support for video captioning. In autonomous driving and Advanced Driver Assistance Systems (ADAS), intelligent systems need to reason about pedestrian intentions and behaviors, as well as the relationships among surrounding objects, which helps the system make accurate decisions in real time. In this paper, we propose a video captioning algorithm for pedestrian intention that constructs a novel dynamic scene graph based on a spatial-temporal graph convolutional network. Previous scene graph methods solved the problem of structured semantic representation for a single image, but did not make good use of the temporal information between consecutive frames of a video; as a result, existing algorithms do not understand the dynamic behavior of traffic scenes well. Our method collaboratively extracts visual semantic features along both the spatial and temporal dimensions, which effectively improves video reasoning ability. Experimental results on a video captioning dataset for pedestrian scenes show that the proposed method significantly improves trajectory prediction performance.
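The abstract does not give implementation details, but the core operation it names, spatial-temporal graph convolution over scene-graph nodes across frames, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `st_graph_conv`, the 3-frame temporal averaging window, and the weight shapes are all assumptions made for the example. Nodes (e.g. pedestrians and surrounding objects) are aggregated spatially via a normalized adjacency matrix within each frame, then mixed temporally across neighboring frames.

```python
import numpy as np

def st_graph_conv(X, A, W_s, W_t):
    """One spatial-temporal graph convolution step (illustrative sketch).

    X:   (T, N, C) node features for T frames, N scene-graph nodes, C channels
    A:   (N, N)    binary adjacency between nodes (object relationships)
    W_s: (C, H)    spatial projection weights
    W_t: (H, H2)   temporal projection weights
    """
    # Spatial step: aggregate each node's neighbors within every frame,
    # using row-normalized A with self-loops (a common GCN normalization).
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    spatial = np.einsum('ij,tjc->tic', D_inv @ A_hat, X) @ W_s  # (T, N, H)

    # Temporal step: average each node's features over a 3-frame window
    # (edge-padded), linking consecutive frames of the dynamic scene graph.
    padded = np.pad(spatial, ((1, 1), (0, 0), (0, 0)), mode='edge')
    temporal = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0    # (T, N, H)

    return np.maximum(temporal @ W_t, 0.0)  # ReLU activation, (T, N, H2)
```

The resulting per-node features fuse who-relates-to-whom (spatial) with how the scene evolves (temporal), which is the kind of representation a captioning decoder could consume.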