Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention
arXiv (2024)
Abstract
Humans use their gaze to focus on essential information when perceiving and
interpreting intentions in videos. Incorporating human gaze into computational
models can significantly improve their performance on video understanding
tasks. In this work, we address a challenging and novel task in video
understanding: predicting the future actions of an agent from a partial video.
We introduce the Gaze-guided Action Anticipation algorithm, which constructs a
visual-semantic graph from the video input and uses a Graph Neural Network to
recognize the agent's intention and predict the action sequence needed to
fulfill it. To evaluate our approach, we collect a dataset of household
activities generated in the VirtualHome environment, together with human gaze
data recorded while participants viewed the videos. Our method outperforms
state-of-the-art techniques, achieving a 7% improvement in accuracy on 18-class
intention recognition. This result highlights the effectiveness of our method
in learning important features from human gaze data.
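The abstract does not specify the network's architecture, so the following is only a rough, hypothetical sketch of one way gaze could condition a GNN on a visual-semantic graph. It assumes that each node (for example, an object seen in a frame) has a feature vector and a gaze weight, taken here to mean the share of viewer fixations that landed on that node. A single mean-aggregation message-passing layer propagates the gaze-scaled features, and a gaze-weighted readout then feeds an 18-way intention classifier. The function names, the gaze-scaling scheme, and the random weights are all assumptions, not the authors' method. Action-sequence prediction is omitted.

```python
import numpy as np

NUM_INTENTIONS = 18  # 18-class intention recognition, as stated in the abstract


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops and row-normalize so each node averages over itself and its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def gaze_gnn_layer(h, adj, gaze, w):
    """One message-passing step with node features scaled by gaze (hypothetical scheme)."""
    h_weighted = h * gaze[:, None]            # emphasise nodes that viewers fixated on
    messages = normalize_adjacency(adj) @ h_weighted
    return np.maximum(messages @ w, 0.0)      # linear transform followed by ReLU


def predict_intention(h, adj, gaze, w_gnn, w_cls):
    """Return a probability distribution over intention classes."""
    h = gaze_gnn_layer(h, adj, gaze, w_gnn)
    g = gaze / gaze.sum()
    graph_embedding = g @ h                   # gaze-weighted readout over all nodes
    logits = graph_embedding @ w_cls
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    return probs / probs.sum()


# Toy usage with random (untrained) weights and a hypothetical 5-node graph.
rng = np.random.default_rng(0)
num_nodes, feat_dim, hidden_dim = 5, 16, 32
h = rng.normal(size=(num_nodes, feat_dim))
adj = (rng.random((num_nodes, num_nodes)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)                  # make the graph undirected
np.fill_diagonal(adj, 0.0)                    # self-loops are added during normalization
gaze = np.array([0.4, 0.3, 0.1, 0.1, 0.1])    # hypothetical fixation shares per node
w_gnn = rng.normal(size=(feat_dim, hidden_dim)) * 0.1
w_cls = rng.normal(size=(hidden_dim, NUM_INTENTIONS)) * 0.1
probs = predict_intention(h, adj, gaze, w_gnn, w_cls)
print(probs.shape, int(probs.argmax()))
```

Scaling node features by gaze before aggregation is one simple way to let fixated objects dominate the messages their neighbours receive. A trained model would learn the weights, and it might instead use gaze as an attention signal or as an extra node feature.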