Imitating the Human Visual System for Scanpath Predicting

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
A scanpath is the trajectory of eye fixations produced when humans perform visual reasoning. Most existing methods focus on predicting static attention maps, which represent the probability that each pixel in the image attracts human attention. However, human gaze behavior is purposeful and dynamic, especially when searching for specific objects. Inspired by the eye-movement mechanism of the human visual system, a reinforcement learning method is introduced that imitates the human visual system to predict scanpaths in target search. This paper also models periphery-fovea vision and incorporates eye-movement behavior to improve the accuracy of scanpath prediction. In addition, the Contrastive Language-Image Pretraining (CLIP) text encoder is employed as the task embedding to convert target objects into vectors. Compared with state-of-the-art (SOTA) models on the COCO-Search18 dataset, the proposed method achieves superior overall performance on both fixation location and fixation duration prediction.
Keywords
scanpath, reinforcement learning, eye movement
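
The abstract mentions periphery-fovea vision and scanpaths made of fixation locations and durations, but does not say how they are represented. The following is only a minimal sketch of the general idea, not the paper's implementation: it assumes a grayscale image stored as nested lists, a fixation stored as a hypothetical `(row, col, duration_ms)` tuple, and a simple block average standing in for low-resolution peripheral vision. The names `block_average` and `foveate` are illustrative.

```python
from typing import List, Tuple

Image = List[List[float]]
# Hypothetical fixation layout: (row, col, duration in milliseconds).
Fixation = Tuple[int, int, float]


def block_average(image: Image, block: int) -> Image:
    """Coarse 'peripheral' copy: every pixel takes the mean of its block."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


def foveate(image: Image, fixation: Fixation, radius: int, block: int = 2) -> Image:
    """Keep full detail within `radius` of the fixation; use the coarse copy elsewhere."""
    fr, fc, _ = fixation  # duration is not used for rendering the view
    coarse = block_average(image, block)
    return [
        [image[r][c] if (r - fr) ** 2 + (c - fc) ** 2 <= radius ** 2 else coarse[r][c]
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# A toy 4x4 image and a two-fixation scanpath (values are made up).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
scanpath = [(0, 0, 250.0), (3, 3, 180.0)]
views = [foveate(image, fix, radius=1) for fix in scanpath]
```

Each entry in `views` is what a hypothetical agent would observe at that fixation: sharp near the gaze point and averaged elsewhere. A learned policy, such as the reinforcement learning method the abstract describes, would choose the next fixation from such a view together with a task embedding.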