Deep Head Pose: Gaze-Direction Estimation in Multimodal Video
IEEE Trans. Multimedia (2015)
Abstract
In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multimodal RGB-D data. We pose the problem as classification of human gaze direction, and then fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets that range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. Building on our robust head pose estimation, we introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as detecting human-human and human-scene interaction, can be achieved. Our solution runs in real time on commercial hardware.
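The abstract does not say how the classifier and the regressor are combined, so the following is only a minimal sketch of one plausible reading: take the confidence of a regressed angle to be the probability the classifier assigns to the gaze-direction bin that contains it. The function name `regression_confidence`, the four 90-degree yaw bins, and the probability values are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def regression_confidence(class_probs, bin_edges_deg, regressed_yaw_deg):
    """Return the classifier probability of the bin containing the regressed yaw.

    ASSUMPTION: this combination rule is illustrative only; the paper's
    actual confidence estimate may be defined differently.

    class_probs      -- per-bin probabilities from the classifier (sums to 1)
    bin_edges_deg    -- ascending bin edges in degrees, len(class_probs) + 1
    regressed_yaw_deg -- yaw angle in degrees predicted by the regressor
    """
    if len(bin_edges_deg) != len(class_probs) + 1:
        raise ValueError("need exactly one more edge than bins")
    # bisect_right gives i with edges[i-1] <= x < edges[i]; shift to a bin index.
    idx = bisect_right(bin_edges_deg, regressed_yaw_deg) - 1
    # Clamp so angles on or beyond the outer edges fall into the end bins.
    idx = min(max(idx, 0), len(class_probs) - 1)
    return float(class_probs[idx])


# Hypothetical setup: four 90-degree gaze-direction bins covering [-180, 180].
edges = [-180.0, -90.0, 0.0, 90.0, 180.0]
probs = [0.05, 0.10, 0.70, 0.15]

# The regressor and classifier agree here (30 degrees lies in the 0..90 bin).
agree = regression_confidence(probs, edges, 30.0)
# Here they disagree: the regressed angle lies in a bin the classifier finds unlikely.
disagree = regression_confidence(probs, edges, -135.0)
```

Under this reading, a low value flags a regressed angle that the classifier considers unlikely, which a downstream attention model could use to down-weight that estimate.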
Keywords
Head, Estimation, Human computer interaction, Surveillance, Magnetic heads, Visualization, Image resolution