Appearance-Based Gaze Estimation In The Wild

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4511-4520


Abstract

Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets have been collected under controlled laboratory conditions and methods have not been evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing ones with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.

Introduction
  • Appearance-based gaze estimation is well established as a research topic in computer vision because of its relevance for several application domains, including gaze-based human-computer interaction and visual behaviour analysis [31].
  • Learning-based methods were recently proposed to learn generic gaze estimators from large amounts of person- and head pose-independent training data [10, 34, 39].
  • Such methods have the potential to bring appearance-based methods into settings that do not require any user- or device-specific training.
  • Existing estimators, however, have been trained and evaluated only on data collected under controlled laboratory conditions.
  • These conditions are characterised by limited variability of eye appearances as well as the assumption of accurate head pose estimates.
  • Current appearance-based gaze estimation methods are not evaluated across different datasets, which bears the risk of significant dataset bias – a key problem in object recognition [43] and salient object detection [23]
Highlights
  • Appearance-based gaze estimation is well established as a research topic in computer vision because of its relevance for several application domains, including gaze-based human-computer interaction and visual behaviour analysis [31]
  • Current appearance-based gaze estimation methods are not evaluated across different datasets, which bears the risk of significant dataset bias – a key problem in object recognition [43] and salient object detection [23]
  • To further discuss the performance limits of the CNN-based approach, we show more detailed comparisons between Random Forest (RF) and convolutional neural network (CNN) models
  • For UT Multiview, 500 test samples were randomly selected for each person from the above subset, and the other 2,500 samples were used as training data. These results further show the potential performance of the appearance-based estimator, and clearly depict the performance gap to be investigated
  • We presented the first extensive study on appearance-based gaze estimation in the unconstrained daily-life setting
  • Through the comprehensive benchmarking of image-based monocular gaze estimation methods, our study clearly revealed both the potential and the remaining technical challenges of appearance-based gaze estimation
Methods
  • The authors first employ state-of-the-art face detection and facial landmark detection methods to locate landmarks in the input image obtained from the calibrated monocular RGB camera.
  • The authors fit a generic 3D facial shape model to estimate 3D poses of the detected faces and apply the space normalisation technique proposed in [39] to crop and warp the head pose and eye images to the normalised training space.
  • The CNN is used to learn the mapping from head poses and eye images to gaze directions in the camera coordinate system (see the sketch after this list).
  • The authors' method first detects the user’s face in the image using Li et al.’s SURF cascade method [22].
  • Afterwards, the authors use Baltrusaitis et al.’s constrained local model framework to detect facial landmarks [2]
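
To make the final step concrete, here is a minimal PyTorch sketch of such a multimodal network: a small LeNet-style convolutional stack [20] over a normalised grey-scale eye patch, with the 2D head pose angles concatenated into the fully connected stage. The 36x60 input resolution and the layer sizes are illustrative assumptions; the original model was implemented in Caffe [17].

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Multimodal CNN sketch: eye image + head pose -> 2D gaze angles."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5),  # grey eye patch: 1 x 36 x 60
            nn.MaxPool2d(2, 2),
            nn.Conv2d(20, 50, kernel_size=5),
            nn.MaxPool2d(2, 2),               # -> 50 x 6 x 12 feature maps
        )
        self.fc1 = nn.Linear(50 * 6 * 12, 500)
        self.fc2 = nn.Linear(500 + 2, 2)      # image features + 2 pose angles

    def forward(self, eye_image, head_pose):
        x = self.features(eye_image).flatten(1)
        x = torch.relu(self.fc1(x))
        x = torch.cat([x, head_pose], dim=1)  # multimodal fusion
        return self.fc2(x)                    # (theta, phi) gaze angles

model = GazeNet()
gaze = model(torch.randn(8, 1, 36, 60), torch.randn(8, 2))  # shape (8, 2)
```

Injecting the head pose into the fully connected stage rather than the convolutional input keeps the convolutions focused on eye appearance while still conditioning the regression on pose.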
Results
  • The authors discard all images in which the detector fails to find any face, which happened in about 5% of all cases.
  • The authors' CNN-based approach shows the best accuracy on both datasets (13.9 degrees on MPIIGaze, 10.5 degrees on Eyediap), with a significant performance gain (10% on MPIIGaze, 12% on Eyediap, paired Wilcoxon test [47], p < 0.05) over the state-of-the-art RF method (a sketch of this evaluation follows)
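
The comparison above has two ingredients: a per-sample angular error between predicted and ground-truth gaze directions, and a paired significance test. The sketch below uses simulated predictions and one common angle-to-vector convention for the normalised space; both are assumptions for illustration, not the paper's code.

```python
import numpy as np
from scipy.stats import wilcoxon

def angles_to_vectors(angles):
    """Map (theta, phi) gaze angles to unit 3D direction vectors."""
    theta, phi = angles[:, 0], angles[:, 1]
    return np.stack([-np.cos(theta) * np.sin(phi),
                     -np.sin(theta),
                     -np.cos(theta) * np.cos(phi)], axis=1)

def angular_error_deg(pred, true):
    """Per-sample angular error in degrees between two angle arrays."""
    v1, v2 = angles_to_vectors(pred), angles_to_vectors(true)
    cos = np.clip(np.sum(v1 * v2, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

rng = np.random.default_rng(0)
true = rng.uniform(-0.3, 0.3, size=(1000, 2))          # ground truth (rad)
pred_cnn = true + rng.normal(0, 0.12, size=(1000, 2))  # simulated CNN output
pred_rf = true + rng.normal(0, 0.15, size=(1000, 2))   # simulated RF output

err_cnn = angular_error_deg(pred_cnn, true)
err_rf = angular_error_deg(pred_rf, true)
stat, p = wilcoxon(err_cnn, err_rf)                    # paired test, as in [47]
print(err_cnn.mean(), err_rf.mean(), p)                # p < 0.05 -> significant
```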
Conclusion
  • Despite a large body of previous work on the topic, appearance-based gaze estimation methods have so far been evaluated exclusively under controlled laboratory conditions.
  • The authors presented the first extensive study on appearance-based gaze estimation in the unconstrained daily-life setting.
  • The authors built a novel in-the-wild gaze dataset through a long-term data collection using laptops, which shows significantly larger variations in eye appearance than existing datasets.
  • Through the comprehensive benchmarking of image-based monocular gaze estimation methods, the study clearly revealed both the potential and the remaining technical challenges of appearance-based gaze estimation.
  • The authors' CNN-based estimation model significantly outperforms state-of-the-art methods in the most challenging person- and pose-independent training scenario.
  • This work and the dataset provide critical insights for addressing grand challenges in daily-life gaze interaction
Tables
  • Table1: Comparison of current publicly available appearance-based gaze estimation datasets with respect to number of participants, head poses and on-screen gaze targets (discrete or continuous), number of different illumination conditions, average duration of data collection per participant, and total number of images
Related work
  • 2.1. Gaze Estimation Methods

    Gaze estimation methods can be model-based or appearance-based [12]. Model-based methods use a geometric eye model and can be further divided into corneal-reflection and shape-based methods, depending on whether they require external light sources to detect eye features. Early works on corneal reflection-based methods focused on stationary settings [36, 30, 13, 51] and were later extended to handle arbitrary head poses using multiple light sources or cameras [52, 53]. In contrast, shape-based methods [16, 4, 50, 44] directly infer gaze directions from observed eye shapes, such as the pupil centre or iris edges. Although they have recently been applied to more practical application scenarios [18, 11, 41, 49], their accuracy is lower and it is unclear whether shape-based approaches can robustly handle low image quality and variable lighting conditions. Appearance-based gaze estimation methods directly use eye images as input and can therefore potentially work with low-resolution eye images. While early works assumed a fixed head pose [3, 42, 48, 35, 27, 24], recent works focused on methods for 3D head pose estimation [25, 26, 9, 6]. However, appearance-based methods require larger amounts of user-specific training data than model-based methods, and it remains unclear whether the learned estimator can generalise to unknown users. Similarly, previous methods typically assumed accurate 3D head poses as input, which is a strong assumption for unconstrained in-the-wild settings.
Funding
  • This work was funded in part by the Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University as well as an Alexander von Humboldt Research Fellowship
Study subjects and analysis
participants: 15
In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing ones with respect to appearance and illumination

current datasets: 3
We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild

laptop users: 15
In this work we make the first step towards appearance-based gaze estimation in the wild. Given a lack of realistic data, we created the MPIIGaze dataset that contains 213,659 images collected from 15 laptop users over several months (see Figure 2). MPIIGaze covers a realistic variability in appearance and illumination and therefore represents a significant advance over existing datasets

current datasets: 3
Our dataset is one order of magnitude larger than existing datasets and significantly more variable with respect to illumination and appearance. Second, we present an extensive evaluation of state-of-the-art gaze estimation algorithms on three current datasets, including our own, and identify key research challenges of in-the-wild settings. Third, we present a method for appearance-based gaze estimation that uses multimodal convolutional neural networks and that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation

participants: 16
Because most existing gaze estimation datasets are designed for coarse gaze estimation, the sampling density of gaze and head pose space is not sufficient to train appearance-based gaze estimators [29, 45, 46, 37] (see Table 1 for an overview of existing datasets). More comparable to MPIIGaze, the Eyediap dataset contains 94 video sequences of 16 participants looking at three different targets (discrete and continuous markers displayed on a monitor, and floating physical targets) under both static and free head motion [8]. The UT Multiview dataset also contains dense gaze samples of 50 participants as well as 3D reconstructions of eye regions that can be used to synthesise images for arbitrary head poses [39]

participants: 50
More comparable to MPIIGaze, the Eyediap dataset contains 94 video sequences of 16 participants looking at three different targets (discrete and continuous markers displayed on a monitor, and floating physical targets) under both static and free head motion [8]. The UT Multiview dataset also contains dense gaze samples of 50 participants as well as 3D reconstructions of eye regions that can be used to synthesise images for arbitrary head poses [39]. However, as discussed before, both datasets have the significant limitation that they were recorded under controlled laboratory settings

participants: 15
Dataset Characteristics. We collected a total of 213,659 images from 15 participants. The number of images collected per participant varied from 1,498 to 34,745

training persons: 5
Adaptive Linear Regression (ALR): Because it was originally designed for a person-specific and sparse set of training samples [27], ALR does not scale to large datasets. We therefore use the same approximation as in [10], i.e. we select five training persons for each test person by evaluating the interpolation weights. We further select random subsets of samples from the test sample’s neighbours in head pose space, as sketched below
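
As a loose illustration of the neighbour selection just described, the snippet below picks the training samples closest to a test sample in head pose (pitch, yaw) space. The Euclidean metric, the angle ranges, and the subset size are all assumptions, and the interpolation-weight step for choosing the five training persons is omitted.

```python
import numpy as np

def nearest_pose_neighbours(test_pose, train_poses, k=1000):
    """Return indices of the k training samples nearest in head pose space."""
    dists = np.linalg.norm(train_poses - test_pose, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical head pose angles (pitch, yaw) in radians for 50,000 samples.
train_poses = np.random.uniform(-0.5, 0.5, size=(50000, 2))
neighbour_idx = nearest_pose_neighbours(np.array([0.1, -0.2]), train_poses)
subset = np.random.choice(neighbour_idx, size=500, replace=False)  # random subset
```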

samples: 3000
However, these two cases are expected to have different difficulty levels. To investigate the difference within these results in more detail, we further show a three-fold evaluation using a subset (3,000 samples per person) of the UT Multiview dataset selected so as to have the same pose and gaze angle distributions as the MPIIGaze dataset. The result is shown in the next part of Figure 10, and the performance gap compared to Figure 8 indicates the error that arises from the in-the-wild setting, including appearance variations and eye alignment errors

test samples: 500
For MPIIGaze, the last quarter of the data from each person was used as test data, and the rest was used as training data. For UT Multiview, 500 test samples were randomly selected for each person from the above subset, and the other 2,500 samples were used as training data (both splits are sketched below). These results further show the potential performance of the appearance-based estimator, and clearly depict the performance gap to be investigated
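
Both protocols are simple per-person index splits. A minimal sketch, assuming each person's samples sit in one time-ordered array (an assumption about data layout, not the paper's code):

```python
import numpy as np

def split_last_quarter(n_samples):
    """MPIIGaze protocol: last quarter of a person's samples as test data."""
    cut = (3 * n_samples) // 4
    return np.arange(cut), np.arange(cut, n_samples)  # train_idx, test_idx

def split_random(n_samples, n_test=500, seed=0):
    """UT Multiview protocol: random test samples per person."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    return perm[n_test:], perm[:n_test]               # 2,500 train / 500 test

train_idx, test_idx = split_random(3000)
```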

References
  • [1] F. Alnajar, T. Gevers, R. Valenti, and S. Ghebreab. Calibration-free gaze estimation using human gaze patterns. In Proc. ICCV, 2013.
  • [2] T. Baltrusaitis, P. Robinson, and L.-P. Morency. Continuous conditional neural fields for structured regression. In Proc. ECCV, pages 593–608, 2014.
  • [3] S. Baluja and D. Pomerleau. Non-intrusive gaze tracking using artificial neural networks. Technical report, DTIC Document, 1994.
  • [4] J. Chen and Q. Ji. 3D gaze estimation with a single camera without IR illumination. In Proc. ICPR, pages 1–4, 2008.
  • [5] J. Chen and Q. Ji. Probabilistic gaze estimation without active personal calibration. In Proc. CVPR, pages 609–616, 2011.
  • [6] J. Choi, B. Ahn, J. Park, and I. S. Kweon. Appearance-based gaze estimation using Kinect. In Proc. URAI, pages 260–261, 2013.
  • [7] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871–1874, 2008.
  • [8] K. A. Funes Mora, F. Monay, and J.-M. Odobez. EYEDIAP: A database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In Proc. ETRA, pages 255–258, 2014.
  • [9] K. A. Funes Mora and J.-M. Odobez. Gaze estimation from multimodal Kinect data. In Proc. CVPRW, pages 25–30, 2012.
  • [10] K. A. Funes Mora and J.-M. Odobez. Person independent 3D gaze estimation from remote RGB-D cameras. In Proc. ICIP, 2013.
  • [11] K. A. Funes Mora and J.-M. Odobez. Geometric generative gaze estimation (G3E) for remote RGB-D cameras. In Proc. CVPR, pages 1773–1780, 2014.
  • [12] D. W. Hansen and Q. Ji. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3):478–500, 2010.
  • [13] C. Hennessey, B. Noureddin, and P. Lawrence. A single camera eye-gaze tracking system with free head motion. In Proc. ETRA, pages 87–94, 2006.
  • [14] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
  • [15] M. X. Huang, T. C. Kwok, G. Ngai, H. V. Leong, and S. C. Chan. Building a self-learning eye gaze model from user interaction data. In Proc. MM, pages 1017–1020, 2014.
  • [16] T. Ishikawa, S. Baker, I. Matthews, and T. Kanade. Passive driver gaze tracking with active appearance models. In Proc. 11th World Congress on Intelligent Transportation Systems, 2004.
  • [17] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • [18] L. Jianfeng and L. Shigang. Eye-model-based gaze estimation by RGB-D camera. In Proc. CVPRW, pages 606–610, 2014.
  • [19] R. Larson and M. Csikszentmihalyi. The experience sampling method. New Directions for Methodology of Social & Behavioral Science, 1983.
  • [20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [21] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2):155–166, 2009.
  • [22] J. Li and Y. Zhang. Learning SURF cascade for fast and accurate object detection. In Proc. CVPR, pages 3468–3475, 2013.
  • [23] Y. Li, X. Hou, C. Koch, J. Rehg, and A. Yuille. The secrets of salient object segmentation. In Proc. CVPR, 2014.
  • [24] K. Liang, Y. Chahir, M. Molina, C. Tijus, and F. Jouen. Appearance-based gaze tracking with spectral clustering and semi-supervised Gaussian process regression. In Proc. ETSA, pages 17–23, 2013.
  • [25] F. Lu, T. Okabe, Y. Sugano, and Y. Sato. Learning gaze biases with head motion for head pose-free gaze estimation. Image and Vision Computing, 32(3):169–179, 2014.
  • [26] F. Lu, Y. Sugano, T. Okabe, and Y. Sato. Head pose-free appearance-based gaze sensing via eye image synthesis. In Proc. ICPR, pages 1008–1011, 2012.
  • [27] F. Lu, Y. Sugano, T. Okabe, and Y. Sato. Adaptive linear regression for appearance-based gaze estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(10):2033–2046, 2014.
  • [28] P. Majaranta and A. Bulling. Eye tracking and eye-based human–computer interaction. In Advances in Physiological Computing, pages 39–65. Springer, 2014.
  • [29] C. D. McMurrough, V. Metsis, J. Rich, and F. Makedon. An eye tracking dataset for point of gaze detection. In Proc. ETRA, pages 305–308, 2012.
  • [30] C. H. Morimoto, A. Amir, and M. Flickner. Detecting eye position and gaze from a single camera and 2 light sources. In Proc. ICPR, pages 314–317, 2002.
  • [31] C. H. Morimoto and M. R. Mimica. Eye gaze tracking techniques for interactive applications. Computer Vision and Image Understanding, 98(1):4–24, 2005.
  • [32] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng. Multimodal deep learning. In Proc. ICML, pages 689–696, 2011.
  • [33] R. Rodrigues, J. Barreto, and U. Nunes. Camera pose estimation using images of planar mirror reflections. In Proc. ECCV, pages 382–395, 2010.
  • [34] T. Schneider, B. Schauerte, and R. Stiefelhagen. Manifold alignment for person independent appearance-based gaze estimation. In Proc. ICPR, 2014.
  • [35] W. Sewell and O. Komogortsev. Real-time eye gaze tracking with an unmodified commodity webcam employing a neural network. In Ext. Abstracts CHI, pages 3739–3744, 2010.
  • [36] S.-W. Shih and J. Liu. A novel approach to 3-D gaze tracking using stereo cameras. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(1):234–245, 2004.
  • [37] B. A. Smith, Q. Yin, S. K. Feiner, and S. K. Nayar. Gaze locking: Passive eye contact detection for human-object interaction. In Proc. UIST, pages 271–280, 2013.
  • [38] Y. Sugano, Y. Matsushita, and Y. Sato. Appearance-based gaze estimation using visual saliency. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):329–341, 2013.
  • [39] Y. Sugano, Y. Matsushita, and Y. Sato. Learning-by-synthesis for appearance-based 3D gaze estimation. In Proc. CVPR, pages 1821–1828, 2014.
  • [40] Y. Sugano, Y. Matsushita, Y. Sato, and H. Koike. An incremental learning method for unconstrained gaze estimation. In Proc. ECCV, pages 656–667, 2008.
  • [41] L. Sun, M. Song, Z. Liu, and M.-T. Sun. Real-time gaze estimation with online calibration. In Proc. ICME, pages 1–6, 2014.
  • [42] K.-H. Tan, D. J. Kriegman, and N. Ahuja. Appearance-based eye gaze estimation. In Proc. WACV, pages 191–195, 2002.
  • [43] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In Proc. CVPR, pages 1521–1528, 2011.
  • [44] R. Valenti, N. Sebe, and T. Gevers. Combining head pose and eye location information for gaze estimation. IEEE Transactions on Image Processing, 21(2):802–815, 2012.
  • [45] A. Villanueva, V. Ponz, L. Sesma-Sanchez, M. Ariz, S. Porta, and R. Cabeza. Hybrid method based on topography for robust detection of iris center and eye corners. ACM Transactions on Multimedia Computing, Communications, and Applications, 9(4):25, 2013.
  • [46] U. Weidenbacher, G. Layher, P.-M. Strauss, and H. Neumann. A comprehensive head pose and gaze database. In Proc. IET, pages 455–458, 2007.
  • [47] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, pages 80–83, 1945.
  • [48] O. Williams, A. Blake, and R. Cipolla. Sparse and semi-supervised visual mapping with the S3GP. In Proc. CVPR, pages 230–237, 2006.
  • [49] E. Wood and A. Bulling. EyeTab: Model-based gaze estimation on unmodified tablet computers. In Proc. ETRA, pages 207–210, 2014.
  • [50] H. Yamazoe, A. Utsumi, T. Yonezawa, and S. Abe. Remote gaze estimation with a single camera based on facial-feature tracking without special calibration actions. In Proc. ETRA, pages 245–250, 2008.
  • [51] D. H. Yoo and M. J. Chung. A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98(1):25–51, 2005.
  • [52] Z. Zhu and Q. Ji. Eye gaze tracking under natural head movements. In Proc. CVPR, pages 918–923, 2005.
  • [53] Z. Zhu, Q. Ji, and K. P. Bennett. Nonlinear eye gaze mapping function estimation via support vector regression. In Proc. ICPR, pages 1132–1135, 2006.