Robust semi-automatic head pose labeling for real-world face video sequences

Multimedia Tools and Applications (2013)

Abstract
Automatic head pose estimation from real-world video sequences is of great interest to the computer vision community because pose provides prior knowledge for tasks such as face detection and classification. However, developing pose estimation algorithms requires large, labeled, real-world video databases on which computer vision systems can be trained and tested. Manually labeling each frame is tedious, time-consuming, and often difficult because of the high uncertainty in head pose angle estimates, particularly in unconstrained environments that include arbitrary facial expressions, occlusion, illumination, etc. To overcome these difficulties, a semi-automatic framework is proposed for labeling temporal head pose in real-world video sequences. The proposed multi-stage labeling framework first detects a subset of frames with distinct head poses over a video sequence, which an expert then labels manually to obtain the ground truth for those frames. The framework provides a continuous head pose label and a corresponding confidence value over the pose angles. Next, an interpolation scheme over the video sequence estimates (i) labels for the frames without manual labels and (ii) corresponding confidence values for the interpolated labels. These confidence values permit an automatic head pose estimation framework to determine the subset of frames to be used for further processing, depending on the labeling accuracy required. Experiments performed on a large, labeled, in-house, real-world face video database (which will be made publicly available) show that the proposed framework achieves 96.98% labeling accuracy when manual labeling is performed on only 30% of the video frames.
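The abstract does not specify the interpolation scheme or how confidence is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single pose angle (yaw, in degrees, with no wrap-around), plain linear interpolation between manually labeled keyframes, and a hypothetical confidence that decays with the distance to the nearest labeled frame. The names `interpolate_labels`, `keyframes`, and `decay` are illustrative assumptions.

```python
from bisect import bisect_left


def interpolate_labels(num_frames, keyframes, decay=0.1):
    """Fill in per-frame yaw labels from sparse manual labels.

    keyframes: dict mapping frame index -> manually labeled yaw (degrees).
    decay: hypothetical rate at which confidence drops per frame of
           distance from the nearest labeled frame (an assumption,
           not a value from the paper).
    Returns (labels, confidences), two lists of length num_frames.
    """
    if not keyframes:
        raise ValueError("at least one manually labeled frame is required")

    idx = sorted(keyframes)
    labels, confidences = [], []

    for t in range(num_frames):
        pos = bisect_left(idx, t)

        if pos < len(idx) and idx[pos] == t:
            # Manually labeled frame: keep the label, full confidence.
            labels.append(float(keyframes[t]))
            confidences.append(1.0)
            continue

        if pos == 0:
            # Before the first keyframe: hold its label.
            angle = keyframes[idx[0]]
            gap = idx[0] - t
        elif pos == len(idx):
            # After the last keyframe: hold its label.
            angle = keyframes[idx[-1]]
            gap = t - idx[-1]
        else:
            # Between two keyframes: linear interpolation.
            left, right = idx[pos - 1], idx[pos]
            w = (t - left) / (right - left)
            angle = (1.0 - w) * keyframes[left] + w * keyframes[right]
            gap = min(t - left, right - t)

        labels.append(float(angle))
        # Assumed confidence model: 1 at a keyframe, decreasing with gap.
        confidences.append(1.0 / (1.0 + decay * gap))

    return labels, confidences


def select_frames(confidences, threshold):
    """Return indices of frames whose label confidence meets the threshold."""
    return [t for t, c in enumerate(confidences) if c >= threshold]
```

A downstream pose estimator could then call `select_frames` with a threshold chosen for the labeling accuracy it needs, which mirrors the role the abstract gives to the confidence values.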
Keywords
Semi-automatic labeling, Real-world video sequence, Head pose, Automatic face tracking, Bag-of-words, Manifold