DeepPose: Human Pose Estimation via Deep Neural Networks

Computer Vision and Pattern Recognition, Volume abs/1312.4659, 2014, Pages 1653-1660.

DOI: https://doi.org/10.1109/CVPR.2014.214
The first application of Deep Neural Networks to human pose estimation

Abstract:

We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fash...

Introduction
  • The problem of human pose estimation, defined as the problem of localization of human joints, has enjoyed substantial attention in the computer vision community.
  • The efficiency of part-based models is achieved at the cost of limited expressiveness: they rely on local detectors, which in many cases reason about a single part, and, most importantly, they model only a small subset of all interactions between body parts
  • These limitations, as exemplified in Fig. 1, have been recognized and methods reasoning about pose in a holistic manner have been proposed [15, 20] but with limited success in real-world problems
Highlights
  • The problem of human pose estimation, defined as the problem of localization of human joints, has enjoyed substantial attention in the computer vision community
  • We show that a generic convolutional Deep Neural Network can be learned for this problem
  • To our knowledge, this is the first application of Deep Neural Networks (DNNs) to human pose estimation
  • Our formulation of the problem as Deep Neural Networks-based regression to joint coordinates and the presented cascade of such regressors has the advantage of capturing context and reasoning about pose in a holistic manner
  • We show that a generic convolutional neural network, originally designed for classification tasks, can be applied to the different task of localization
  • We plan to investigate novel architectures which could be better tailored towards localization problems in general, and pose estimation in particular
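
A minimal sketch of what "regression to joint coordinates" can look like in practice. It assumes, as an illustration not spelled out in this summary, that joint positions are expressed relative to a person bounding box before being used as regression targets, and that training minimizes a squared-error loss. The names `normalize_pose`, `denormalize_pose`, and `l2_loss` are hypothetical helpers, and the network itself is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def normalize_pose(joints: List[Point], box: Box) -> List[Point]:
    """Express absolute joint coordinates relative to a box centre and size."""
    cx, cy, w, h = box
    return [((x - cx) / w, (y - cy) / h) for x, y in joints]


def denormalize_pose(joints: List[Point], box: Box) -> List[Point]:
    """Inverse of normalize_pose: map box-relative coordinates back to pixels."""
    cx, cy, w, h = box
    return [(x * w + cx, y * h + cy) for x, y in joints]


def l2_loss(predicted: List[Point], target: List[Point]) -> float:
    """Sum of squared coordinate differences over all joints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target))


# Illustrative values only: a box centred at (100, 200), 50 wide and 100 tall.
box = (100.0, 200.0, 50.0, 100.0)
joints = [(100.0, 200.0), (125.0, 250.0)]
targets = normalize_pose(joints, box)
```

Normalizing the targets makes the regression problem independent of where the person appears in the image; predictions are mapped back to pixels with `denormalize_pose`.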
Methods
  • DeepPose estimates poses under a variety of difficult conditions: severe foreshortening, unusual poses, occluded limbs (such as the occluded arms in row 3, columns 2 and 6), and unusual illumination conditions.
  • In most of the cases, when the estimated pose is not precise, it still has a correct shape.
  • In the last row, some of the predicted limbs are not aligned with the true locations, but the overall shape of the pose is correct.
  • Results on FLIC are usually better with occasional visible mistakes on lower arms
Results
  • Comparisons: the authors present results in comparison with other approaches.
  • The authors show results for the four most challenging limbs – lower and upper arms and legs – as well as the average value across these limbs for all compared algorithms.
  • It is worth noting that while the other approaches exhibit strengths for particular limbs, none of them consistently dominates across all limbs.
  • DeepPose shows strong results for all challenging limbs
Conclusion
  • To the authors' knowledge, this is the first application of Deep Neural Networks (DNNs) to human pose estimation.
  • The authors' formulation of the problem as DNN-based regression to joint coordinates and the presented cascade of such regressors has the advantage of capturing context and reasoning about pose in a holistic manner.
  • The authors plan to investigate novel architectures which could be better tailored towards localization problems in general, and pose estimation in particular
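
The cascade idea can be sketched as follows, under stated assumptions: each stage is modelled as a function that looks at the image near the current estimate of a joint and returns a displacement, which is added to that estimate. The `StageRegressor` signature and `run_cascade` are hypothetical names for illustration; in a real system each stage would be a trained network rather than the toy callable used here.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical stage signature: (image, current joint estimate) -> displacement.
StageRegressor = Callable[[object, Point], Point]


def run_cascade(image: object,
                initial_pose: List[Point],
                stages: Sequence[StageRegressor]) -> List[Point]:
    """Refine an initial pose by adding each stage's predicted displacement."""
    pose = list(initial_pose)
    for stage in stages:
        refined = []
        for joint in pose:
            # A trained stage would inspect an image crop around `joint`.
            dx, dy = stage(image, joint)
            refined.append((joint[0] + dx, joint[1] + dy))
        pose = refined
    return pose


def toy_stage(image: object, joint: Point) -> Point:
    """Toy stand-in for a trained stage: move halfway towards (10, 10)."""
    return ((10.0 - joint[0]) / 2.0, (10.0 - joint[1]) / 2.0)


refined_pose = run_cascade(None, [(0.0, 0.0)], [toy_stage, toy_stage])
```

Each stage only has to correct the residual error left by the previous one, which is one way to read the claim that a cascade yields higher-precision estimates.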
Tables
  • Table 1: Percentage of Correct Parts (PCP) at 0.5 on LSP for DeepPose as well as five state-of-the-art approaches
  • Table 2: Percentage of Correct Parts (PCP) at 0.5 on the Image Parse dataset for DeepPose as well as two state-of-the-art approaches. Results obtained from [17]
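
For readers unfamiliar with the metric in these captions, here is a minimal sketch of PCP at 0.5. It assumes the strict variant, in which a limb counts as correct when both predicted endpoints lie within half the ground-truth limb length of their true positions; other variants exist, and the summary does not say which one the tables use. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Limb = Tuple[Point, Point]  # (start joint, end joint)


def limb_is_correct(pred: Limb, true: Limb, threshold: float = 0.5) -> bool:
    """True if both predicted endpoints are within threshold * true limb length."""
    limit = threshold * math.dist(true[0], true[1])
    return (math.dist(pred[0], true[0]) <= limit
            and math.dist(pred[1], true[1]) <= limit)


def pcp(pred_limbs: List[Limb], true_limbs: List[Limb],
        threshold: float = 0.5) -> float:
    """Fraction of limbs judged correct at the given threshold."""
    correct = sum(limb_is_correct(p, t, threshold)
                  for p, t in zip(pred_limbs, true_limbs))
    return correct / len(true_limbs)


# Illustrative values only: two copies of a limb of length 10.
true_limbs = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 0.0), (10.0, 0.0))]
pred_limbs = [((1.0, 0.0), (9.0, 3.0)), ((0.0, 6.0), (10.0, 0.0))]
score = pcp(pred_limbs, true_limbs)
```

In this toy case the first prediction is within the 5-pixel tolerance at both ends and the second misses at its start joint, so half of the limbs count as correct.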
Related work
  • The idea of representing articulated objects in general, and human pose in particular, as a graph of parts has been advocated from the early days of computer vision [16]. The so-called Pictorial Structures (PSs), introduced by Fischler and Elschlager [8], were made tractable and practical by Felzenszwalb and Huttenlocher [6] using the distance transform trick. As a result, a wide variety of PS-based models with practical significance were subsequently developed.

    The above tractability, however, comes with the limitation of using tree-based pose models with simple binary potentials that do not depend on image data. As a result, research has focused on enriching the representational power of the models while maintaining tractability. Earlier attempts to achieve this were based on richer part detectors [18, 1, 4]. More recently, a wide variety of models expressing complex joint relationships were proposed. Yang and Ramanan [26] use a mixture model of parts. Mixture models on the full model scale, using a mixture of PSs, have been studied by Johnson and Everingham [13]. Richer higher-order spatial relationships were captured in a hierarchical model by Tian et al. [24]. A different approach to capturing higher-order relationships is through image-dependent PS models, which can be estimated via a global classifier [25, 19, 17].
References
  • [1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009.
  • [2] M. Dantone, J. Gall, C. Leistner, and L. Van Gool. Human pose estimation using body parts dependent joint regressors. In CVPR, 2013.
  • [3] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. In COLT. ACL, 2010.
  • [4] M. Eichner and V. Ferrari. Better appearance models for pictorial structures. 2009.
  • [5] M. Eichner, M. Marin-Jimenez, A. Zisserman, and V. Ferrari. Articulated human pose estimation and search in (almost) unconstrained still images. ETH Zurich, D-ITET, BIWI, Technical Report No. 272, 2010.
  • [6] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79, 2005.
  • [7] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In CVPR, 2008.
  • [8] M. A. Fischler and R. A. Elschlager. The representation and matching of pictorial structures. Computers, IEEE Transactions on, 100(1):67–92, 1973.
  • [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • [10] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation using discriminative armlet classifiers. In CVPR, 2013.
  • [11] C. Ionescu, F. Li, and C. Sminchisescu. Latent structured models for human pose estimation. In ICCV, 2011.
  • [12] S. Johnson and M. Everingham. Clustered pose and nonlinear appearance models for human pose estimation. In BMVC, 2010.
  • [13] S. Johnson and M. Everingham. Learning effective human pose estimation from inaccurate annotation. In CVPR, 2011.
  • [14] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
  • [15] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV, 2002.
  • [16] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8(1):77–98, 1977.
  • [17] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Poselet conditioned pictorial structures. In CVPR, 2013.
  • [19] B. Sapp and B. Taskar. Modec: Multimodal decomposable models for human pose estimation. In CVPR, 2013.
  • [20] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In CVPR, 2003.
  • [21] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3476–3483. IEEE, 2013.
  • [22] C. Szegedy, A. Toshev, and D. Erhan. Object detection via deep neural networks. In NIPS 26, 2013.
  • [23] G. W. Taylor, R. Fergus, G. Williams, I. Spiro, and C. Bregler. Pose-sensitive embedding by nonlinear nca regression. In NIPS, 2010.
  • [24] Y. Tian, C. L. Zitnick, and S. G. Narasimhan. Exploring the spatial hierarchy of mixture models for human pose estimation. In ECCV, 2012.
  • [25] F. Wang and Y. Li. Beyond physical connections: Tree models in human pose estimation. In CVPR, 2013.
  • [26] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.