DeepPose: Human Pose Estimation via Deep Neural Networks
Computer Vision and Pattern Recognition, Volume abs/1312.4659, 2014, Pages 1653-1660.
Abstract:
We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fash...
Introduction
- The problem of human pose estimation, defined as the problem of localization of human joints, has enjoyed substantial attention in the computer vision community.
- The efficiency of part-based models is achieved at the cost of limited expressiveness: they rely on local detectors, which in many cases reason about a single part, and, most importantly, they model only a small subset of all interactions between body parts.
- These limitations, as exemplified in Fig. 1, have been recognized, and methods that reason about pose in a holistic manner have been proposed [15, 20], but with limited success in real-world problems.
Highlights
- The problem of human pose estimation, defined as the problem of localization of human joints, has enjoyed substantial attention in the computer vision community
- We show that a generic convolutional Deep Neural Network can be learned for this problem
- To our knowledge, this is the first application of Deep Neural Networks (DNNs) to human pose estimation
- Our formulation of the problem as DNN-based regression to joint coordinates, together with the presented cascade of such regressors, has the advantage of capturing context and reasoning about pose in a holistic manner
- We show that a generic convolutional neural network, originally designed for classification tasks, can be applied to the different task of localization
- We plan to investigate novel architectures that could be better tailored to localization problems in general, and to pose estimation in particular
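The cascade of joint-coordinate regressors mentioned in the highlights can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage regressors are hypothetical callables standing in for trained networks, the box normalization (center and size) is an assumed convention, and the refinement box is assumed to be a fixed fraction `scale` of the full box.

```python
import numpy as np

def normalize(y, box):
    """Map absolute (x, y) coordinates into box-relative coordinates.
    box = (cx, cy, w, h): center and size (assumed convention)."""
    cx, cy, w, h = box
    return (np.asarray(y, dtype=float) - [cx, cy]) / [w, h]

def denormalize(y_norm, box):
    """Inverse of normalize: box-relative coordinates back to absolute."""
    cx, cy, w, h = box
    return np.asarray(y_norm, dtype=float) * [w, h] + [cx, cy]

def cascade_predict(image, full_box, stages, scale=0.5):
    """Run a cascade of regressors (all callables are hypothetical stand-ins).

    stages[0](image, box)  -> (K, 2) normalized joint coordinates
    stages[s](image, box)  -> (2,) normalized displacement for one joint, s >= 1
    """
    initial, *refiners = stages
    # Stage 1: regress all joints from the full bounding box.
    joints = denormalize(initial(image, full_box), full_box)
    _, _, w, h = full_box
    for refine in refiners:
        updated = []
        for x, y in joints:
            # Later stages look at a smaller box centered on the current estimate
            # and predict a displacement relative to that box.
            sub_box = (x, y, scale * w, scale * h)
            updated.append(denormalize(refine(image, sub_box), sub_box))
        joints = np.array(updated)
    return joints

# Usage with stub regressors in place of trained networks.
initial_stub = lambda image, box: np.zeros((2, 2))      # both joints at box center
refiner_stub = lambda image, box: np.array([0.1, -0.1])  # small fixed displacement
result = cascade_predict(None, (50, 50, 100, 100), [initial_stub, refiner_stub])
```

Each refinement stage sees a smaller region around the current estimate, so its displacement is expressed at a finer scale than the initial full-box prediction.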
Methods
- [Figure: plot legends for Johnson et al. and DeepPose; panels for wrists, elbows, and legs.]
- DeepPose handles severe foreshortening, unusual poses, occluded limbs (such as the occluded arms in row 3, columns 2 and 6), and unusual illumination conditions.
- In most of the cases, when the estimated pose is not precise, it still has a correct shape.
- In the last row, although some of the predicted limbs are not aligned with the true locations, the overall shape of the pose is correct.
- Results on FLIC are usually better, with occasional visible mistakes on the lower arms.
Results
- Comparisons: the authors present results compared with other approaches.
- The authors show results for the four most challenging limbs – lower and upper arms and legs – as well as the average value across these limbs for all compared algorithms.
- It is worth noting that while the other approaches exhibit strengths for particular limbs, none of them consistently dominates across all limbs.
- DeepPose shows strong results for all challenging limbs.
Conclusion
- To the authors' knowledge, this is the first application of Deep Neural Networks (DNNs) to human pose estimation.
- The authors' formulation of the problem as DNN-based regression to joint coordinates, together with the presented cascade of such regressors, has the advantage of capturing context and reasoning about pose in a holistic manner.
- The authors plan to investigate novel architectures that could be better tailored to localization problems in general, and to pose estimation in particular.
Tables
- Table 1: Percentage of Correct Parts (PCP) at 0.5 on LSP for DeepPose as well as five state-of-the-art approaches
- Table 2: Percentage of Correct Parts (PCP) at 0.5 on the Image Parse dataset for DeepPose as well as two state-of-the-art approaches. Results obtained from [17]
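The PCP metric named in the table captions can be illustrated with a short sketch. The version below assumes the strict variant, in which a limb counts as correct only if both predicted endpoints lie within `threshold` times the ground-truth limb length of their true positions; other PCP variants exist, and the function name and array layout are illustrative assumptions.

```python
import numpy as np

def pcp(pred, true, limbs, threshold=0.5):
    """Per-limb Percentage of Correct Parts (strict variant, assumed here).

    pred, true: arrays of shape (N, K, 2) with N images and K joints.
    limbs: list of (a, b) joint-index pairs defining each limb.
    Returns an array with one fraction in [0, 1] per limb.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    scores = []
    for a, b in limbs:
        # Ground-truth limb length for every image.
        limb_len = np.linalg.norm(true[:, a] - true[:, b], axis=1)
        # Endpoint errors for both joints of the limb.
        err_a = np.linalg.norm(pred[:, a] - true[:, a], axis=1)
        err_b = np.linalg.norm(pred[:, b] - true[:, b], axis=1)
        correct = (err_a <= threshold * limb_len) & (err_b <= threshold * limb_len)
        scores.append(correct.mean())
    return np.array(scores)

# Toy example: one image, three joints, two limbs of length 10.
true_joints = [[[0, 0], [10, 0], [10, 10]]]
pred_joints = [[[1, 0], [10, 4], [10, 20]]]
scores = pcp(pred_joints, true_joints, limbs=[(0, 1), (1, 2)])
```

In the toy example the first limb is counted as correct (endpoint errors 1 and 4, both at most 5), while the second is not (one endpoint is off by 10).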
Related work
- The idea of representing articulated objects in general, and human pose in particular, as a graph of parts has been advocated from the early days of computer vision [16]. The so-called Pictorial Structures (PSs), introduced by Fischler and Elschlager [8], were made tractable and practical by Felzenszwalb and Huttenlocher [6] using the distance transform trick. As a result, a wide variety of PS-based models with practical significance were subsequently developed.
The above tractability, however, comes with the limitation of having tree-based pose models with simple binary potentials that do not depend on image data. As a result, research has focused on enriching the representational power of the models while maintaining tractability. Earlier attempts to achieve this were based on richer part detectors [18, 1, 4]. More recently, a wide variety of models expressing complex joint relationships were proposed. Yang and Ramanan [26] use a mixture model of parts. Mixture models on the full model scale, obtained by having a mixture of PSs, have been studied by Johnson and Everingham [13]. Richer higher-order spatial relationships were captured in a hierarchical model by Tian et al. [24]. A different approach to capturing higher-order relationships is through image-dependent PS models, which can be estimated via a global classifier [25, 19, 17].
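For orientation, a tree-structured pictorial structures model is commonly written as an energy minimization over part locations. The notation below follows the usual convention associated with [6] and is an assumption, not taken from this summary: $l_i$ is the location of part $i$, $m_i$ is the image-dependent mismatch cost of placing part $i$ at $l_i$ in image $I$, $d_{ij}$ is the deformation cost between connected parts, and $E$ is the edge set of a tree over the parts.

```latex
L^{*} = \arg\min_{L = (l_1, \dots, l_n)}
  \left( \sum_{i=1}^{n} m_i(l_i; I) \;+\; \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right)
```

The pairwise term $d_{ij}$ takes no image argument, which corresponds to the binary potentials "not depending on image data" noted above; the image-dependent PS models cited at the end of the paragraph relax this restriction.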
Reference
- [1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009.
- [2] M. Dantone, J. Gall, C. Leistner, and L. Van Gool. Human pose estimation using body parts dependent joint regressors. In CVPR, 2013.
- [3] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. In COLT. ACL, 2010.
- [4] M. Eichner and V. Ferrari. Better appearance models for pictorial structures. 2009.
- [5] M. Eichner, M. Marin-Jimenez, A. Zisserman, and V. Ferrari. Articulated human pose estimation and search in (almost) unconstrained still images. ETH Zurich, D-ITET, BIWI, Technical Report No, 272, 2010.
- [6] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79, 2005.
- [7] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In CVPR, 2008.
- [8] M. A. Fischler and R. A. Elschlager. The representation and matching of pictorial structures. Computers, IEEE Transactions on, 100(1):67–92, 1973.
- [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- [10] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation using discriminative armlet classifiers. In CVPR, 2013.
- [11] C. Ionescu, F. Li, and C. Sminchisescu. Latent structured models for human pose estimation. In ICCV, 2011.
- [12] S. Johnson and M. Everingham. Clustered pose and nonlinear appearance models for human pose estimation. In BMVC, 2010.
- [13] S. Johnson and M. Everingham. Learning effective human pose estimation from inaccurate annotation. In CVPR, 2011.
- [14] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- [15] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV, 2002.
- [16] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8(1):77–98, 1977.
- [17] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Poselet conditioned pictorial structures. In CVPR, 2013.
- [19] B. Sapp and B. Taskar. Modec: Multimodal decomposable models for human pose estimation. In CVPR, 2013.
- [20] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In CVPR, 2003.
- [21] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3476–3483. IEEE, 2013.
- [22] C. Szegedy, A. Toshev, and D. Erhan. Object detection via deep neural networks. In NIPS 26, 2013.
- [23] G. W. Taylor, R. Fergus, G. Williams, I. Spiro, and C. Bregler. Pose-sensitive embedding by nonlinear nca regression. In NIPS, 2010.
- [24] Y. Tian, C. L. Zitnick, and S. G. Narasimhan. Exploring the spatial hierarchy of mixture models for human pose estimation. In ECCV, 2012.
- [25] F. Wang and Y. Li. Beyond physical connections: Tree models in human pose estimation. In CVPR, 2013.
- [26] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.