A Multiscale Coarse-to-Fine Human Pose Estimation Network With Hard Keypoint Mining
IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024)
Abstract
Current convolutional neural network (CNN)-based multiperson pose estimators have made great progress; however, they pay little or no attention to "hard" samples, such as occluded keypoints, small and nearly invisible keypoints, and ambiguous keypoints. In this article, we explicitly deal with these "hard" samples by proposing a novel multiscale coarse-to-fine human pose estimation network ((HMPN)-P-2), which consists of two sequential subnetworks: CoarseNet and FineNet. CoarseNet makes a coarse prediction to locate "simple" keypoints, such as hands and ankles, using a multiscale fusion module integrated into the bottleneck block, resulting in a novel module called the multiscale bottleneck. The new module improves the multiscale representation ability of the network at a fine-grained level while slightly reducing the computational cost owing to group convolution. FineNet then infers "hard" keypoints and refines "simple" keypoints simultaneously, trained with a hard keypoint mining loss. Unlike previous works, the proposed loss treats "hard" keypoints differentially and prevents "simple" keypoints from dominating the computed gradients during training. Experiments on the COCO keypoint benchmark show that our approach achieves superior pose estimation performance compared with other state-of-the-art methods. Source code is available for further research: https://github.com/sues-vision/C2F-HumanPoseEstimation.
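The abstract does not give the exact form of the hard keypoint mining loss. As a rough illustration of the general idea only, the sketch below contrasts two ways of keeping "simple" keypoints from dominating a heatmap loss. The first is online hard keypoint mining (OHKM), an earlier technique introduced with the Cascaded Pyramid Network, which averages only the top-k per-keypoint losses. The second, `weighted_hard_keypoint_loss`, is a hypothetical soft weighting in which every keypoint contributes but harder ones receive larger weights. The function names, the per-keypoint MSE heatmap formulation, the (K, H, W) array layout, and the exponent `gamma` are all assumptions for illustration; this is not the authors' loss. NumPy is used only to compute forward values; a training implementation would use an autodiff framework.

```python
import numpy as np


def per_keypoint_mse(pred, target, visible):
    """Mean squared heatmap error per keypoint, zeroed for unannotated keypoints.

    pred, target: arrays of shape (K, H, W), one heatmap per keypoint.
    visible: array of shape (K,), 1.0 if the keypoint is annotated, else 0.0.
    Returns an array of shape (K,).
    """
    err = ((pred - target) ** 2).mean(axis=(1, 2))
    return err * visible


def ohkm_loss(pred, target, visible, top_k=8):
    """Online hard keypoint mining: average only the top_k largest keypoint losses."""
    losses = per_keypoint_mse(pred, target, visible)
    k = min(top_k, losses.size)
    hardest = np.sort(losses)[-k:]  # np.sort is ascending, so take the tail
    return float(hardest.mean())


def weighted_hard_keypoint_loss(pred, target, visible, gamma=1.0, eps=1e-12):
    """Hypothetical soft weighting (assumption, not the paper's formula).

    Every visible keypoint contributes, with weight
    w_k = l_k**gamma / sum_j l_j**gamma, which grows with its own loss l_k.
    gamma = 0 gives the plain mean over visible keypoints; larger gamma
    concentrates the loss on the worst-localized ("hard") keypoints.
    """
    losses = per_keypoint_mse(pred, target, visible)
    # Unannotated keypoints get zero weight regardless of gamma.
    scores = np.where(visible > 0, losses ** gamma, 0.0)
    weights = scores / (scores.sum() + eps)
    # In an autodiff framework, one option is to treat the weights as
    # constants (detached) so they only rescale each keypoint's gradient.
    return float((weights * losses).sum())


if __name__ == "__main__":
    # 17 COCO keypoints on 64x48 heatmaps; keypoint 3 is badly localized.
    target = np.zeros((17, 64, 48))
    pred = target.copy()
    pred[3] += 0.5
    visible = np.ones(17)
    print("plain mean:", per_keypoint_mse(pred, target, visible).mean())
    print("OHKM (top 8):", ohkm_loss(pred, target, visible))
    print("soft, gamma=1:", weighted_hard_keypoint_loss(pred, target, visible))
```

In this toy case the plain mean spreads the single large error over all 17 keypoints, OHKM spreads it over only the 8 hardest, and the soft variant with `gamma=1` places almost all the weight on the badly localized keypoint. A soft weighting keeps a small training signal for well-localized keypoints instead of discarding them outright, which is one possible reading of handling "hard" keypoints differentially.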
Keywords
Pose estimation, Standards, Convolution, Training, Task analysis, Heating systems, Detectors, Hard sample mining, Human pose estimation, Multiscale