TKFormer: Typed Keypoints Guided Transformer for Human Parsing

Jian Zhang, Hong Liu, Yidi Li, Wenhao Li, Runwei Ding

2023 7th Asian Conference on Artificial Intelligence Technology (ACAIT) (2023)

Abstract

Human parsing is a fine-grained image segmentation task for understanding human behavior. Applying powerful image segmentation Transformers to this task achieves promising performance. However, current Transformers ignore explicit modeling of human body structure and prior knowledge of where body parts are located, which leads to coarse and implausible segmentation of human bodies. In this work, we attribute these issues to the lack of human keypoint localization. To this end, we introduce human body keypoints from pose estimation and propose the Typed Keypoint Image (TK Image) to represent and encode different semantic features for different types of human keypoints. Moreover, we propose a TK guided Transformer (TKFormer) with a guidance structure based on feature fusion to guide human parsing. To obtain more accurate human keypoints, we further propose a pure TRansformer for bottom-up Pose (TRPose) estimation, which contains independent left and right confidence decoders and an identity decoder for associative embedding. TRPose outperforms OpenPose on CrowdPose and achieves the best results on Pascal-Person-Part without fine-tuning. Extensive experiments show that TKFormer achieves new state-of-the-art results on Pascal-Person-Part (77.69% mIoU) and LIP (61.26% mIoU).
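The abstract does not say exactly how the TK Image encodes each keypoint type. As a minimal sketch, assume one common choice: one Gaussian heatmap channel per keypoint type, with same-type keypoints from every person painted into the same channel. The function name `typed_keypoint_image`, the `(type_id, x, y)` input layout, and the `sigma` parameter are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input layout: (type_id, x, y) for one detected keypoint,
# where type_id indexes a body-joint type (e.g. head, left wrist, ...).
Keypoint = Tuple[int, float, float]


def typed_keypoint_image(
    keypoints: Sequence[Keypoint],
    height: int,
    width: int,
    num_types: int,
    sigma: float = 2.0,
) -> List[List[List[float]]]:
    """Hypothetical TK-Image-style encoding: one Gaussian heatmap per keypoint type.

    Returns a nested list indexed as image[type_id][row][col] with values in [0, 1].
    Keypoints of the same type from different people share one channel; overlapping
    Gaussians are merged with max() rather than summed, so values never exceed 1.0.
    """
    image = [[[0.0] * width for _ in range(height)] for _ in range(num_types)]
    radius = int(math.ceil(3 * sigma))  # ignore contributions beyond 3 sigma
    for type_id, x, y in keypoints:
        if not 0 <= type_id < num_types:
            raise ValueError(f"unknown keypoint type {type_id}")
        cx, cy = int(round(x)), int(round(y))
        channel = image[type_id]
        # Only visit pixels inside both the image and the 3-sigma window.
        for row in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for col in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (col - x) ** 2 + (row - y) ** 2
                value = math.exp(-d2 / (2 * sigma ** 2))
                channel[row][col] = max(channel[row][col], value)
    return image


# Two people: type 0 is a hypothetical "head" label, type 1 a hypothetical "left wrist".
tk = typed_keypoint_image(
    [(0, 4, 3), (0, 12, 3), (1, 6, 10)], height=16, width=16, num_types=2
)
print(tk[0][3][4], tk[1][10][6])  # 1.0 1.0
```

A multi-channel map like this could be fused with image features as a guidance signal, but the abstract only describes TKFormer's guidance structure as "based on feature fusion", so the actual fusion design may differ.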

Keywords

human parsing, transformer, pose estimation