PARFormer: Transformer-Based Multi-Task Network for Pedestrian Attribute Recognition

Xinwen Fan, Yukang Zhang, Yang Lu, Hanzi Wang

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2024)

Abstract

Pedestrian attribute recognition (PAR) has received increasing attention because of its wide application in video surveillance and pedestrian analysis. Extracting robust feature representations is one of the key challenges in this task. Existing methods primarily rely on convolutional neural networks (CNNs) as the backbone for feature extraction. However, these methods mainly focus on small discriminative regions while ignoring the global perspective. To overcome these limitations, we propose PARFormer, a pure transformer-based multi-task PAR network consisting of four modules. In the feature extraction module, we build a strong transformer-based baseline, which achieves competitive results on several PAR benchmarks compared with existing CNN-based baselines. Since the PAR task is vulnerable to environmental factors, we enhance feature robustness in the feature processing module and propose an effective data augmentation strategy, the batch random mask (BRM) block, to reinforce the attentive feature learning of random patches. Furthermore, we propose a multi-attribute center loss (MACL) to improve the inter-attribute discriminability of the feature representations. As viewpoints can affect some specific attributes, in the viewpoint perception module we propose a multi-view contrastive loss (MVCL) that enables the network to exploit viewpoint information. In the attribute recognition module, we alleviate the negative-positive imbalance problem when generating the attribute predictions. These modules interact, jointly learning a highly discriminative feature space and supervising the generation of the final features. Extensive experimental results show that the proposed PARFormer network performs well compared with state-of-the-art methods on several public datasets, including PETA, RAP, and PA100K. Code will be released at https://github.com/xwf199/PARFormer.
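
The abstract does not say how the attribute recognition module handles the negative-positive imbalance. As an illustration only, the sketch below shows one commonly used remedy in multi-label attribute recognition: a binary cross-entropy whose per-attribute weights depend on how often each attribute is positive in the training set. The function names, the exponential weighting, and the toy data are assumptions made for this example and are not taken from PARFormer.

```python
import math


def positive_ratios(labels):
    """Return the fraction of positive samples for each attribute.

    `labels` is a list of rows, one row of 0/1 values per pedestrian image.
    """
    n = len(labels)
    return [sum(row[j] for row in labels) / n for j in range(len(labels[0]))]


def weighted_bce(probs, labels, ratios, eps=1e-7):
    """Mean weighted binary cross-entropy over all samples and attributes.

    Assumed weighting (hypothetical, not from the paper): a positive label
    is weighted by exp(1 - r) and a negative label by exp(r), where r is the
    attribute's positive ratio, so errors on rare positives cost more.
    """
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, labels):
        for p, y, r in zip(p_row, y_row, ratios):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            if y == 1:
                total -= math.exp(1.0 - r) * math.log(p)
            else:
                total -= math.exp(r) * math.log(1.0 - p)
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy data: 4 images, 2 attributes, each positive in 1 of 4 images.
    labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
    probs = [[0.6, 0.2], [0.3, 0.1], [0.2, 0.4], [0.1, 0.3]]
    ratios = positive_ratios(labels)
    print(ratios, weighted_bce(probs, labels, ratios))
```

With this weighting, a missed positive on a rare attribute is penalized more heavily than an equally confident false positive, which is the behaviour an imbalance-aware loss is meant to provide. When an attribute is positive in exactly half of the samples, both weights coincide and the loss is a uniformly scaled ordinary cross-entropy.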

Keywords

Transformers, Feature extraction, Task analysis, Visualization, Multitasking, Image recognition, Training, pedestrian attribute recognition, transformer, Feature processing, viewpoint information
