Pedestrian Attribute Recognition Based on Multimodal Transformer

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I (2024)

Abstract
Pedestrian attribute recognition (PAR) is sensitive to variations in shooting angle, lighting, and occlusion, so improving recognition accuracy across complex scenarios is a central task. In this paper, an image-text multimodal Transformer is used to learn intra-modal and inter-modal correlations from pedestrian images and attribute labels. The applicability of six multimodal fusion frameworks to attribute recognition is explored, and the impact of each framework's method of dividing the fused features on recognition accuracy is compared and analyzed. Comparative experiments verify the robustness and efficiency of the Early Concatenate framework, which achieves the best scores on multiple metrics on two major public PAR datasets, PA100k and RAP. This paper not only proposes a new Transformer-based, high-accuracy multimodal network but also provides feasible ideas and directions for further research on PAR. The comparative discussion of the various multimodal frameworks also offers a perspective that other multimodal tasks can draw on.
Keywords
Pedestrian Attribute Recognition, Multimodal Learning, Transformer
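
To give a rough idea of what an early-concatenation fusion might look like, the sketch below joins image tokens and attribute-label tokens into one sequence before a single self-attention layer, so that intra-modal and inter-modal correlations are computed in the same attention map. It is not the authors' network. The single attention head, the random weights, the token counts, the helper names (`self_attention`, `early_concat_forward`), and the choice to read predictions from the label-token positions are all assumptions made for illustration; the abstract does not specify any of these details.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over one token sequence."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters or embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def early_concat_forward(image_tokens, label_tokens, params):
    """Hypothetical early-concatenation forward pass (illustrative only)."""
    # Early fusion: both modalities share one sequence before any attention.
    fused_in = image_tokens + label_tokens
    fused_out, weights = self_attention(
        fused_in, params["wq"], params["wk"], params["wv"]
    )
    # Assumed division of the fused features: keep only the label positions.
    label_out = fused_out[len(image_tokens):]
    # One shared linear classifier yields one logit per attribute token.
    logits = [
        sum(x * w for x, w in zip(token, params["w_cls"])) + params["b_cls"]
        for token in label_out
    ]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, weights


rng = random.Random(0)
dim, num_patches, num_attributes = 8, 4, 3  # toy sizes, not from the paper
image_tokens = random_matrix(num_patches, dim, rng)    # stand-in patch embeddings
label_tokens = random_matrix(num_attributes, dim, rng)  # stand-in label embeddings
params = {
    "wq": random_matrix(dim, dim, rng),
    "wk": random_matrix(dim, dim, rng),
    "wv": random_matrix(dim, dim, rng),
    "w_cls": [rng.gauss(0.0, 0.1) for _ in range(dim)],
    "b_cls": 0.0,
}
probs, weights = early_concat_forward(image_tokens, label_tokens, params)
print([round(p, 3) for p in probs])
```

Because the tokens are concatenated first, each row of `weights` spans both image and label positions. A late-fusion variant would instead run attention on each modality separately and merge the results afterwards.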