Learning Modality Feature Fusion Via Transformer for RGBT-tracking

Yujue Cai, Xiubao Sui, Guohua Gu, Qian Chen

Infrared Physics & Technology (2023)

Abstract

RGB-T tracking can be viewed as multi-view fusion tracking. In this study, we propose a transformer-based network, the Multi-Modal Mutual Propagation Tracker (MMMPT). To obtain a robust appearance model from multi-modal data, we adopt an encoder-decoder architecture for information extraction. In the encoding stage, the template features of multiple frames reinforce the features they share through the self-attention mechanism, yielding a time-invariant target representation. At the same time, the template features interact with the multi-modal data through cross-modal propagation, resulting in a modality-invariant representation of the target. The transformer decoder transfers useful information from the template to the search areas through a similarity matrix. We evaluate the proposed RGB-T transformer tracker on the RGBT234, GTOT, VTUAV and LasHeR datasets. Extensive experiments indicate that the proposed framework is competitive with state-of-the-art trackers in terms of robustness and accuracy.
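The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the function names (`encode_templates`, `decode_search`), the token counts, the feature dimension, the residual connections, and the absence of learned projection weights are all assumptions made for illustration. The sketch only shows how self-attention over multi-frame template tokens, cross-modal attention between RGB and thermal tokens, and a decoder similarity matrix could fit together.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention without learned projections (assumption).

    Returns the attended values and the row-normalized similarity matrix.
    """
    dim = query.shape[-1]
    sim = softmax(query @ key.T / np.sqrt(dim), axis=-1)
    return sim @ value, sim


def encode_templates(rgb_tmpl, tir_tmpl):
    """Hypothetical encoder: self-attention, then cross-modal propagation."""
    # Self-attention over tokens pooled from several template frames,
    # so features shared across frames reinforce each other.
    rgb_self, _ = attention(rgb_tmpl, rgb_tmpl, rgb_tmpl)
    tir_self, _ = attention(tir_tmpl, tir_tmpl, tir_tmpl)
    rgb = rgb_tmpl + rgb_self
    tir = tir_tmpl + tir_self
    # Cross-modal propagation: each modality queries the other one.
    rgb_from_tir, _ = attention(rgb, tir, tir)
    tir_from_rgb, _ = attention(tir, rgb, rgb)
    return rgb + rgb_from_tir, tir + tir_from_rgb


def decode_search(search, template):
    """Hypothetical decoder: move template information to the search tokens."""
    propagated, sim = attention(search, template, template)
    return search + propagated, sim


# Toy sizes, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
n_frames, tokens_per_frame, n_search, dim = 3, 16, 64, 32
rgb_tmpl = rng.standard_normal((n_frames * tokens_per_frame, dim))
tir_tmpl = rng.standard_normal((n_frames * tokens_per_frame, dim))
rgb_search = rng.standard_normal((n_search, dim))

rgb_enc, tir_enc = encode_templates(rgb_tmpl, tir_tmpl)
fused_tmpl = np.concatenate([rgb_enc, tir_enc], axis=0)
out, sim = decode_search(rgb_search, fused_tmpl)
print(out.shape, sim.shape)
```

Each row of `sim` is a distribution over template tokens, so a search token that resembles the target draws most of its update from matching template tokens. A real tracker would add learned query, key, and value projections, multiple heads, and a prediction head on top of `out`.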

Keywords

RGB-T tracking, Deep learning, Transformer, Challenge-aware, Feature fusion
