SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
CVPR 2024
Abstract
Multimodal Visual Object Tracking (VOT) has recently gained significant
attention due to its robustness. Early research focused on fully fine-tuning
RGB-based trackers, which was inefficient and yielded poorly generalized
representations due to the scarcity of multimodal data. Therefore, recent studies have utilized
prompt tuning to transfer pre-trained RGB-based trackers to multimodal data.
However, the modality gap limits pre-trained knowledge recall, and the
dominance of the RGB modality persists, preventing the full utilization of
information from other modalities. To address these issues, we propose a novel
symmetric multimodal tracking framework called SDSTrack. We introduce
lightweight adaptation for efficient fine-tuning, which directly transfers the
feature extraction ability from RGB to other domains with a small number of
trainable parameters and integrates multimodal features in a balanced,
symmetric manner. Furthermore, we design a complementary masked patch
distillation strategy to enhance the robustness of trackers in complex
environments, such as extreme weather, poor imaging, and sensor failure.
Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art
methods in various multimodal tracking scenarios, including RGB+Depth,
RGB+Thermal, and RGB+Event tracking, and exhibits impressive results in extreme
conditions. Our source code is available at https://github.com/hoqolo/SDSTrack.
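
The abstract names a complementary masked patch distillation strategy but does not spell out the masking scheme. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes that a patch hidden in one modality stays visible in the other, so that no spatial location loses both inputs. The function names, the `mask_ratio` parameter, and the zero fill value are all hypothetical and chosen for illustration only.

```python
import random


def complementary_patch_masks(num_patches, mask_ratio=0.5, seed=None):
    """Return two boolean lists (True = patch is masked), one per modality.

    Assumption for illustration: the two masks are exact complements, so
    every patch is masked in exactly one of the two modalities.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    # The first `cut` shuffled indices are hidden in the RGB stream.
    cut = round(num_patches * mask_ratio)
    masked_in_rgb = set(indices[:cut])
    rgb_mask = [i in masked_in_rgb for i in range(num_patches)]
    # The auxiliary stream (depth, thermal, or event) hides the remainder.
    aux_mask = [not m for m in rgb_mask]
    return rgb_mask, aux_mask


def apply_mask(tokens, mask, fill=0.0):
    """Replace masked tokens with a fill value (hypothetical choice: zero)."""
    return [fill if m else t for t, m in zip(tokens, mask)]


if __name__ == "__main__":
    rgb_mask, aux_mask = complementary_patch_masks(16, mask_ratio=0.5, seed=0)
    print(sum(rgb_mask), sum(aux_mask))
```

Under this assumption, a student branch that sees the masked inputs could be trained to match a teacher branch that sees the clean inputs, which would be one way to simulate a degraded or failed sensor during training.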