Representation Alignment Contrastive Regularization for Multi-Object Tracking
CoRR (2024)
Abstract
Achieving high performance in multi-object tracking algorithms heavily relies
on modeling spatio-temporal relationships during the data association stage.
Mainstream approaches encompass rule-based and deep learning-based methods for
spatio-temporal relationship modeling. While the former relies on physical
motion laws, offering wider applicability but yielding suboptimal results for
complex object movements, the latter, though achieving high performance, lacks
interpretability and involves complex module designs. This work aims to
simplify deep learning-based spatio-temporal relationship models and introduce
interpretability into features for data association. Specifically, a
lightweight single-layer transformer encoder is utilized to model
spatio-temporal relationships. To make features more interpretable, two
contrastive regularization losses based on representation alignment are
proposed, derived from spatio-temporal consistency rules. By applying weighted
summation to affinity matrices, the aligned features can seamlessly integrate
into the data association stage of the original tracking workflow. Experimental
results show that our model enhances the majority of existing tracking
networks' performance without excessive complexity, with minimal increase in
training overhead and nearly negligible computational and storage costs.
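
The weighted summation of affinity matrices mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_affinity` and `fuse_affinities`, the use of cosine similarity for the feature affinity, the convex-combination form, and the `weight` value of 0.5 are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def cosine_affinity(track_feats, det_feats):
    """Pairwise cosine similarity between track rows and detection rows.

    Hypothetical helper: the abstract does not say which similarity is used.
    """
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return t @ d.T


def fuse_affinities(base_affinity, feature_affinity, weight=0.5):
    """Weighted sum of the tracker's original affinity and a feature affinity.

    `weight` is an assumed hyperparameter; the convex combination below is
    one possible form of "weighted summation".
    """
    if base_affinity.shape != feature_affinity.shape:
        raise ValueError("affinity matrices must have the same shape")
    return (1.0 - weight) * base_affinity + weight * feature_affinity


# Toy example: 2 existing tracks, 2 new detections.
base = np.array([[0.9, 0.1],
                 [0.2, 0.8]])        # e.g. a motion- or IoU-based affinity
tracks = np.array([[1.0, 0.0],
                   [0.0, 1.0]])      # stand-ins for aligned track features
dets = np.array([[2.0, 0.0],
                 [0.0, 3.0]])        # stand-ins for aligned detection features

fused = fuse_affinities(base, cosine_affinity(tracks, dets), weight=0.5)
matches = fused.argmax(axis=1)       # greedy per-track choice, for illustration only
```

In a real tracker the fused matrix would typically be passed to whatever assignment step the original pipeline already uses, which is why a simple sum of same-shaped matrices leaves the rest of the workflow unchanged.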