CATrack: Convolution and Attention Feature Fusion for Visual Object Tracking

Longkun Zhang, Jiajun Wen, Zichen Dai, Rouyi Zhou, Zhihui Lai

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX (2024)

Abstract
In visual object tracking, information embedding and feature fusion between the target template and the search area have been active research topics for decades. Linear convolution is a common way to perform the correlation operation. Convolution is good at processing local information but ignores global information. By contrast, the attention mechanism inherently models global information. To model the local information of the target template and the global information of the search area, we propose a convolution and attention feature fusion module (CAM), so that efficient information embedding and feature fusion can be achieved in parallel. Moreover, a bi-directional information flow bridge is constructed to realize information embedding and feature fusion between the target template and the search area. Specifically, it includes a convolution-to-attention bridge module (CABM) and an attention-to-convolutional bridge module (ACBM). Finally, we present a novel tracker based on convolution and attention (CATrack), which combines the advantages of the convolution and attention operators and has an enhanced ability for accurate target positioning. Comprehensive experiments have been conducted on four tracking benchmarks: LaSOT, TrackingNet, GOT-10k and UAV123. Experiments show that CATrack performs competitively against state-of-the-art trackers.
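The contrast between the two operators in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' CAM, CABM, or ACBM: the function names, the tensor shapes, the single-head attention without learned projections, and the crop-and-sum fusion are all assumptions made for illustration. It only shows a local branch (template used as a correlation kernel) and a global branch (every search location attending to every template location) computed side by side.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Local branch: slide the template over the search features per channel.
    search: (C, Hs, Ws), template: (C, Ht, Wt) -> (C, Hs-Ht+1, Ws-Wt+1)."""
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.zeros((C, Hs - Ht + 1, Ws - Wt + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + Ht, j:j + Wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out

def cross_attention(search, template):
    """Global branch: each search location attends to all template locations.
    Hypothetical simplification: no learned query/key/value projections."""
    C, Hs, Ws = search.shape
    q = search.reshape(C, -1).T            # (Ns, C) queries from the search area
    k = template.reshape(C, -1).T          # (Nt, C) keys/values from the template
    scores = q @ k.T / np.sqrt(C)          # (Ns, Nt) scaled dot products
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ k                      # (Ns, C) template info per location
    return out.T.reshape(C, Hs, Ws)

def fuse(search, template):
    """Assumed fusion rule: centre-crop the attention map to the correlation
    map size and add the two branches."""
    local = depthwise_xcorr(search, template)
    glob = cross_attention(search, template)
    h, w = local.shape[1:]
    top = (search.shape[1] - h) // 2
    left = (search.shape[2] - w) // 2
    return local + glob[:, top:top + h, left:left + w]

rng = np.random.default_rng(0)
search_feat = rng.standard_normal((8, 16, 16))    # illustrative sizes only
template_feat = rng.standard_normal((8, 4, 4))
fused = fuse(search_feat, template_feat)
print(fused.shape)
```

The correlation output at each position depends only on a template-sized window of the search features, while each attention output mixes information from the whole template, which is the local versus global distinction the abstract draws.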
Keywords
Visual object tracking, Attention learning, Feature fusion