BAN-ima: A Box Adaptive Network With Iterative Mixed Attention for Visual Tracking

Qun Li, Haijun Zhang, Kai Yang, Gaochang Wu

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS (2024)

Abstract
Recent anchor-free trackers that leverage the remarkably expressive capacity of the fully convolutional network have drawn considerable attention within the field of tracking. However, the independence of feature extraction and feature fusion in existing anchor-free trackers limits the representation ability of the network. To address this issue, we present a new anchor-free tracker, named box adaptive network with iterative mixed attention (BAN-ima), that adopts Vision Transformer (ViT) as its backbone. In particular, we introduce a multi-level feature fusion framework that effectively incorporates multiple network branches to produce precise predictions of target bounding boxes. To further enhance the tracking precision, we propose a probabilistic branch to replace the traditional classification branch in the tracking head. Furthermore, we introduce an improved Intersection over Union (IoU) loss function, denoted by α-CIoU, which adaptively reweights the loss and gradients of objects with high and low IoUs. This scheme enhances object localization and regression precision for object tracking. Extensive experiments were carried out on well-established benchmarks for visual tracking, including TrackingNet, GOT-10k, LaSOT, UAV123 and NFS30. The results demonstrate that our BAN-ima tracker achieves performance comparable to state-of-the-art trackers while maintaining a real-time speed of 35 frames per second (FPS). In particular, it achieves 84.7% AUC and an 89.1% normalized precision score on the TrackingNet dataset.
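The abstract does not spell out the α-CIoU formula, so the following is only a minimal sketch of one common way to build such a loss: take the standard CIoU terms (IoU, normalized center distance, aspect-ratio penalty) and raise each to a power α. The function name `alpha_ciou_loss`, the box layout `(x1, y1, x2, y2)`, and the default `alpha=3.0` are assumptions made for illustration and may differ from the authors' implementation. With `alpha=1.0` the sketch reduces to the ordinary CIoU loss.

```python
import math


def alpha_ciou_loss(box, gt, alpha=3.0, eps=1e-9):
    """Illustrative power-generalized CIoU loss for two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2). This is an assumed formulation, not
    necessarily the exact loss used in the paper.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Aspect-ratio consistency term and its trade-off weight (as in CIoU).
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    beta = v / (1 - iou + v + eps)

    # Each CIoU term is raised to the power alpha.
    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha


if __name__ == "__main__":
    print(alpha_ciou_loss((0, 0, 2, 2), (1, 0, 3, 2)))             # partial overlap
    print(alpha_ciou_loss((0, 0, 2, 2), (1, 0, 3, 2), alpha=1.0))  # plain CIoU
```

In this sketch, choosing α > 1 makes the gradient of the IoU term scale with IoU^(α-1), so well-aligned boxes receive relatively more weight than poorly aligned ones. This is one way to read the abstract's statement about reweighting objects with high and low IoUs.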
Keywords
Target tracking, Feature extraction, Transformers, Probabilistic logic, Head, Task analysis, Location awareness, Visual tracking, Siamese network, Anchor-free tracker, Vision transformer