'Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-Term Tracking

Bin Yan
Haojie Zhao
Xiaoyun Yang

ICCV 2019, pp. 2385-2393.


Abstract:

Compared with traditional short-term tracking, long-term tracking poses more challenges and is much closer to realistic applications. However, few works have been done and their performance has also been limited. In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules.

Introduction
  • Online visual tracking is one of the most important problems in computer vision, with many practical applications including video surveillance, behavior analysis, visual navigation, and augmented reality.
  • Compared with short-term tracking, the long-term tracking task requires the tracker to capture the tracked object in long videos and to handle frequent target disappearance and reappearance.
  • It poses more challenges than short-term tracking mainly from two aspects.
  • It is critical for long-term trackers to capture the tracked object in long sequences, determine whether the target is present or absent, and perform image-wide re-detection.
Highlights
  • Online visual tracking is one of the most important problems in computer vision, with many practical applications including video surveillance, behavior analysis, visual navigation, and augmented reality
  • In [33], there exist three major criteria to evaluate the performance of different trackers, namely, true positive rate (TPR), true negative rate (TNR) and maximum geometric mean (MaxGM)
  • TPR gives the fraction of present objects that are reported present and correctly located, while TNR measures the fraction of absent objects that are reported absent
  • The MaxGM rule (4) jointly considers both TPR and TNR and is adopted for ranking different trackers
  • Numerous experimental results on two recent benchmarks show that our tracker achieves the best performance and runs at a real-time speed
  • It is worth noting that our ‘Skimming-Perusal’ model is a simple yet effective real-time long-term tracking framework
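The TPR/TNR/MaxGM criteria above can be made concrete with a short sketch. This follows our reading of the MaxGM definition in the OxUvA paper [33]: the tracker is allowed to additionally report "absent" at random with probability p, which trades TPR for TNR, and MaxGM is the best geometric mean over all such p (the function name and step count are illustrative):

```python
import numpy as np

def max_gm(tpr, tnr, num_steps=1001):
    # Sweep the probability p of randomly reporting 'absent':
    # this scales TPR by (1 - p) and raises TNR to (1 - p) * TNR + p.
    p = np.linspace(0.0, 1.0, num_steps)
    gm = np.sqrt(((1.0 - p) * tpr) * ((1.0 - p) * tnr + p))
    # MaxGM is the best achievable geometric mean over all p.
    return float(gm.max())
```

Note that MaxGM is never below the plain geometric mean sqrt(TPR * TNR), which is recovered at p = 0; trackers with TNR = 0 (no absence prediction at all) still obtain a nonzero MaxGM via p > 0.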
Methods
  • The authors implement the tracker in Python with the TensorFlow [1] and Keras deep learning libraries.
  • The proposed method is tested on a PC with an Intel i7 CPU (32 GB RAM) and an NVIDIA GTX 1080Ti GPU (11 GB memory), and runs in real time at 25.7 frames per second.
  • The authors' tracker is denoted as SPLT.
  • Both training and testing codes are available at https://github.com/iiau-tracker/SPLT.
  • The quantitative evaluations and ablation studies are reported as follows
Results
  • The VOT2018LT [23] dataset was first presented in the Visual Object Tracking (VOT) challenge 2018 to evaluate the performance of different long-term trackers
  • This dataset includes 35 sequences of various objects, with a total of 146,847 frames and resolutions ranging from 290 × 217 to 1280 × 720.
  • In [33], the authors divide the OxUvA long-term dataset into two disjoint subsets, i.e., dev and test sets
  • Based on these two subsets, the OxUvA benchmark poses two challenges: constrained and open.
  • A larger MaxGM value indicates better performance
Conclusion
  • This work presents a novel ‘Skimming-Perusal’ tracking framework for long-term visual tracking.
  • The perusal module aims to precisely locate the tracked object in a local search region using the offline-trained regression and verification networks.
  • It is worth noting that our ‘Skimming-Perusal’ model is a simple yet effective real-time long-term tracking framework.
  • The authors believe that it can serve as a new baseline for future research
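The skimming-perusal interaction described above can be sketched as a per-frame control loop. This is a minimal illustration of the framework's logic, not the authors' actual code: `perusal` and `skim` are hypothetical callables standing in for the offline-trained regression/verification networks and the image-wide skimming module.

```python
def track_frame(frame, state, perusal, skim, tau=0.5):
    """One step of a skimming-perusal style loop (illustrative sketch).

    `perusal(frame, region)` returns (box, verification_score) from a
    local search region; `skim(frame)` returns the most promising region
    from an image-wide scan. Both are hypothetical stand-ins.
    """
    if state["present"]:
        # Target was visible: peruse the local search region around the
        # previous box with regression + verification.
        box, score = perusal(frame, state["box"])
    else:
        # Target was lost: skim the whole image to pick a promising
        # region, then peruse inside it (image-wide re-detection).
        region = skim(frame)
        box, score = perusal(frame, region)
    state["present"] = score > tau  # present/absent decision
    if state["present"]:
        state["box"] = box
    return state
```

The design point this sketch captures is that image-wide search only runs when the verifier declares the target absent, which is what keeps the framework real-time.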
Summary
  • Objectives:

    The authors' goal is to develop a simple but effective long-term tracking framework with high accuracy and real-time performance.
Tables
  • Table 1: Comparison of our tracker and 15 competing algorithms on the VOT2018LT dataset [23]. The best three results are marked in red, blue and green bold fonts respectively. The trackers are ranked from top to bottom using the F-score measure
  • Table 2: Quantitative analysis with respect to different visual attributes
  • Table 3: Effectiveness of different components for our tracker
  • Table 4: Comparison of different verification networks
  • Table 5: Comparisons of different tracking algorithms on the OxUvA [33] long-term dataset. The best three results are marked in red, blue and green bold fonts respectively. The trackers are ranked from top to bottom using the MaxGM measure
Related work
  • Traditional Long-term Tracking. In [15], Kalal et al. propose a tracking-learning-detection (TLD) algorithm for long-term tracking, which exploits an optical-flow-based matcher for local search and an ensemble of weak classifiers for global re-detection. Following the idea of TLD, Ma et al. [24] develop a long-term correlation tracker (LCT) using a KCF method as a local tracker and a random ferns classifier as a detector. The fully correlational long-term tracker (FCLT) [22] maintains several correlation filters trained on different time scales as a detector and exploits the correlation response to guide the dynamic interaction between the short-term tracker and long-term detector.

    Besides, some researchers have addressed the long-term tracking task using keypoint matching or global proposal schemes. The CMT [26] method utilizes a keypoint-based model to conduct long-term tracking, and the MUSTer [11] tracker exploits an integrated correlation filter for short-term localization and a keypoint-based matcher for long-term tracking. However, keypoint extractors and descriptors are often not stable in complicated scenes. In [40], Zhu et al. develop an EdgeBox Tracking (EBT) method to generate a series of candidate proposals using EdgeBox [42] and verify these proposals using a structured SVM [10] with multi-scale color histograms. However, the edge-based object proposal is inevitably susceptible to illumination variation and motion blur. The above-mentioned trackers have attempted to address long-term tracking from different perspectives, but their performance is not satisfactory since they merely exploit hand-crafted low-level features. In this work, we develop a simple yet effective long-term tracking framework based on deep learning, whose goal is to achieve high accuracy with real-time performance.

    Deep Long-term Tracking. Recently, some researchers have attempted to exploit deep-learning-based models for long-term tracking. Fan et al. [7] propose a parallel tracking and verifying (PTAV) framework, which effectively integrates a real-time tracker and a highly accurate verifier for robust tracking. The PTAV method performs much better than other compared trackers on the UAV20L dataset. Valmadre et al. [33] implement a long-term tracker, named SiamFC+R. This method equips SiamFC [3] with a simple re-detection scheme, and finds the tracked object within a random search region when the maximum score of the SiamFC response is lower than a given threshold. The experimental results demonstrate that the SiamFC+R tracker achieves significantly better performance than the original SiamFC method on the OxUvA dataset. But the SiamFC score map is not always reliable, which limits the performance of the SiamFC+R tracker.
Funding
  • This paper is supported in part by National Natural Science Foundation of China (Nos. 61872056, 61725202, 61829102, 61751212), in part by the Fundamental Research Funds for the Central Universities (Nos
Study subjects and analysis
We train the regression and skimming networks for 500,000 iterations and 20 epochs, respectively, with the same learning rate of 1e−3. For the verification network, we train it for 70 epochs and exploit 60,000 triplet pairs in each epoch. The learning rate is initially set to 1e−2 and gradually decayed to 1/10 of its value every 20 epochs
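The verification network's decay schedule above (start at 1e−2, divide by 10 every 20 epochs) can be expressed as a simple step function, e.g. for use with Keras's `LearningRateScheduler` callback; the helper name and defaults are ours, a sketch rather than the authors' code:

```python
def step_decay(epoch, initial_lr=1e-2, drop=0.1, epochs_per_drop=20):
    # Learning rate is divided by 10 after every 20 epochs:
    # epochs 0-19 -> 1e-2, epochs 20-39 -> 1e-3, epochs 40-59 -> 1e-4, ...
    return initial_lr * drop ** (epoch // epochs_per_drop)
```

With Keras this could be hooked into training via `tf.keras.callbacks.LearningRateScheduler(step_decay)`.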

We conduct this analysis on the VOT2018LT dataset and report the results in Table 2. Our tracker achieves top-three performance in most cases. MBMD also performs well, but it runs much slower than ours

Reference
  • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, and Matthieu Devin. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR abs/1603.04467, 2016.
  • Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip H. S. Torr. Staple: Complementary learners for real-time tracking. In CVPR, 2016.
  • Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. Fully-convolutional siamese networks for object tracking. In ECCV Workshop, 2016.
  • Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, and Jianhua Li. Visual tracking via adaptive spatially-regularized correlation filters. In CVPR, 2019.
  • Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient convolution operators for tracking. In CVPR, 2017.
  • Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ATOM: Accurate tracking by overlap maximization. In CVPR, 2019.
  • Heng Fan and Haibin Ling. Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking. In ICCV, 2017.
  • Heng Fan and Haibin Ling. Siamese cascaded region proposal networks for real-time visual tracking. In CVPR, 2019.
  • Hamed Kiani Galoogahi, Ashton Fagg, and Simon Lucey. Learning background-aware correlation filters for visual tracking. In ICCV, 2017.
  • Sam Hare, Stuart Golodetz, Amir Saffari, Vibhav Vineet, Ming-Ming Cheng, Stephen L. Hicks, and Philip H. S. Torr. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10):2096–2109, 2016.
  • Zhibin Hong, Zhe Chen, Chaohui Wang, Xue Mei, Danil Prokhorov, and Dacheng Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.
  • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861, 2017.
  • James Steven Supancic III and Deva Ramanan. Tracking as online decision-making: Learning a policy from streaming videos with reinforcement learning. In ICCV, 2017.
  • Ilchae Jung, Jeany Son, Mooyeol Baek, and Bohyung Han. Real-time MDNet. In ECCV, 2018.
  • Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409–1422, 2012.
  • Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Čehovin Zajc, Tomáš Vojíř, Goutam Bhat, Alan Lukežič, Abdelrahman Eldesokey, Gustavo Fernandez, et al. The sixth visual object tracking VOT2018 challenge results. 2018.
  • Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In CVPR, 2019.
  • Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High performance visual tracking with siamese region proposal network. In CVPR, 2018.
  • Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, and Huchuan Lu. GradNet: Gradient-guided network for visual object tracking. In ICCV, 2019.
  • Peixia Li, Dong Wang, Lijun Wang, and Huchuan Lu. Deep visual tracking: Review and experimental comparison. Pattern Recognition, 76:323–338, 2018.
  • Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, and Matej Kristan. FCLT - a fully-correlational long-term tracker. In ACCV, 2018.
  • Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, and Matej Kristan. Now you see me: Evaluating performance in long-term visual tracking. In ECCV, 2018.
  • Chao Ma, Xiaokang Yang, Chongyang Zhang, and Ming-Hsuan Yang. Long-term correlation tracking. In CVPR, 2015.
  • Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
  • Georg Nebehay and Roman Pflugfelder. Clustering of static-adaptive correspondences for deformable object tracking. In CVPR, 2015.
  • Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, and Vincent Vanhoucke. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In CVPR, 2017.
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, and Michael Bernstein. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
  • Chong Sun, Dong Wang, Huchuan Lu, and Ming-Hsuan Yang. Correlation tracking via joint discrimination and reliability learning. In CVPR, 2018.
  • Chong Sun, Dong Wang, Huchuan Lu, and Ming-Hsuan Yang. Learning spatial-aware regressions for visual tracking. In CVPR, 2018.
  • Ran Tao, Efstratios Gavves, and Arnold W. M. Smeulders. Siamese instance search for tracking. In CVPR, 2016.
  • Jack Valmadre, Luca Bertinetto, João F. Henriques, Ran Tao, Andrea Vedaldi, Arnold W. M. Smeulders, Philip H. S. Torr, and Efstratios Gavves. Long-term tracking in the wild: A benchmark. In ECCV, 2018.
  • Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, and Philip H. S. Torr. Fast online object tracking and segmentation: A unifying approach. In CVPR, 2019.
  • Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015.
  • Tianzhu Zhang, Si Liu, Changsheng Xu, Bin Liu, and Ming-Hsuan Yang. Correlation particle filter for visual tracking. IEEE Transactions on Image Processing, 27(6):2676–2687, 2018.
  • Tianzhu Zhang, Changsheng Xu, and Ming-Hsuan Yang. Learning multi-task correlation particle filters for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2):365–378, 2019.
  • Yunhua Zhang, Dong Wang, Lijun Wang, Jinqing Qi, and Huchuan Lu. Learning regression and verification networks for long-term visual tracking. CoRR abs/1809.04320, 2018.
  • Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. CoRR abs/1708.04896, 2017.
  • Gao Zhu, Fatih Porikli, and Hongdong Li. Beyond local search: Tracking objects everywhere with instance-specific proposals. In CVPR, 2016.
  • Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, and Weiming Hu. Distractor-aware siamese networks for visual object tracking. In ECCV, 2018.
  • C. Lawrence Zitnick and Piotr Dollár. Edge Boxes: Locating object proposals from edges. In ECCV, 2014.