TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild.

ECCV (2018)


Abstract

Despite the numerous developments in object tracking, further improvement of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work…

Introduction
  • Object tracking is a common task in computer vision, with a long history spanning decades [50,30,44].
  • Current trackers perform well on established benchmarks such as OTB [48,49] and VOT [25,26,27,24,22,23].
  • However, most of these datasets are fairly small and do not fully represent the challenges faced when tracking objects in the wild.
Highlights
  • Object tracking is a common task in computer vision, with a long history spanning decades [50,30,44]
  • We present TrackingNet, a large-scale object tracking dataset designed to train deep trackers
  • We evaluate all trackers per attribute to get insights about challenges facing state-of-the-art tracking algorithms
  • We present TrackingNet, which is, to the best of our knowledge, the largest dataset for object tracking
  • We show how large-scale existing datasets for object detection can be leveraged for object tracking by a novel interpolation method
  • We show that pretraining deep models on TrackingNet can improve their performance on other datasets by increasing their metrics by up to 1.7%. (Section 5)
  • We benchmark more than 20 tracking algorithms on this novel dataset and shed light on what attributes are especially difficult for current trackers
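The detection-to-tracking interpolation named in the highlights can be sketched in a few lines. The paper's actual method is more involved than pure linear blending; the helper below (`interpolate_boxes`, a hypothetical name) shows only a purely linear variant, assuming `[x, y, w, h]` boxes annotated at sparse keyframes.

```python
import numpy as np

def interpolate_boxes(keyframes, boxes, frame_ids):
    """Linearly interpolate [x, y, w, h] boxes between sparse keyframe
    annotations to obtain dense per-frame labels (illustrative only)."""
    keyframes = np.asarray(keyframes, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    out = []
    for f in frame_ids:
        # clamp to the annotated range, then blend the two nearest keyframes
        if f <= keyframes[0]:
            out.append(boxes[0])
        elif f >= keyframes[-1]:
            out.append(boxes[-1])
        else:
            j = np.searchsorted(keyframes, f)
            t = (f - keyframes[j - 1]) / (keyframes[j] - keyframes[j - 1])
            out.append((1 - t) * boxes[j - 1] + t * boxes[j])
    return np.stack(out)
```

For example, a box annotated at frames 0 and 10 yields, at frame 5, the midpoint of the two annotations.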
Results
  • Each video in TrackingNet Test is annotated with 15 attributes described in Section 3.
  • The authors evaluate all trackers per attribute to get insights about challenges facing state-of-the-art tracking algorithms.
  • The authors show the most interesting results in Figure 8 and refer the reader to the supplementary material for the remaining attributes.
  • The authors find that videos with in-plane rotation, low resolution targets, and full occlusion are consistently the most difficult.
  • Trackers are least affected by illumination variation, partial occlusion, and object deformation.
  • [Figure 8: OPE success plots on TrackingNet Test – In-Plane Rotation (56)]
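The success plots referenced above follow the standard one-pass evaluation (OPE) protocol: the per-frame overlap (IoU) between predicted and ground-truth boxes is swept over overlap thresholds to form a success curve. A minimal sketch, assuming `[x, y, w, h]` boxes (helper names are illustrative, not from the paper):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_curve(ious, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose IoU exceeds each overlap threshold."""
    ious = np.asarray(ious)
    return np.array([(ious > t).mean() for t in thresholds])
```

The area under this curve is the usual scalar summary used to rank trackers in success plots.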
Conclusion
  • The authors present TrackingNet, which is, to the best of their knowledge, the largest dataset for object tracking.
  • The authors show how large-scale existing datasets for object detection can be leveraged for object tracking by a novel interpolation method.
  • The authors plan to sample the extra 500 videos from different classes within the same category.
  • This will allow for further evaluation in regards to generalization.
  • The authors plan to release the training set with the interpolated annotations.
  • The authors will publish the online evaluation server to allow researchers to rank their tracking algorithms instantly.
Summary
  • Objectives:

    The authors aim to extend the test set from 500 to 1000 videos.
Tables
  • Table1: Comparison of current datasets for object tracking
  • Table2: List and description of the 15 attributes that characterize videos in TrackingNet. Top: automatically estimated. Bottom: visually inspected
  • Table3: Tracking results on the 1sec-long OTB100 dataset using different averaging
  • Table4: Evaluated Trackers. Representation: PI - Pixel Intensity, HOG - Histogram of Oriented Gradients, CN - Color Names, CH - Color Histogram, GK - Gaussian Kernel, K - Keypoints, BP - Binary Pattern, SSVM - Structured Support Vector Machine. Search: PF - Particle Filter, RS - Random Sampling, DS - Dense Sampling
  • Table5: Fine-tuning results for SiameseFC on OTB100 and TrackingNet Test
Related work
  • In the following, we provide an overview of the various research on object tracking. The tasks in the field can be clustered into multi-object tracking [49,25] and single-object tracking [28,35]. The former focuses on tracking multiple instances of class-specific objects, relying on strong and fast object detection algorithms and association estimation between consecutive frames. The latter is the target of this work. It approaches the problem by tracking-by-detection, which consists of two main components: model representation, either generative [20,41] or discriminative [51,14], and object search, a trade-off between computational cost and dense sampling of the region of interest.
  • Correlation Filter Trackers. In recent years, correlation filter (CF) trackers [4,19,16,1] have emerged as the most common, fastest, and most accurate category of trackers. CF trackers learn a filter at the first frame, which represents the object of interest. This filter localizes the target in successive frames before being updated. The main reason behind the impressive performance of CF trackers lies in the approximate dense sampling achieved by circulantly shifting the target patch samples [19]. The remarkable runtime performance is achieved by efficiently solving the underlying ridge regression problem in the Fourier domain [4]. Since the inception of CF trackers with single-channel features [4,19], they have been extended with kernels [16], multi-channel features [9], and scale adaptation [32]. In addition, many works enhance the original formulation by adapting the regression target [3], adding context [12,37], spatially regularizing the learned filters, and learning continuous filters [10].
  • Deep Trackers. Besides the CF trackers that use deep features from object detection networks, a few works explore more complete deep learning approaches. A first approach consists of learning generic features on a large-scale object detection dataset and successively fine-tuning domain-specific layers to be target-specific in an online fashion. MDNet [38] shows the success of such a method by winning the VOT15 [24] challenge. A second approach consists of training a fully convolutional network and using a feature map selection method to choose between shallow and deep layers during tracking [47]. The goal is to find a good trade-off between general semantic and more specific discriminative features, as well as to remove noisy and irrelevant feature maps.
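The Fourier-domain ridge regression behind CF trackers can be illustrated with a minimal MOSSE-style single-channel sketch [4]: the filter is solved elementwise in frequency space against a Gaussian target response, and tracking reduces to locating the correlation peak. This is an illustration of the general technique, not any specific tracker's implementation:

```python
import numpy as np

def train_mosse(patch, sigma=2.0, eps=1e-3):
    """Learn a single-channel correlation filter in the Fourier domain
    (MOSSE-style ridge regression against a Gaussian target response)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # desired response: a Gaussian peak centred on the target
    g = np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))
    F, G = np.fft.fft2(patch), np.fft.fft2(g)
    # closed-form ridge-regression solution, elementwise in frequency space
    return (G * np.conj(F)) / (F * np.conj(F) + eps)

def track(H, patch):
    """Correlate the filter with a new patch; the response peak gives
    the target's location within the search window."""
    resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

The circulant-shift interpretation [19] is what makes this elementwise frequency-domain solve equivalent to dense sampling of translated patches.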
Funding
  • This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR)
References
  • Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.: Staple: Complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1401–1409 (2016)
  • Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.: Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision. pp. 850–865. Springer (2016)
  • Bibi, A., Mueller, M., Ghanem, B.: Target response adaptation for correlation filter tracking. In: European Conference on Computer Vision. pp. 419–433.
  • Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. pp. 2544–2550 (June 2010). https://doi.org/10.1109/CVPR.2010.5539960
  • Collins, R., Zhou, X., Teh, S.K.: An open source tracking testbed and evaluation web site. In: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2005) (January 2005)
  • Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: Efficient convolution operators for tracking. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. pp. 21–26 (2017)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 4310–4318 (2015)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: The IEEE International Conference on Computer Vision (ICCV) (Dec 2015)
  • Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference. BMVA Press (2014). https://doi.org/http://dx.doi.org/10.5244/C.28.65
  • Danelljan, M., Robinson, A., Shahbaz Khan, F., Felsberg, M.: Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: ECCV (2016)
  • Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: A benchmark for higher frame rate object tracking. arXiv preprint arXiv:1703.05884 (2017)
  • Galoogahi, H.K., Fagg, A., Lucey, S.: Learning background-aware correlation filters for visual tracking. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. pp. 21–26 (2017)
  • Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic siamese network for visual object tracking. In: The IEEE International Conference on Computer Vision (ICCV) (Oct 2017)
  • Hare, S., Saffari, A., Torr, P.H.S.: Struck: Structured output tracking with kernels. In: 2011 International Conference on Computer Vision. pp. 263–270. IEEE (Nov 2011). https://doi.org/10.1109/ICCV.2011.6126251
  • Held, D., Thrun, S., Savarese, S.: Learning to track at 100 fps with deep regression networks. In: European Conference on Computer Vision (ECCV) (2016)
  • Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. Pattern Analysis and Machine Intelligence, IEEE Transactions on (2015). https://doi.org/10.1109/TPAMI.2014.2345390
  • Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: European Conference on Computer Vision. pp. 702–715. Springer (2012)
  • Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(3), 583–596 (2015)
  • Henriques, J., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) Computer Vision ECCV 2012, Lecture Notes in Computer Science, vol. 7575, pp. 702–715. Springer Berlin Heidelberg (2012)
  • Jia, X., Lu, H., Yang, M.H.: Visual tracking via adaptive structural local sparse appearance model. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. pp. 1822–1829 (June 2012). https://doi.org/10.1109/CVPR.2012.6247880
  • Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-Learning-Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7), 1409–1422 (Dec 2011). https://doi.org/10.1109/TPAMI.2011.239
  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin, L., Vojir, T., Hager, G., Lukezic, A., Fernandez, G.: The visual object tracking VOT2016 challenge results. Springer (Oct 2016), http://www.springer.com/gp/book/9783319488806
  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin Zajc, L., Vojir, T., Hager, G., Lukezic, A., Eldesokey, A., Fernandez, G.: The visual object tracking VOT2017 challenge results (2017), http://openaccess.thecvf.com/content ICCV 2017 workshops/papers/w28/ Kristan The Visual Object ICCV 2017 paper.pdf
  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernandez, G., Vojir, T., Hager, G., Nebehay, G., Pflugfelder, R.: The visual object tracking VOT2015 challenge results. In: Visual Object Tracking Workshop 2015 at ICCV2015 (Dec 2015)
  • Kristan, M., Matas, J., Leonardis, A., Vojir, T., Pflugfelder, R., Fernandez, G., Nebehay, G., Porikli, F., Cehovin, L.: A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(11), 2137–2155 (Nov 2016). https://doi.org/10.1109/TPAMI.2016.2516982
  • Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Porikli, F., Cehovin, L., Nebehay, G., Fernandez, G., Vojir, T., Gatt, A., et al.: The visual object tracking VOT2013 challenge results. In: Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on. pp. 98–111. IEEE (2013)
  • Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Cehovin, L., Nebehay, G., Vojir, T., Fernandez, G., Lukezic, A.: The visual object tracking VOT2014 challenge results (2014), http://www.votchallenge.net/vot2014/program.html
  • Leal-Taixe, L., Milan, A., Reid, I., Roth, S., Schindler, K.: MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942 (2015)
  • Li, A., Lin, M., Wu, Y., Yang, M.H.: NUS-PRO: A new visual tracking challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 335–349 (Feb 2016). https://doi.org/10.1109/TPAMI.2015.2417577
  • Li, X., Hu, W., Shen, C., Zhang, Z., Dick, A., Hengel, A.V.D.: A survey of appearance models in visual object tracking. ACM Transactions on Intelligent Systems and Technology (TIST) 4(4), 58 (2013)
  • Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision. pp. 254–265. Springer (2014)
  • Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) Computer Vision - ECCV 2014 Workshops. pp. 254–265. Springer International Publishing, Cham (2015)
  • Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: Algorithms and benchmark. Image Processing, IEEE... pp. 1–14 (2015), http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=7277070
  • Lukezic, A., Vojir, T., Zajc, L.C., Matas, J., Kristan, M.: Discriminative correlation filter with channel and spatial reliability. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. vol. 2 (2017)
  • Milan, A., Leal-Taixe, L., Reid, I., Roth, S., Schindler, K.: MOT16: A benchmark for multi-object tracking. arXiv:1603.00831 [cs] (Mar 2016), http://arxiv.org/abs/1603.00831
  • Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Proc. of the European Conference on Computer Vision (ECCV) (2016)
  • Mueller, M., Smith, N., Ghanem, B.: Context-aware correlation filter tracking. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 1396–1404 (2017)
  • Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)
  • Ning, J., Yang, J., Jiang, S., Zhang, L., Yang, M.H.: Object tracking via dual linear structured SVM and explicit feature map. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4266–4274 (2016)
  • Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7464–7473. IEEE (2017)
  • Ross, D., Lim, J., Lin, R.S., Yang, M.H.: Incremental learning for robust visual tracking. International Journal of Computer Vision 77(1-3), 125–141 (2008). https://doi.org/10.1007/s11263-007-0075-7
  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3), 211–252 (2015)
  • Smeulders, A.W.M., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(7), 1442–1468 (July 2014). https://doi.org/10.1109/TPAMI.2013.230
  • Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(7), 1442–1468 (2014)
  • Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., Torr, P.H.: End-to-end representation learning for correlation filter based tracking. In: Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. pp. 5000–5008. IEEE (2017)
  • Vondrick, C., Patterson, D., Ramanan, D.: Efficiently scaling up crowdsourced video annotation. International Journal of Computer Vision 101(1), 184–204 (2013)
  • Wang, L., Ouyang, W., Wang, X., Lu, H.: Visual tracking with fully convolutional networks. In: 2015 IEEE International Conference on Computer Vision (ICCV). pp. 3119–3127 (Dec 2015). https://doi.org/10.1109/ICCV.2015.357
  • Wu, Y., Lim, J., Yang, M.H.: Online object tracking: A benchmark. In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. pp. 2411–2418. IEEE (2013)
  • Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9), 1834–1848 (2015)
  • Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM Computing Surveys (CSUR) 38(4), 13 (2006)
  • Zhang, J., Ma, S., Sclaroff, S.: MEEM: Robust tracking via multiple experts using entropy minimization. In: Proc. of the European Conference on Computer Vision (ECCV) (2014)
Author
Adel Bibi
Salman Al-Subaihi