Learning Multi-Domain Convolutional Neural Networks for Visual Tracking
We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers.
- Convolutional Neural Networks (CNNs) have recently been applied to various computer vision tasks such as image classification [4, 28, 37], semantic segmentation [19, 31, 33], object detection, and many others [34, 40, 41].
- Training CNNs for tracking is even more difficult, since the same kind of object can be considered a target in one sequence and a background object in another.
- Due to such variations and inconsistencies across sequences, the authors believe that ordinary learning methods based on the standard classification task are not appropriate, and that another approach should be incorporated to capture sequence-independent information for better representation.
- To fully exploit the representation power of CNNs in visual tracking, it is desirable to train them on large-scale data specialized for visual tracking, which cover a wide range of variations in the combination of target and background
- We propose a novel CNN architecture, referred to as Multi-Domain Network (MDNet), to learn the shared representation of targets from multiple annotated video sequences for tracking, where each video is regarded as a separate domain
- We evaluated MDNet on two datasets: Object Tracking Benchmark (OTB) and VOT2014.
- We proposed a novel tracking algorithm based on a CNN trained in a multi-domain learning framework, referred to as MDNet.
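The multi-domain idea above can be sketched abstractly. The snippet below is a hypothetical toy model, not the paper's network (MDNet uses convolutional layers and binary softmax branches): a linear "shared" layer stands in for the shared representation, each video (domain) owns its own branch, and a training step updates the shared weights plus only the branch of the domain the sample came from, here with a hinge-style surrogate loss.

```python
class MDNetSketch:
    """Toy multi-domain learner: shared weights are updated on every step,
    but each domain's branch is updated only on that domain's samples."""

    def __init__(self, num_domains, feat_dim=2, lr=0.1):
        self.shared_w = [1.0] * feat_dim                                  # shared layers (stand-in)
        self.branches = [[0.0] * feat_dim for _ in range(num_domains)]    # one branch per domain
        self.lr = lr

    def score(self, domain, x):
        # shared representation followed by the domain-specific branch
        h = [wi * xi for wi, xi in zip(self.shared_w, x)]
        return sum(hi * bi for hi, bi in zip(h, self.branches[domain]))

    def train_step(self, domain, x, label):
        """One SGD step on a sample from `domain`; label is +1 (target) or -1 (background)."""
        if label * self.score(domain, x) < 1.0:          # hinge-style margin condition
            h = [wi * xi for wi, xi in zip(self.shared_w, x)]
            grad_branch = [label * hi for hi in h]
            grad_shared = [label * bi * xi
                           for bi, xi in zip(self.branches[domain], x)]
            for i in range(len(x)):
                self.branches[domain][i] += self.lr * grad_branch[i]     # domain-specific update
                self.shared_w[i] += self.lr * grad_shared[i]             # shared update
```

Note that branches of other domains are never touched by a training step, which is the property the multi-domain framework relies on to keep the shared layers domain-independent.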
- Evaluation on OTB
OTB is a popular tracking benchmark that contains 100 fully annotated videos with substantial variations.
- The one-pass evaluation (OPE) is employed to compare our algorithm with six state-of-the-art trackers including MUSTer, CNN-SVM, MEEM, TGPR, DSST and KCF, as well as the top 2 trackers included in the benchmark, SCM and Struck.
- Success plot of OPE for the in-plane rotation attribute (51 sequences): MDNet [0.675], CNN-SVM [0.546], MEEM [0.538], MUSTer [0.531], TGPR [0.468], Struck [0.461].
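The success plot above can, in principle, be reproduced from per-frame IoU scores. A minimal sketch of the OTB success metric follows (function names are mine, not from any benchmark toolkit): the success rate at threshold t is the fraction of frames whose overlap with the ground truth exceeds t, and the bracketed number next to each tracker (e.g. MDNet [0.675]) is the area under this curve, computed as the average success rate over evenly spaced thresholds in [0, 1].

```python
def success_rates(overlaps, thresholds):
    """Fraction of frames whose IoU with the ground truth exceeds each threshold."""
    n = len(overlaps)
    return [sum(o > t for o in overlaps) / n for t in thresholds]

def success_auc(overlaps, num_thresholds=21):
    """Area under the success plot, approximated as the mean success rate
    over evenly spaced overlap thresholds in [0, 1]."""
    ts = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = success_rates(overlaps, ts)
    return sum(rates) / len(rates)
```

A tracker that consistently achieves higher per-frame overlaps will receive a strictly higher AUC, which is why the single bracketed score is used to rank trackers in the legend.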
Figure: VOT2014 per-attribute ranking plots (accuracy rank vs. robustness rank) for the labels camera_motion, illum_change, motion_change, occlusion, size_change, and empty.
- The authors proposed a novel tracking algorithm based on a CNN trained in a multi-domain learning framework, referred to as MDNet.
- The tracking algorithm learns domain-independent representations from pretraining, and captures domain-specific information through online learning during tracking.
- The entire network is pretrained offline, and the fully connected layers including a single domain-specific layer are fine-tuned online.
- The authors achieved outstanding performance on two large public tracking benchmarks, OTB and VOT2014, compared to state-of-the-art tracking algorithms.
- Table 1: The average scores and ranks of accuracy and robustness on the two experiments in VOT2014 [26]. The first and second best scores are highlighted in red and blue, respectively.
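The offline/online split described in the bullets above can be illustrated with a toy update rule. This is a hypothetical sketch, not the paper's implementation (the dict keys, linear layers, and hinge surrogate are my assumptions; MDNet fine-tunes the fully connected layers of a CNN with a softmax loss): the "conv" weights are frozen during tracking, while the "fc" weights and a single freshly initialized domain-specific branch are adapted on tracking samples.

```python
def init_online_model(feat_dim=2):
    """At test time the pretrained domain branches are discarded and one fresh
    branch is created for the new sequence (illustrative stand-in weights)."""
    return {
        "conv": [1.0] * feat_dim,    # shared conv-like weights: frozen online
        "fc": [1.0] * feat_dim,      # fc-like weights: fine-tuned online
        "branch": [0.0] * feat_dim,  # new single domain-specific layer
    }

def online_finetune(model, samples, lr=0.05):
    """One pass of online updates; labels are +1 (target) or -1 (background).
    Only 'fc' and 'branch' receive updates; 'conv' is never touched."""
    for x, label in samples:
        feat = [c * xi for c, xi in zip(model["conv"], x)]     # frozen features
        h = [f * w for f, w in zip(feat, model["fc"])]
        score = sum(hi * bi for hi, bi in zip(h, model["branch"]))
        if label * score < 1.0:                                # hinge-style surrogate
            for i in range(len(x)):
                model["branch"][i] += lr * label * h[i]
                model["fc"][i] += lr * label * model["branch"][i] * feat[i]
    return model
```

The design point being illustrated: keeping the shared layers fixed preserves the generic representation learned offline, while the small set of online-updated parameters absorbs sequence-specific appearance changes.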
- 2.1. Visual Tracking Algorithms
Visual tracking is one of the fundamental problems in computer vision and has been actively studied for decades. Most tracking algorithms fall into either generative or discriminative approaches. Generative methods describe the target appearance using generative models and search for the target region that fits the model best; proposed approaches include sparse representation [32, 48], density estimation [15, 22], and incremental subspace learning. In contrast, discriminative methods aim to build a model that distinguishes the target object from the background. These tracking algorithms typically learn classifiers based on multiple instance learning, P-N learning, online boosting [13, 14, 38], structured output SVMs, etc.
- This work was partly supported by the IITP grant (B0101-16-0307, Machine Learning Center; B0101-16-0552, DeepView) funded by the Korean government (MSIP).
The scale of each sample is 1.05^{s_i} relative to the initial target scale. Training data: For offline multi-domain learning, we collect 50 positive and 200 negative samples from every frame, where positive and negative examples have IoU overlap ratios of ≥ 0.7 and ≤ 0.5 with the ground-truth bounding boxes, respectively.
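The sample-labeling rule above is easy to make concrete. A small sketch (helper names are mine): compute the IoU between each candidate box and the ground truth, then keep candidates with IoU ≥ 0.7 as positives and IoU ≤ 0.5 as negatives, discarding the ambiguous band in between.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_samples(samples, gt, pos_thr=0.7, neg_thr=0.5):
    """Split candidate boxes into positives (IoU >= pos_thr) and negatives
    (IoU <= neg_thr); boxes in the (neg_thr, pos_thr) band are discarded."""
    pos = [s for s in samples if iou(s, gt) >= pos_thr]
    neg = [s for s in samples if iou(s, gt) <= neg_thr]
    return pos, neg
```

Discarding the in-between band avoids training the binary classifier on ambiguous boxes that are neither clearly target nor clearly background.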
- B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1619–1632, 2011.
- D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
- Z. Cai, L. Wen, J. Yang, Z. Lei, and S. Z. Li. Structured visual tracking with dynamic graph. In ACCV, 2012.
- K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
- M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.
- H. Daume III. Frustratingly easy domain adaptation. In ACL, 2007.
- M. Dredze, A. Kulesza, and K. Crammer. Multi-domain learning by confidence-weighted parameter combination. Machine Learning, 79(1-2):123–149, 2010.
- L. Duan, I. W. Tsang, D. Xu, and T.-S. Chua. Domain adaptation from multiple sources via auxiliary classifiers. In ICML, 2009.
- J. Fan, W. Xu, Y. Wu, and Y. Gong. Human tracking using convolutional neural networks. IEEE Trans. Neural Networks, 21(10):1610–1623, 2010.
- P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell., 32(9):1627–1645, 2010.
- J. Gao, H. Ling, W. Hu, and J. Xing. Transfer learning based visual tracking with Gaussian processes regression. In ECCV, 2014.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, 2006.
- H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, 2008.
- B. Han, D. Comaniciu, Y. Zhu, and L. Davis. Sequential kernel density approximation and its application to real-time visual tracking. IEEE Trans. Pattern Anal. Mach. Intell., 30(7):1186–1197, 2008.
- S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
- J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell., 37(3):583–596, 2015.
- J. Hoffman, B. Kulis, T. Darrell, and K. Saenko. Discovering latent domains for multisource domain adaptation. In ECCV, 2012.
- S. Hong, H. Noh, and B. Han. Decoupled deep neural network for semi-supervised semantic segmentation. In NIPS, 2015.
- S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015.
- Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.
- A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi. Robust online appearance models for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell., 25(10):1296–1311, 2003.
- M. Joshi, W. W. Cohen, M. Dredze, and C. P. Rose. Multi-domain learning: When do domains matter? In EMNLP-CoNLL, 2012.
- Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell., 34(7):1409–1422, 2012.
- M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, R. Pflugfelder, et al. The visual object tracking VOT2015 challenge results. In ICCVW, 2015.
- M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, L. Cehovin, G. Nebehay, T. Vojir, G. Fernandez, et al. The visual object tracking VOT2014 challenge results. In ECCVW, 2014.
- M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, F. Porikli, L. Cehovin, G. Nebehay, G. Fernandez, T. Vojir, et al. The visual object tracking VOT2013 challenge results. In ICCVW, 2013.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- H. Li, Y. Li, and F. Porikli. DeepTrack: Learning discriminative feature representations by convolutional neural networks for visual tracking. In BMVC, 2014.
- Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In ECCVW, 2014.
- J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
- X. Mei and H. Ling. Robust visual tracking using l1 minimization. In ICCV, 2009.
- H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.
- H. Noh, P. H. Seo, and B. Han. Image question answering using convolutional neural network with dynamic parameter prediction. In CVPR, 2016.
- D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. IJCV, 77(1-3):125–141, 2008.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- J. Son, I. Jung, K. Park, and B. Han. Tracking-by-segmentation with online gradient boosting decision tree. In ICCV, 2015.
- K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell., 20(1):39–51, 1998.
- Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.
- A. Toshev and C. Szegedy. DeepPose: Human pose estimation via deep neural networks. In CVPR, 2014.
- A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for MATLAB. In ACM MM, 2015.
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
- N. Wang, S. Li, A. Gupta, and D.-Y. Yeung. Transferring rich feature hierarchies for robust visual tracking. arXiv preprint arXiv:1501.04587, 2015.
- Y. Wu, J. Lim, and M.-H. Yang. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1834–1848, 2015.
- Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, 2013.
- J. Zhang, S. Ma, and S. Sclaroff. MEEM: Robust tracking via multiple experts using entropy minimization. In ECCV, 2014.
- T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. In CVPR, 2012.
- W. Zhong, H. Lu, and M.-H. Yang. Robust object tracking via sparsity-based collaborative model. In CVPR, 2012.