
Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

CVPR, (2016)

Cited by 2017
Abstract

We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking groundtruths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of doma…

Introduction
  • Convolutional Neural Networks (CNNs) have recently been applied to various computer vision tasks such as image classification [4, 28, 37], semantic segmentation [19, 31, 33], object detection [12], and many others [34, 40, 41].
  • Training CNNs for tracking is even more difficult since the same kind of object can be considered a target in one sequence and a background object in another.
  • Due to such variations and inconsistencies across sequences, the authors believe that ordinary learning methods based on the standard classification task are not appropriate, and another approach that captures sequence-independent information should be incorporated for better representations.
Highlights
  • Convolutional Neural Networks (CNNs) have recently been applied to various computer vision tasks such as image classification [4, 28, 37], semantic segmentation [19, 31, 33], object detection [12], and many others [34, 40, 41]
  • To fully exploit the representation power of CNNs in visual tracking, it is desirable to train them on large-scale data specialized for visual tracking, which cover a wide range of variations in the combination of target and background
  • We propose a novel CNN architecture, referred to as Multi-Domain Network (MDNet), to learn the shared representation of targets from multiple annotated video sequences for tracking, where each video is regarded as a separate domain
  • We evaluated MDNet on two datasets, Object Tracking Benchmark (OTB) [45] and VOT2014 [26]
  • The one-pass evaluation (OPE) is employed to compare our algorithm with six state-of-the-art trackers including MUSTer [21], CNN-SVM [20], MEEM [47], TGPR [11], DSST [5] and kernelized correlation filters (KCF) [17], as well as the top 2 trackers included in the benchmark: SCM [49] and Struck [16]
  • We proposed a novel tracking algorithm based on a CNN trained in a multi-domain learning framework, referred to as MDNet
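The shared-layers-plus-branches idea behind MDNet can be sketched with a toy multi-domain model. This is an illustrative simplification (one linear shared layer with a tanh nonlinearity and logistic per-domain heads), not the paper's actual conv1–3/fc4–6 network; all names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MultiDomainNet:
    """Shared representation + one binary classification head per domain."""

    def __init__(self, in_dim, feat_dim, num_domains, lr=0.5):
        self.W_shared = rng.normal(0, 0.1, (in_dim, feat_dim))  # shared layers
        self.heads = [rng.normal(0, 0.1, feat_dim) for _ in range(num_domains)]
        self.lr = lr

    def forward(self, X, d):
        feat = np.tanh(X @ self.W_shared)        # shared representation
        return sigmoid(feat @ self.heads[d]), feat

    def train_step(self, X, y, d):
        """One SGD step on a minibatch from a single domain d: it updates
        the shared weights and only the d-th branch, as in multi-domain
        training where each iteration draws its minibatch from one video."""
        p, feat = self.forward(X, d)
        err = p - y                               # dL/dz for logistic loss
        grad_head = feat.T @ err / len(y)
        grad_feat = np.outer(err, self.heads[d]) * (1.0 - feat ** 2)
        grad_shared = X.T @ grad_feat / len(y)
        self.heads[d] -= self.lr * grad_head
        self.W_shared -= self.lr * grad_shared
```

In the paper's offline phase the domains (videos) are visited in turn, so the shared layers accumulate domain-independent information while each branch absorbs what is specific to its own sequence.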
Results
  • Evaluation on OTB

    OTB [45] is a popular tracking benchmark that contains 100 fully annotated videos with substantial variations.
  • The one-pass evaluation (OPE) is employed to compare the algorithm with six state-of-the-art trackers including MUSTer [21], CNN-SVM [20], MEEM [47], TGPR [11], DSST [5] and KCF [17], as well as the top 2 trackers included in the benchmark: SCM [49] and Struck [16].
  • In the OPE success plot for the in-plane rotation attribute (51 sequences), MDNet [0.675] outperforms CNN-SVM [0.546], MEEM [0.538], MUSTer [0.531], TGPR [0.468] and Struck [0.461].
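The bracketed numbers follow the OTB success-plot convention: for each overlap threshold, the success rate is the fraction of frames whose bounding-box IoU with the ground truth exceeds that threshold, and trackers are ranked by the area under this curve. A minimal sketch (the exact threshold grid is an assumption):

```python
import numpy as np

def success_plot(ious, thresholds=None):
    """OTB-style success plot: success rate at each overlap threshold,
    summarized by the area under the curve (mean rate over the grid)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)  # assumed threshold grid
    rates = np.array([(ious > t).mean() for t in thresholds])
    return thresholds, rates, rates.mean()
```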
Conclusion
  • The authors proposed a novel tracking algorithm based on a CNN trained in a multi-domain learning framework, referred to as MDNet.
  • The tracking algorithm learns domain-independent representations from pretraining, and captures domain-specific information through online learning during tracking.
  • The entire network is pretrained offline, and the fully connected layers including a single domain-specific layer are fine-tuned online.
  • The authors achieved outstanding performance on two large public tracking benchmarks, OTB and VOT2014, compared to the state-of-the-art tracking algorithms.

    [VOT2014 accuracy–robustness ranking plots for the attributes camera_motion, illum_change, motion_change, occlusion, size_change and empty; plot contents not recoverable from the extraction.]
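The offline/online split described above (pretrained convolutional layers frozen at tracking time, fully connected layers plus a fresh domain-specific head updated online) can be sketched as a parameter-group freeze. The group names and the simple dict representation are hypothetical, for illustration only:

```python
import numpy as np

# Parameter groups: "conv" stands for the pretrained shared convolutional
# layers, "fc" for the shared fully connected layers, and "head" for the
# single domain-specific layer created for the new test sequence.
ONLINE_TRAINABLE = {"fc", "head"}  # conv layers stay frozen during tracking

def online_update(params, grads, lr=0.001, trainable=ONLINE_TRAINABLE):
    """One SGD step that skips frozen parameter groups."""
    return {name: (w - lr * grads[name]) if name in trainable else w
            for name, w in params.items()}
```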
Tables
  • Table 1: The average scores and ranks of accuracy and robustness on the two experiments in VOT2014 [26]. The first and second best scores are highlighted in red and blue colors, respectively.
Related Work
  • 2.1. Visual Tracking Algorithms

    Visual tracking is one of the fundamental problems in computer vision and has been actively studied for decades. Most tracking algorithms fall into either generative or discriminative approaches. Generative methods describe the target appearance using generative models and search for the target regions that fit the models best. Various generative target appearance modeling algorithms have been proposed, including sparse representation [32, 48], density estimation [15, 22], and incremental subspace learning [35]. In contrast, discriminative methods aim to build a model that distinguishes the target object from the background. These tracking algorithms typically learn classifiers based on multiple instance learning [1], P-N learning [24], online boosting [13, 14, 38], structured output SVMs [16], etc.
Funding
  • This work was partly supported by the IITP grant (B010116-0307, Machine Learning Center; B0101-16-0552, DeepView) funded by the Korean government (MSIP)
Study Subjects and Analysis
negative samples: 200

Training data: For offline multi-domain learning, we collect 50 positive and 200 negative samples from every frame, where positive and negative examples have ≥ 0.7 and ≤ 0.5 IoU overlap ratios with ground-truth bounding boxes, respectively.
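The IoU-based sample collection described above can be sketched directly; candidate-box generation (the paper draws boxes around the target location) is omitted here, and the function names and the first-N capping are illustrative rather than the paper's actual sampling scheme.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x, y, w, h) format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / (aw * ah + bw * bh - inter)

def collect_samples(gt, candidates, n_pos=50, n_neg=200,
                    pos_thr=0.7, neg_thr=0.5):
    """Split candidate boxes into positives (IoU >= 0.7 with the ground
    truth) and negatives (IoU <= 0.5), capped at 50 / 200 per frame."""
    pos = [c for c in candidates if iou(gt, c) >= pos_thr][:n_pos]
    neg = [c for c in candidates if iou(gt, c) <= neg_thr][:n_neg]
    return pos, neg
```

Boxes falling strictly between the two thresholds are ambiguous and are discarded, which keeps the binary classification targets clean.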

References
  • B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1619–1632, 2011.
  • D. S. Bolme, J. R. Beveridge, B. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
  • Z. Cai, L. Wen, J. Yang, Z. Lei, and S. Z. Li. Structured visual tracking with dynamic graph. In ACCV, 2012.
  • K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
  • M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.
  • H. Daume III. Frustratingly easy domain adaptation. In ACL, 2007.
  • M. Dredze, A. Kulesza, and K. Crammer. Multi-domain learning by confidence-weighted parameter combination. Machine Learning, 79(1-2):123–149, 2010.
  • L. Duan, I. W. Tsang, D. Xu, and T.-S. Chua. Domain adaptation from multiple sources via auxiliary classifiers. In ICML, 2009.
  • J. Fan, W. Xu, Y. Wu, and Y. Gong. Human tracking using convolutional neural networks. IEEE Trans. Neural Networks, 21(10):1610–1623, 2010.
  • P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell., 32(9):1627–1645, 2010.
  • J. Gao, H. Ling, W. Hu, and J. Xing. Transfer learning based visual tracking with Gaussian processes regression. In ECCV, 2014.
  • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, 2006.
  • H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, 2008.
  • B. Han, D. Comaniciu, Y. Zhu, and L. Davis. Sequential kernel density approximation and its application to real-time visual tracking. IEEE Trans. Pattern Anal. Mach. Intell., 30(7):1186–1197, 2008.
  • S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
  • J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell., 37(3):583–596, 2015.
  • J. Hoffman, B. Kulis, T. Darrell, and K. Saenko. Discovering latent domains for multisource domain adaptation. In ECCV, 2012.
  • S. Hong, H. Noh, and B. Han. Decoupled deep neural network for semi-supervised semantic segmentation. In NIPS, 2015.
  • S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015.
  • Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.
  • A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi. Robust online appearance models for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell., 25(10):1296–1311, 2003.
  • M. Joshi, W. W. Cohen, M. Dredze, and C. P. Rose. Multi-domain learning: When do domains matter? In EMNLP-CoNLL, 2012.
  • Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell., 34(7):1409–1422, 2012.
  • M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, R. Pflugfelder, et al. The visual object tracking VOT2015 challenge results. In ICCVW, 2015.
  • M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, L. Cehovin, G. Nebehay, T. Vojir, G. Fernandez, et al. The visual object tracking VOT2014 challenge results. In ECCVW, 2014.
  • M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, F. Porikli, L. Cehovin, G. Nebehay, G. Fernandez, T. Vojir, et al. The visual object tracking VOT2013 challenge results. In ICCVW, 2013.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • H. Li, Y. Li, and F. Porikli. DeepTrack: Learning discriminative feature representations by convolutional neural networks for visual tracking. In BMVC, 2014.
  • Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In ECCVW, 2014.
  • J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • X. Mei and H. Ling. Robust visual tracking using L1 minimization. In ICCV, 2009.
  • H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.
  • H. Noh, P. H. Seo, and B. Han. Image question answering using convolutional neural network with dynamic parameter prediction. In CVPR, 2016.
  • D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. IJCV, 77(1-3):125–141, 2008.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, pages 1–42, 2015.
  • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • J. Son, I. Jung, K. Park, and B. Han. Tracking-by-segmentation with online gradient boosting decision tree. In ICCV, 2015.
  • K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell., 20(1):39–51, 1998.
  • Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.
  • A. Toshev and C. Szegedy. DeepPose: Human pose estimation via deep neural networks. In CVPR, 2014.
  • A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for MATLAB. In ACM MM, 2015.
  • O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
  • N. Wang, S. Li, A. Gupta, and D.-Y. Yeung. Transferring rich feature hierarchies for robust visual tracking. arXiv preprint arXiv:1501.04587, 2015.
  • Y. Wu, J. Lim, and M.-H. Yang. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1834–1848, 2015.
  • Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, 2013.
  • J. Zhang, S. Ma, and S. Sclaroff. MEEM: Robust tracking via multiple experts using entropy minimization. In ECCV, 2014.
  • T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. In CVPR, 2012.
  • W. Zhong, H. Lu, and M.-H. Yang. Robust object tracking via sparsity-based collaborative model. In CVPR, 2012.