
BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks

30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 521-530, 2017


Abstract

We propose an extremely simple but effective regularization technique for convolutional neural networks (CNNs), referred to as BranchOut, for online ensemble tracking. Our algorithm employs a CNN for target representation, which has common convolutional layers but multiple branches of fully connected layers. For better regularization...

Introduction
  • Visual tracking is a valuable source of low-level information for high-level video understanding, so it has been applied to many computer vision tasks such as action recognition [6, 35], event detection [24], object detection from video [21], and so on.
  • It is extremely difficult to learn representative but adaptive features for robust tracking, especially in online scenarios.
  • The authors propose a novel visual tracking algorithm focusing on target appearance modeling, where the appearance is learned by a convolutional neural network (CNN) with multiple branches as shown in Figure 1.
  • The target state is estimated by an ensemble of all branches while online model update is performed by the standard error backpropagation.
  • The authors allow the individual branches to have different numbers of fully connected layers to maintain multi-level target representations; a minimal architecture sketch follows this list.
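The multi-branch design described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (the paper builds on an MDNet-style, VGG-M-like network); the class name MultiBranchTracker, the layer sizes, the 107x107 input assumption, and the split between one- and two-layer branches are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiBranchTracker(nn.Module):
    """Shared convolutional layers followed by K fully connected branches.

    Each branch classifies a candidate patch as target vs. background; the
    tracker's output is the ensemble (average) of the branch scores.
    """

    def __init__(self, num_branches=10, feat_dim=512 * 3 * 3):
        super().__init__()
        # Shared convolutional feature extractor (sizes assume 107x107 inputs).
        self.shared_conv = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 512, kernel_size=3), nn.ReLU(inplace=True),
        )
        # Branches may have different depths (multi-level representations).
        self.branches = nn.ModuleList()
        for k in range(num_branches):
            if k < num_branches // 2:        # shallower branches: one fc layer
                branch = nn.Linear(feat_dim, 2)
            else:                            # deeper branches: two fc layers
                branch = nn.Sequential(
                    nn.Linear(feat_dim, 512), nn.ReLU(inplace=True),
                    nn.Linear(512, 2),
                )
            self.branches.append(branch)

    def forward(self, x):
        feat = self.shared_conv(x).flatten(1)
        scores = [branch(feat) for branch in self.branches]
        return torch.stack(scores, dim=0)            # shape (K, N, 2)

    def ensemble_score(self, x):
        # Target state estimation: average the target probabilities of all branches.
        return self.forward(x).softmax(dim=-1)[..., 1].mean(dim=0)
```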
Highlights
  • Visual tracking is a valuable source of low-level information for high-level video understanding, so it has been applied to many computer vision tasks such as action recognition [6, 35], event detection [24], object detection from video [21], and so on
  • We propose a novel visual tracking algorithm focusing on target appearance modeling, where the appearance is learned by a convolutional neural network (CNN) with multiple branches as shown in Figure 1
  • We propose a simple but effective regularization technique, BranchOut, which is well-suited for online ensemble tracking
  • Two evaluation metrics are employed in our experiment: bounding box overlap ratio and center location error in the one-pass evaluation (OPE) protocol (a sketch of these metrics follows this list)
  • All methods except MUSTer, DSST and Spatially Regularized Discriminative Correlation Filters (SRDCF) are based on features from convolutional neural networks
  • We presented a novel visual tracking algorithm based on stochastic ensemble learning with a multi-branch CNN
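For reference, the two OPE metrics above can be computed as in the following NumPy sketch; the (x, y, w, h) box convention and the function names are assumptions, not the benchmark toolkit's API.

```python
import numpy as np

def overlap_ratio(pred, gt):
    """Bounding box overlap ratio (IoU) between predicted and ground-truth
    boxes, both given as (x, y, w, h) arrays of shape (N, 4)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def center_location_error(pred, gt):
    """Euclidean distance in pixels between predicted and ground-truth box centers."""
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_c - gt_c, axis=1)
```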
Methods
  • The authors show the performance of the BranchOut technique in an ensemble tracking application on two standard public benchmarks, the Object Tracking Benchmark (OTB100) [39] and VOT2015 [22], and compare the algorithm with state-of-the-art trackers.
Results
  • Evaluation on OTB

    OTB100 [39] is a popular benchmark dataset, which contains 100 fully annotated videos with substantial variations and challenges.
  • The authors construct two subsets of OTB100 based on the average accuracy of the 10 compared algorithms; the subsets consist of the sequences whose average bounding box overlap ratios fall below two predefined thresholds, 0.7 and 0.5 (a minimal sketch of this split follows this list).
  • These two subsets include 69 and 21 sequences, respectively, and can be regarded as hard and very hard examples.
  • TCNN and BranchOut demonstrate outstanding scores and ranks, while the performance of C-COT on the VOT2015 dataset is surprisingly low; this is probably because the algorithm is overfitted to other datasets.
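The hard / very hard split mentioned above amounts to thresholding the per-sequence average overlap. The sketch below is a hypothetical illustration; the per-sequence averages are assumed to be precomputed.

```python
def split_by_difficulty(avg_overlap_per_seq, hard_thresh=0.7, very_hard_thresh=0.5):
    """Partition sequences into 'hard' and 'very hard' subsets.

    avg_overlap_per_seq: dict mapping sequence name -> mean bounding box
    overlap of the 10 compared trackers on that sequence.
    """
    hard = [s for s, o in avg_overlap_per_seq.items() if o < hard_thresh]
    very_hard = [s for s, o in avg_overlap_per_seq.items() if o < very_hard_thresh]
    return hard, very_hard   # 'very_hard' is a subset of 'hard'
```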
Conclusion
  • The authors claim that BranchOut provides diverse models and effective regularization. Suppose that the model at time t1, denoted by F_{t1}(x_i; θ_k), evolves to F_{t2}(x_i; θ_k) at time t2.
  • After |t2 − t1| deterministic model updates with the same training data, all branches with the same architecture are likely to converge to almost the same model, since they are updated with the same data for a substantial number of iterations.
  • The authors' ensemble tracking algorithm instead selects a random subset of branches for each model update to diversify the learned target appearance models (a minimal sketch of this update step follows this list).
  • This technique, referred to as BranchOut, is effective in regularizing ensemble classifiers and improves tracking accuracy.
  • The proposed algorithm showed outstanding performance on the standard tracking benchmarks.
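A minimal sketch of one BranchOut model update is shown below, reusing the hypothetical MultiBranchTracker from the earlier sketch. Selecting each branch by an independent Bernoulli trial with probability 0.5 follows the spirit of the paper, but the exact selection rule, the loss, and which parameters the optimizer covers (the paper fine-tunes only the fully connected branches online) are assumptions here.

```python
import torch
import torch.nn.functional as F

def branchout_update(model, optimizer, patches, labels, keep_prob=0.5):
    """One online update with BranchOut: only a randomly selected subset of
    branches receives gradients, which keeps the ensemble diverse."""
    num_branches = len(model.branches)

    # Independent Bernoulli trial per branch; re-sample if none was selected.
    keep = torch.bernoulli(torch.full((num_branches,), keep_prob)).bool()
    while not keep.any():
        keep = torch.bernoulli(torch.full((num_branches,), keep_prob)).bool()

    scores = model(patches)                                   # (K, N, 2)
    loss = sum(F.cross_entropy(scores[k], labels)
               for k in range(num_branches) if keep[k])

    optimizer.zero_grad()
    loss.backward()   # only the selected branches (plus any shared parameters
                      # included in the optimizer's parameter groups) are updated
    optimizer.step()
    return loss.item(), keep
```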
Tables
  • Table 1: Internal comparison results on the OTB100 dataset. Among the three ensemble learning options, our stochastic BranchOut technique outperforms the Naïve ensemble and Greedy BranchOut methods. In addition, the multi-level representation for BranchOut, denoted Multi-5-5, improves performance compared to the single-level representations Single-0-10 and Single-10-0. Bold fonts denote the best performance within each group of options.
  • Table 2: The accuracy of the MDNet ensemble with BranchOut. Our ensemble tracking algorithm outperforms the original MDNet and its naïve ensemble.
  • Table 3: Experimental results on the VOT2015 dataset. The first- and second-best algorithms are highlighted in red and blue, respectively. The algorithms are sorted in decreasing order of expected overlap ratio.
Related work
  • Visual tracking has a long history, and a tremendous number of papers have been published over the last few decades. Due to space limitations, we review only several of the currently active classes of methodology in this section.

    Tracking algorithms based on correlation filters are popular these days, a trend mainly attributed to their strong accuracy and efficiency. Bolme et al. [3] introduced a minimum output sum of squared error (MOSSE) filter for visual tracking. Kernelized correlation filters (KCF) using circulant matrices [15] are employed to handle multi-channel features in the Fourier domain. DSST [7] decouples the filters for translation and scaling to achieve accurate scale estimation, and MUSTer [17], motivated by a psychological memory model, utilizes short- and long-term memory stores for robust appearance modeling. Tracking algorithms relying on correlation filters often suffer from boundary effects. To alleviate this issue, [11] applies the Alternating Direction Method of Multipliers (ADMM), and Spatially Regularized Discriminative Correlation Filters (SRDCF) [9] introduces a spatial regularization term.
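As a concrete illustration of the correlation-filter formulation reviewed above, here is a minimal single-channel MOSSE-style filter in the Fourier domain (NumPy sketch; cosine windowing, feature channels, and the online update rule of the original tracker are omitted, and the regularization constant is an assumption).

```python
import numpy as np

def train_mosse_filter(patches, responses, lam=1e-2):
    """Closed-form MOSSE filter H* = sum_i G_i * conj(F_i) / (sum_i F_i * conj(F_i) + lam),
    where F_i and G_i are the FFTs of a training patch and its desired
    (e.g., Gaussian-shaped) response map."""
    num, den = 0.0, lam
    for patch, g in zip(patches, responses):
        F = np.fft.fft2(patch)
        G = np.fft.fft2(g)
        num = num + G * np.conj(F)
        den = den + F * np.conj(F)
    return num / den                     # filter in the Fourier domain

def locate_target(h_star, search_patch):
    """Correlate the learned filter with a new search patch; the peak of the
    response map gives the estimated translation."""
    response = np.real(np.fft.ifft2(np.fft.fft2(search_patch) * h_star))
    return np.unravel_index(np.argmax(response), response.shape)
```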
Funding
  • This work was partly supported by the ICT R&D program of MSIP/IITP [2014-0-00147, Machine Learning Center; 2014-0-00059, DeepView; 2016-0-00563, Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion].
Study subjects and analysis
When searching for the target in each frame, we draw N = 256 samples for observation. The search space is enlarged if the classification scores from the CNN stay below a predefined threshold for more than 10 frames in a row; a minimal sketch of this sampling step follows.
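Gaussian sampling around the previous target state, the concrete noise scales, and the enlargement factor in the sketch below are assumptions in the spirit of MDNet-style trackers, not values taken from the paper.

```python
import numpy as np

def draw_candidates(prev_box, n_samples=256, trans_sigma=0.3, scale_sigma=0.5,
                    enlarge=1.0):
    """Draw N candidate boxes (x, y, w, h) around the previous target state.

    Translation noise is proportional to the target size and multiplied by
    `enlarge` when the search space has been widened."""
    x, y, w, h = prev_box
    size = np.sqrt(w * h)
    dx = np.random.randn(n_samples) * trans_sigma * size * enlarge
    dy = np.random.randn(n_samples) * trans_sigma * size * enlarge
    ds = 1.05 ** (np.random.randn(n_samples) * scale_sigma)
    return np.stack([x + dx, y + dy, w * ds, h * ds], axis=1)

def update_search_space(best_score, streak, threshold=0.0, patience=10):
    """Count consecutive frames whose best classification score fell below
    `threshold`; widen the sampling radius once the streak exceeds `patience`."""
    streak = streak + 1 if best_score < threshold else 0
    enlarge = 2.0 if streak > patience else 1.0   # enlargement factor is illustrative
    return streak, enlarge
```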

References
  • [1] B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619–1632, 2011.
  • [2] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr. Fully-convolutional siamese networks for object tracking. arXiv:1606.09549, 2016.
  • [3] D. S. Bolme, J. R. Beveridge, B. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
  • [4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
  • [5] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
  • [6] W. Choi and S. Savarese. A unified framework for multi-target tracking and collective activity recognition. In ECCV, 2012.
  • [7] M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.
  • [8] M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Convolutional features for correlation filter based visual tracking. In ICCVW, 2015.
  • [9] M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In ICCV, 2015.
  • [10] M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In ECCV, 2016.
  • [11] H. K. Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. In CVPR, 2015.
  • [12] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • [13] H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, 2006.
  • [14] S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
  • [15] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
  • [16] S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015.
  • [17] Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.
  • [18] Y. Hua, K. Alahari, and C. Schmid. Online object tracking with proposal selection. In ICCV, 2015.
  • [19] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Weinberger. Deep networks with stochastic depth. In ECCV, 2016.
  • [20] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409–1422, 2012.
  • [21] K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In CVPR, 2016.
  • [22] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, R. Pflugfelder, et al. The visual object tracking VOT2015 challenge results. In ICCVW, pages 564–586, 2015.
  • [23] M. Kristan, J. Matas, A. Leonardis, T. Vojir, R. Pflugfelder, G. Fernandez, G. Nebehay, F. Porikli, and L. Cehovin. A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:2137–2155, 2016.
  • [24] S. Kwak, B. Han, and J. H. Han. Multi-agent event detection: Localization and role assignment. In CVPR, 2013.
  • [25] S. Lee, S. Purushwalkam, M. Cogswell, V. Ranjan, D. Crandall, and D. Batra. Stochastic multiple choice learning for training diverse deep ensembles. In NIPS, 2016.
  • [26] H. Li, Y. Li, and F. Porikli. Convolutional neural net bagging for online visual tracking. Computer Vision and Image Understanding, 153:120–129, 2016.
  • [27] Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In ECCVW, 2014.
  • [28] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang. Hierarchical convolutional features for visual tracking. In ICCV, 2015.
  • [29] H. Nam, M. Baek, and B. Han. Modeling and propagating CNNs in a tree structure for visual tracking. arXiv:1608.07242, 2016.
  • [30] H. Nam and B. Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
  • [31] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • [32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
  • [33] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. Efficient object localization using convolutional networks. In CVPR, 2015.
  • [34] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. In ICML, 2013.
  • [35] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
  • [36] L. Wang, W. Ouyang, X. Wang, and H. Lu. Visual tracking with fully convolutional networks. In ICCV, 2015.
  • [37] L. Wang, W. Ouyang, X. Wang, and H. Lu. STCT: Sequentially training convolutional networks for visual tracking. In CVPR, 2016.
  • [38] N. Wang and D.-Y. Yeung. Ensemble-based tracking: Aggregating crowdsourced structured time series data. In ICML, 2014.
  • [39] Y. Wu, J. Lim, and M.-H. Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015.
  • [40] G. Zhu, F. Porikli, and H. Li. Tracking randomly moving objects on edge box proposals. arXiv:1507.08085, 2015.