BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), no. 1 (2017): 521-530
Abstract
We propose an extremely simple but effective regularization technique for convolutional neural networks (CNNs), referred to as BranchOut, for online ensemble tracking. Our algorithm employs a CNN for target representation that shares common convolutional layers but has multiple branches of fully connected layers. For better regularization…
Introduction
- Visual tracking is a valuable source of low-level information for high-level video understanding, so it has been applied to many computer vision tasks such as action recognition [6, 35], event detection [24], object detection from video [21], and so on.
- It is extremely difficult to learn representative but adaptive features for robust tracking, especially in online scenarios.
- The authors propose a novel visual tracking algorithm focusing on target appearance modeling, where the appearance is learned by a convolutional neural network (CNN) with multiple branches as shown in Figure 1.
- The target state is estimated by an ensemble of all branches while online model update is performed by the standard error backpropagation.
- The authors allow the individual branches to have different numbers of fully connected layers and maintain multilevel target representations
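The shared-backbone, multi-branch design described above can be sketched in a few lines. This is an illustrative stand-in, not the paper's actual network: flattening replaces the shared convolutional layers, and a linear scorer replaces each branch's fully connected layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(patch):
    # Stand-in for the shared convolutional layers: maps an image
    # patch to a fixed-length feature vector by flattening it.
    return np.asarray(patch).reshape(-1)

class Branch:
    """One head of the ensemble; a linear scorer w . f + b stands in
    for the branch's fully connected layers."""
    def __init__(self, dim):
        self.w = rng.standard_normal(dim) * 0.01
        self.b = 0.0

    def score(self, feat):
        return float(self.w @ feat + self.b)

def ensemble_score(branches, patch):
    # The target score is the average over all branches.
    feat = shared_features(patch)
    return sum(br.score(feat) for br in branches) / len(branches)

def estimate_target(branches, candidates):
    # The estimated target state is the candidate patch with the
    # highest ensemble score.
    scores = [ensemble_score(branches, c) for c in candidates]
    return int(np.argmax(scores)), scores
```

Because all branches share one feature extractor, the per-candidate cost grows only with the small branch heads, not with the backbone.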
Highlights
- Visual tracking is a valuable source of low-level information for high-level video understanding, so it has been applied to many computer vision tasks such as action recognition [6, 35], event detection [24], object detection from video [21], and so on
- We propose a novel visual tracking algorithm focusing on target appearance modeling, where the appearance is learned by a convolutional neural network (CNN) with multiple branches as shown in Figure 1
- We propose a simple but effective regularization technique, BranchOut, which is well-suited for online ensemble tracking
- Two evaluation metrics are employed in our experiment: bounding box overlap ratio and center location error in the one-pass evaluation (OPE) protocol
- All methods except MUSTer, DSST and Spatially Regularized Discriminative Correlation Filters (SRDCF) are based on the features from convolutional neural networks
- We presented a novel visual tracking algorithm based on stochastic ensemble learning based on a CNN with multiple branches
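The two evaluation metrics mentioned above are computed per frame against the ground-truth box. A minimal sketch of the standard formulations for axis-aligned (x, y, w, h) boxes (generic definitions, not code from the paper):

```python
import math

def overlap_ratio(box_a, box_b):
    """Bounding box overlap ratio (intersection over union)
    of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(box_a, box_b):
    """Center location error: Euclidean distance between
    box centers, in pixels."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))
```

In the OPE protocol these per-frame values are thresholded and aggregated into success and precision plots.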
Methods
- The authors demonstrate the performance of the BranchOut technique in an online ensemble tracking application on two standard public benchmarks, Object Tracking Benchmark (OTB100) [39] and VOT2015 [22], and compare the algorithm with state-of-the-art trackers.
Results
- Evaluation on OTB
- OTB100 [39] is a popular benchmark dataset that contains 100 fully annotated videos with substantial variations and challenges.
- The authors construct two subsets of OTB100 based on the average accuracy of the 10 compared algorithms; the subsets consist of the sequences whose average bounding box overlap ratios fall below two predefined thresholds, 0.7 and 0.5.
- These two subsets include 69 and 21 sequences, respectively, and can be regarded as hard and very hard examples.
- TCNN and BranchOut demonstrate outstanding scores and ranks, while the performance of C-COT on the VOT2015 dataset is surprisingly low; this is probably because the algorithm overfits to other datasets
Conclusion
- The authors claim that BranchOut provides diverse models and effective regularization. Suppose that the model at time t1, denoted by F_{t1}(x_i; θ_k), evolves to F_{t2}(x_i; θ_k) at time t2.
- After |t2 − t1| deterministic model updates with the same training data, all branches with the same architecture are likely to converge to almost the same model, since they are updated with the same data for a substantial number of iterations.
- The authors' ensemble tracking algorithm selects a random subset of branches for model update to diversify learned target appearance models.
- This technique, referred to as BranchOut, effectively regularizes ensemble classifiers and improves tracking accuracy.
- The proposed algorithm showed outstanding performance on standard tracking benchmarks
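The random branch-subset selection described above can be sketched as follows. The selection probability of 0.5 and the fallback to a single random branch are illustrative assumptions; the demonstration branches are plain counters rather than CNN heads.

```python
import random

def select_branches(num_branches, p=0.5, rng=random):
    """Select the branches to update this round: each branch is chosen
    independently with probability p; if the draw comes back empty,
    fall back to one random branch so every round updates something."""
    chosen = [k for k in range(num_branches) if rng.random() < p]
    return chosen or [rng.randrange(num_branches)]

# Demonstration: over many update rounds every branch gets trained,
# but each round only a random subset sees the new data, which keeps
# the ensemble members decorrelated.
counts = [0] * 5
demo_rng = random.Random(0)
for _ in range(1000):
    for k in select_branches(5, 0.5, demo_rng):
        counts[k] += 1
```

Updating only the chosen subset is what prevents identically structured branches from collapsing onto the same model over long sequences.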
Tables
- Table1: Internal comparison results on the OTB100 dataset. Among the three ensemble learning options, the stochastic BranchOut technique outperforms the naïve ensemble and greedy BranchOut methods. Multi-level representations for BranchOut, denoted by Multi-5-5, also improve performance compared to single-level representations such as Single-0-10 and Single-10-0. Boldface denotes the best performance within each group of options
- Table2: The accuracy of the MDNet ensemble with BranchOut. The ensemble tracking algorithm outperforms the original MDNet and its naïve ensemble results
- Table3: Experimental results on the VOT2015 dataset. The first and second best algorithms are highlighted in red and blue, respectively. The algorithms are sorted in decreasing order of expected overlap ratio
Related work
- Visual tracking has a long history, and a tremendous number of papers have been published over the last few decades. Due to space limitations, this section reviews only a few active classes of methodology.
Tracking algorithms based on correlation filters are popular these days, a trend mainly attributed to their strong accuracy and efficiency. Bolme et al. [3] introduced a minimum output sum of squared error (MOSSE) filter for visual tracking. Kernelized correlation filters (KCF) using circulant matrices [15] are employed to handle multi-channel features in the Fourier domain. DSST [7] decouples the filters for translation and scaling to achieve accurate scale estimation, and MUSTer [17], motivated by a psychological memory model, utilizes short- and long-term memory stores for robust appearance modeling. Tracking algorithms relying on correlation filters often suffer from boundary effects. To alleviate this issue, [11] proposes an Alternating Direction Method of Multipliers (ADMM) technique, and Spatially Regularized Discriminative Correlation Filters (SRDCF) [9] introduce a spatial regularization term.
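The MOSSE filter of Bolme et al. [3] has a closed-form solution in the Fourier domain. A minimal single-channel sketch of that formulation (a real tracker would add cosine windowing, log preprocessing, and an online running-average update):

```python
import numpy as np

def mosse_filter(patches, responses, eps=1e-5):
    """Closed-form MOSSE solution in the Fourier domain:
    H* = sum_i G_i . conj(F_i) / (sum_i F_i . conj(F_i) + eps),
    where F_i, G_i are the 2-D FFTs of the training patches and their
    desired (typically Gaussian-shaped) correlation outputs."""
    num = np.zeros(np.shape(patches[0]), dtype=complex)
    den = np.zeros(np.shape(patches[0]), dtype=complex)
    for f, g in zip(patches, responses):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)

def correlate(h_conj, patch):
    # Correlation in the spatial domain is an elementwise product
    # in the Fourier domain; the response peak locates the target.
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))
```

The division is elementwise, which is what makes the filter cheap to learn and apply; the regularizer `eps` avoids division by near-zero spectral energy.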
Funding
- This work was partly supported by the ICT R&D program of MSIP/IITP [2014-0-00147, Machine Learning Center; 2014-0-00059, DeepView; 2016-0-00563, Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion]
Study subjects and analysis
samples: 256
- The tracking loop runs until the end of the sequence. When searching for the target in each frame, the tracker draws N = 256 candidate samples. The search space is enlarged substantially if the classification scores from the CNN stay below a predefined threshold for more than 10 consecutive frames.
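The sampling step above can be sketched as follows. The Gaussian spread and the 3x expansion factor are illustrative assumptions, not values from the paper; `score_fn` stands in for the CNN classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_candidates(prev_box, n=256, scale=1.0, rng=rng):
    """Draw n candidate boxes around the previous target state.
    Candidate centers are Gaussian around the previous center;
    `scale` widens the search radius when the target seems lost."""
    x, y, w, h = prev_box
    std = scale * 0.3 * np.array([w, h])  # assumed spread
    centers = rng.normal([x, y], std, size=(n, 2))
    return [(cx, cy, w, h) for cx, cy in centers]

def track_step(prev_box, score_fn, miss_streak, thresh=0.0):
    # Widen the search space once classification scores have stayed
    # below the threshold for more than 10 consecutive frames.
    scale = 3.0 if miss_streak > 10 else 1.0  # expansion is illustrative
    candidates = draw_candidates(prev_box, n=256, scale=scale)
    scores = [score_fn(c) for c in candidates]
    best = int(np.argmax(scores))
    miss_streak = 0 if scores[best] >= thresh else miss_streak + 1
    return candidates[best], miss_streak
```

Tracking the length of the low-score streak, rather than reacting to a single bad frame, keeps the search radius stable under brief occlusions.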
Reference
- B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619–1632, 2012.
- L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr. Fully-convolutional siamese networks for object tracking. arXiv:1606.09549, 2016.
- D. S. Bolme, J. R. Beveridge, B. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010.
- L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
- K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
- W. Choi and S. Savarese. A unified framework for multi-target tracking and collective activity recognition. In ECCV, 2012.
- M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.
- M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Convolutional features for correlation filter based visual tracking. In ICCVW, 2015.
- M. Danelljan, G. Hager, F. Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In ICCV, 2015.
- M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In ECCV, 2016.
- H. K. Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. In CVPR, 2015.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. In BMVC, 2006.
- S. Hare, A. Saffari, and P. H. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
- J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
- S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015.
- Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.
- Y. Hua, K. Alahari, and C. Schmid. Online object tracking with proposal selection. In ICCV, 2015.
- G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Weinberger. Deep networks with stochastic depth. In ECCV, 2016.
- Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409–1422, 2012.
- K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In CVPR, 2016.
- M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernandez, T. Vojir, G. Hager, G. Nebehay, R. Pflugfelder, et al. The visual object tracking VOT2015 challenge results. In ICCVW, pages 564–586, 2015.
- M. Kristan, J. Matas, A. Leonardis, T. Vojir, R. Pflugfelder, G. Fernandez, G. Nebehay, F. Porikli, and L. Cehovin. A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:2137–2155, 2016.
- S. Kwak, B. Han, and J. H. Han. Multi-agent event detection: Localization and role assignment. In CVPR, 2013.
- S. Lee, S. Purushwalkam, M. Cogswell, V. Ranjan, D. Crandall, and D. Batra. Stochastic multiple choice learning for training diverse deep ensembles. In NIPS, 2016.
- H. Li, Y. Li, and F. Porikli. Convolutional neural net bagging for online visual tracking. Computer Vision and Image Understanding, 153:120–129, 2016.
- Y. Li and J. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In ECCVW, 2014.
- C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang. Hierarchical convolutional features for visual tracking. In ICCV, 2015.
- H. Nam, M. Baek, and B. Han. Modeling and propagating CNNs in a tree structure for visual tracking. arXiv:1608.07242, 2016.
- H. Nam and B. Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
- J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. Efficient object localization using convolutional networks. In CVPR, 2015.
- L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. In ICML, 2013.
- H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
- L. Wang, W. Ouyang, X. Wang, and H. Lu. Visual tracking with fully convolutional networks. In ICCV, 2015.
- L. Wang, W. Ouyang, X. Wang, and H. Lu. STCT: Sequentially training convolutional networks for visual tracking. In CVPR, 2016.
- N. Wang and D.-Y. Yeung. Ensemble-based tracking: Aggregating crowdsourced structured time series data. In ICML, 2014.
- Y. Wu, J. Lim, and M. Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015.
- G. Zhu, F. Porikli, and H. Li. Tracking randomly moving objects on edge box proposals. arXiv:1507.08085, 2015.