In brief
We propose a correlative multi-labeling approach that exploits how concept correlations help infer video semantic concepts.

Correlative multi-label video annotation

ACM Multimedia, pp. 17-26, 2007

Cited by 544 | Views 124
Indexed in WOS / SCOPUS / EI
Abstract

Automatically annotating concepts for video is a key to semantic-level video browsing, search and navigation. The research on this topic evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, as it did not model the inherent correla...

Introduction
  • Annotating video at the semantic concept level has emerged as an important topic in the multimedia research community [11][16].
  • The annotation problem of interest to this paper, as well as to other research efforts [16][12], is a multi-labeling process in which a video clip can be annotated with multiple labels (a minimal sketch of the first-paradigm baseline follows this list).
  • In contrast, a multi-class annotation process assigns only one concept to each video clip.
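For concreteness, here is a minimal sketch of the first-paradigm baseline referred to in the experiments as IndSVM: one independent binary detector per concept. A clip can still receive multiple labels, so the output is multi-label, but inter-concept correlations are ignored. The feature matrix X, the binary label matrix Y, and the function names below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative first-paradigm baseline (IndSVM-style): one independent binary
# detector per concept. Assumes X is an (n_clips x n_dims) feature matrix and
# Y an (n_clips x n_concepts) 0/1 label matrix; names are hypothetical.
import numpy as np
from sklearn.svm import LinearSVC

def train_independent_detectors(X, Y, C=1.0):
    """Train one binary SVM per concept, ignoring inter-concept correlations."""
    detectors = []
    for k in range(Y.shape[1]):
        clf = LinearSVC(C=C)
        clf.fit(X, Y[:, k])
        detectors.append(clf)
    return detectors

def annotate(detectors, X):
    """Binary multi-label predictions: a clip may be positive for several concepts."""
    scores = np.stack([clf.decision_function(X) for clf in detectors], axis=1)
    return (scores > 0).astype(int)
```

Each detector here sees only its own concept's labels, which is exactly the limitation that a correlative approach addresses.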
Highlights
  • Annotating video at the semantic concept level has emerged as an important topic in the multimedia research community [11][16]
  • The annotation problem of interest to this paper, as well as to other research efforts [16][12], is a multi-labeling process where a video clip can be annotated with multiple labels
  • The kernel function in Eqn (24) is used, and this approach is denoted by CML(II)
  • We proposed a correlative multi-labeling (CML) approach that exploits concept correlations to help infer video semantic concepts
  • Experiments on the widely used benchmark TRECVID data set demonstrated that CML is superior to state-of-the-art approaches in the first and second paradigms, in both overall performance and consistency of performance across diverse concepts (a sketch contrasting the two-step structure of the second paradigm follows this list)
  • We will apply the proposed algorithm to other applications, such as image annotation and text categorization, in which there exist large numbers of correlated concepts
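The second paradigm mentioned above (e.g. the CBCF baseline) refines the independent detections in a separate fusion step. The sketch below only illustrates that two-step structure under assumed inputs: a second-stage classifier per concept takes the first-stage scores of all concepts as context. The actual CBCF method differs in detail; logistic regression and the function names are assumptions for illustration.

```python
# Illustrative second-paradigm (context-based fusion) structure: refine each
# concept using the first-stage scores of all concepts. Correlations are only
# exploited as a post-processing step, not within a single unified model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fusion_stage(first_stage_scores, Y):
    """One refinement model per concept, fed with every concept's detector score."""
    refiners = []
    for k in range(Y.shape[1]):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(first_stage_scores, Y[:, k])
        refiners.append(clf)
    return refiners

def refine(refiners, first_stage_scores):
    """Refined multi-label predictions from the fusion stage."""
    probs = np.stack([clf.predict_proba(first_stage_scores)[:, 1]
                      for clf in refiners], axis=1)
    return (probs > 0.5).astype(int)
```

In contrast, CML models concept-feature and concept-concept relations jointly in one step rather than stacking a fusion stage on top of independent detectors.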
Methods
  • To evaluate the proposed video annotation algorithm, the authors conduct the experiments on the benchmark TRECVID 2005 data set [17]
  • This is one of the most widely used data sets by many groups in the area of multimedia concept modeling [2][3][7].
  • The 39 concepts are multi-labeled according to the LSCOM-Lite annotations [12]
  • These annotated concepts cover a wide range of genres, including program category, setting/scene/site, people, object, activity, event, and graphics.
  • These correlations are shown to be statistically significant by normalized mutual information (see Figure 3); a sketch of this measure follows this list
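As a rough illustration of the correlation analysis mentioned in the last bullet, the sketch below estimates normalized mutual information between pairs of concepts from a binary ground-truth label matrix Y. The normalization by sqrt(H(a)·H(b)) and the significance threshold are illustrative choices and may differ from the paper's exact treatment.

```python
# Illustrative pairwise concept-correlation measurement with normalized mutual
# information (NMI) over binary labels. Y is an (n_clips x n_concepts) 0/1
# matrix; the normalization and threshold are illustrative choices.
import numpy as np

def normalized_mutual_information(a, b, eps=1e-12):
    """NMI between two binary label vectors: I(a;b) / sqrt(H(a) * H(b))."""
    pa = np.array([np.mean(a == v) for v in (0, 1)])
    pb = np.array([np.mean(b == v) for v in (0, 1)])
    mi = 0.0
    for i, va in enumerate((0, 1)):
        for j, vb in enumerate((0, 1)):
            pab = np.mean((a == va) & (b == vb))
            if pab > eps:
                mi += pab * np.log(pab / (pa[i] * pb[j] + eps))
    ha = -np.sum(pa * np.log(pa + eps))
    hb = -np.sum(pb * np.log(pb + eps))
    return mi / (np.sqrt(ha * hb) + eps)

def rank_concept_pairs(Y, threshold=0.01):
    """Rank concept pairs by NMI and keep those above an (illustrative) threshold."""
    n_concepts = Y.shape[1]
    pairs = []
    for p in range(n_concepts):
        for q in range(p + 1, n_concepts):
            nmi = normalized_mutual_information(Y[:, p], Y[:, q])
            if nmi > threshold:
                pairs.append((p, q, nmi))
    return sorted(pairs, key=lambda t: -t[2])
```

A ranking like this is one way to pick the "significant" subset of concept pairs that CML(II) restricts itself to.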
Results
  • The authors report experimental results on the TRECVID data set; two different modeling strategies are adopted in the experiments.
  • All concept pairs are taken into consideration in the model and the kernel function in Eqn (5) is adopted.
  • The authors denote this method by CML(I) in the experiments.
  • The authors adopt the strategy described in Section 4.1, where a subset of the concept pairs is used, selected by the significance of their interactions.
  • The kernel function in Eqn (24) is used, and this approach is denoted by CML(II) (an illustrative sketch of the joint feature construction follows this list)
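To make the CML(I)/CML(II) distinction concrete, the sketch below builds a joint feature map over a clip's features x and a candidate label vector y: per-concept blocks (x gated by each label) plus pairwise label-interaction terms. CML(I) would use all concept pairs, CML(II) only a selected subset (e.g. the NMI ranking sketched above). This is an illustrative construction under an assumed ±1 label encoding, not a reproduction of the paper's kernels in Eqn (5) or Eqn (24).

```python
# Illustrative joint feature map for correlative multi-labeling: per-concept
# blocks plus pairwise label-interaction terms, scored by one linear model.
# x: clip feature vector; y: label vector with entries in {-1, +1} (assumed).
import numpy as np
from itertools import combinations

def joint_feature_map(x, y, pairs=None):
    """phi(x, y): concatenation of x * y[k] per concept and y[p] * y[q] per pair."""
    K = len(y)
    if pairs is None:                       # CML(I): every concept pair
        pairs = list(combinations(range(K), 2))
    unary = np.concatenate([x * y[k] for k in range(K)])
    pairwise = np.array([y[p] * y[q] for (p, q) in pairs], dtype=float)
    return np.concatenate([unary, pairwise])

def score(w, x, y, pairs=None):
    """Linear compatibility score of label vector y for clip features x."""
    return float(w @ joint_feature_map(x, y, pairs))
```

Scoring every one of the 2^39 possible label vectors this way is of course infeasible; the paper's formulation handles learning and inference far more efficiently, which this sketch does not attempt.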
Conclusion
  • The authors proposed a correlative multi-labeling (CML) approach that exploits concept correlations to help infer video semantic concepts.
  • Experiments on the widely used benchmark TRECVID data set demonstrated that CML is superior to state-of-the-art approaches in the first and second paradigms, in both overall performance and consistency of performance across diverse concepts.
  • The authors will study how performance changes as the number of video concepts grows, and whether the algorithm can gain further improvement by exploiting a larger number of concepts.
  • The authors will apply the proposed algorithm to other applications, such as image annotation and text categorization, in which there exist large numbers of correlated concepts
Performance Gains
  • Some of the improvements are significant, such as “office” (477% better than IndSVM and 260% better than CBCF), “people-marching” (68% better than IndSVM and 160% better than CBCF), and “walking/running” (55% better than IndSVM and 48% better than CBCF)
References
  • [1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [2] M. Campbell et al. IBM Research TRECVID-2006 video retrieval system. In TREC Video Retrieval Evaluation (TRECVID) Proceedings, 2006.
  • [3] S.-F. Chang et al. Columbia University TRECVID-2006 video search and high-level feature extraction. In TREC Video Retrieval Evaluation (TRECVID) Proceedings, 2006.
  • [4] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
  • [5] S. Godbole and S. Sarawagi. Discriminative methods for multi-labeled classification. In PAKDD, 2004.
  • [6] A. Hauptmann, M.-Y. Chen, and M. Christel. Confounded expectations: Informedia at TRECVID 2004. In TREC Video Retrieval Evaluation Online Proceedings, 2004.
  • [7] A. G. Hauptmann et al. Multi-lingual broadcast news retrieval. In TREC Video Retrieval Evaluation (TRECVID) Proceedings, 2006.
  • [8] W. Jiang, S.-F. Chang, and A. Loui. Active concept-based concept fusion with partial user labels. In Proceedings of the IEEE International Conference on Image Processing, 2006.
  • [10] M. Naphade, I. Kozintsev, and T. Huang. Factor graph framework for semantic video indexing. IEEE Transactions on Circuits and Systems for Video Technology, 12(1), Jan. 2002.
  • [12] M. R. Naphade, L. Kennedy, J. R. Kender, S.-F. Chang, J. R. Smith, P. Over, and A. Hauptmann. A light scale concept ontology for multimedia understanding for TRECVID 2005. IBM Research Report RC23612 (W0505-104), 2005.
  • [13] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67, 1999.
  • [14] X. Shen, M. Boutell, J. Luo, and C. Brown. Multi-label machine learning and its application to semantic scene classification. In International Symposium on Electronic Imaging, 2004.
  • [15] J. R. Smith and M. Naphade. Multimedia semantic indexing using model vectors. In Proceedings of the IEEE International Conference on Multimedia and Expo, 2003.
  • [16] C. Snoek et al. The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the ACM International Conference on Multimedia, pages 421–430, Santa Barbara, USA, October 2006.
  • [18] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the International Conference on Machine Learning (ICML), 2004.
  • [20] Y. Wu, B. L. Tseng, and J. R. Smith. Ontology-based multi-classification learning for video concept detection. In Proceedings of the IEEE International Conference on Multimedia and Expo, 2004.