Indexing Multimedia Documents With Acoustic Concept Recognition Lattices

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 (2013)

Abstract
The amount of multimedia data is increasing every day, and there is a growing demand for high-accuracy multimedia retrieval systems that go beyond retrieving simple events (e.g., detecting a sports video) to more specific and hard-to-detect events (e.g., a point in a tennis match). To retrieve these complex events, audio content features play an important role since they provide complementary information to image/video features. In this paper, we propose a novel approach in which we employ an HMM-based acoustic concept recognition (ACR) system and convert the resulting recognition lattices into acoustic concept indexes to represent multimedia audio content. Lattice indexes are created by extracting posterior-weighted N-gram counts from the ACR lattices, and they are used as features in SVM-based classification for the multimedia event detection (MED) task. We evaluate the proposed approach on the NIST 2011 TRECVID MED development set, which consists of user-generated videos from the internet. The proposed approach yields an Equal Error Rate (EER) of 31.6% on this acoustically challenging dataset (on a set of 5 video events), outperforming previously proposed supervised and unsupervised approaches on the same dataset (34.5% and 36.9%, respectively).
Keywords
Multimedia event detection (MED), acoustic concept recognition, lattice N-gram counts, acoustic concept indexes
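
The posterior-weighted N-gram counts mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the lattice has already been expanded into a small list of alternative concept sequences, each with a posterior probability, whereas a real system would accumulate the same expected counts directly on the lattice. The function name `expected_ngram_counts` and the concept labels are hypothetical.

```python
from collections import Counter


def expected_ngram_counts(paths, n):
    """Return expected (posterior-weighted) N-gram counts.

    paths: list of (posterior, labels) pairs, where the posteriors are
    assumed to sum to 1 over all alternative concept sequences.
    """
    counts = Counter()
    for posterior, labels in paths:
        # Every occurrence of an N-gram on a path contributes that
        # path's posterior to the N-gram's expected count.
        for i in range(len(labels) - n + 1):
            counts[tuple(labels[i:i + n])] += posterior
    return counts


# Hypothetical alternatives for one audio clip (labels are made up).
paths = [
    (0.5, ["speech", "music", "cheer"]),
    (0.25, ["speech", "noise", "cheer"]),
    (0.25, ["speech", "music", "music"]),
]

bigrams = expected_ngram_counts(paths, 2)
for ngram, weight in sorted(bigrams.items()):
    print(ngram, weight)
```

Under these assumptions, the resulting counts would be arranged into a fixed-order vector over the N-gram vocabulary and passed to an SVM as the feature representation of the clip.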