Filtering The Unknown: Speech Activity Detection In Heterogeneous Video Collections
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4(2007)
摘要
In this paper we discuss the speech activity detection system that we used for detecting speech regions in the Dutch TRECVID video collection. The system is designed to filter non-speech like music or sound effects out of the signal without the use of predefined non-speech models. Because the system trains its models on-line, it is robust for handling out-of-domain data. The speech activity error rate on an out-of-domain test set, recordings of English conference meetings, was 4.4%. The overall error rate on twelve randomly selected five minute TRECVID fragments was 11.5%.
更多查看译文
关键词
speech activity detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要