Speech/Nonspeech Segmentation In Web Videos

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3（2012）

引用 48|浏览22

暂无评分

摘要

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as You Tube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on You Tube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.

查看译文

关键词

segmentation,speech detection,voice activity detection,video

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要