Improved DNN-Based Segmentation for Multi-Genre Broadcast Audio

ICASSP (2016)

Abstract
Automatic segmentation is a crucial first step in processing multi-genre broadcast (MGB) audio. It is very challenging because the data exhibit a wide range of speech types and background conditions, along with many types of non-speech audio. This paper describes a segmentation system for MGB audio based on deep neural network (DNN) speech/non-speech detection. A further stage of change-point detection and clustering is used to obtain homogeneous segments. Suitable DNN inputs, context window sizes, and architectures are studied in a series of experiments on a large corpus of MGB television audio. For MGB transcription, the increase in word error rate over manual segmentation with the improved segmenter is roughly half that of the baseline DNN segmenter supplied for the 2015 ASRU MGB challenge.
Keywords
audio segmentation, deep neural network, multi-genre broadcast data
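
The abstract describes frame-level speech/non-speech detection followed by further stages that produce segments. As a rough illustration of the first step only, the sketch below turns per-frame speech posteriors into speech segments by thresholding and discarding very short runs. It is not the paper's method: the function name `speech_segments`, the 0.5 threshold, the 10 ms frame shift, and the minimum run length are all assumptions chosen for illustration, and the change-point detection and clustering stages are not covered.

```python
from typing import List, Sequence, Tuple


def speech_segments(
    posteriors: Sequence[float],
    threshold: float = 0.5,    # assumed decision threshold, not from the paper
    frame_shift: float = 0.01, # assumed 10 ms frame shift, in seconds
    min_frames: int = 3,       # assumed minimum run length, in frames
) -> List[Tuple[float, float]]:
    """Convert per-frame speech posteriors into (start, end) times in seconds.

    A frame counts as speech when its posterior is >= threshold. Consecutive
    speech frames form a run; runs shorter than min_frames are dropped.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current speech run
    n = len(posteriors)
    # Iterate one step past the end so a run that reaches the last frame
    # is closed like any other.
    for i in range(n + 1):
        is_speech = i < n and posteriors[i] >= threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_shift, i * frame_shift))
            start = None
    return segments


if __name__ == "__main__":
    # Hypothetical posteriors standing in for the output of a DNN detector.
    demo = [0.1, 0.9, 0.8, 0.7, 0.2, 0.9, 0.1, 0.6, 0.6, 0.6]
    print(speech_segments(demo))
```

In practice the posteriors would come from a trained classifier, and a system of the kind described would typically smooth the decisions more carefully before passing the resulting speech regions on to later stages.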