Improved DNN-Based Segmentation for Multi-Genre Broadcast Audio

L. Wang, C. Zhang, P. C. Woodland, M. J. F. Gales, P. Karanasou, P. Lanchantin, X. Liu, Y. Qian

ICASSP (2016)

Abstract

Automatic segmentation is a crucial first step in processing multi-genre broadcast (MGB) audio. It is very challenging because the data exhibit a wide range of speech types and background conditions, with many types of non-speech audio. This paper describes a segmentation system for MGB audio that uses deep neural network (DNN) based speech/non-speech detection. A further stage of change-point detection and clustering is used to obtain homogeneous segments. Suitable DNN inputs, context window sizes and architectures are studied in a series of experiments using a large corpus of MGB television audio. For MGB transcription, the increase in word error rate relative to manual segmentation is roughly halved with the improved segmenter, compared to the baseline DNN segmenter supplied for the 2015 ASRU MGB challenge.

Keywords

audio segmentation, deep neural network, multi-genre broadcast data
