The Development of the Cambridge University Alignment Systems for the Multi-Genre Broadcast Challenge

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (2015)

Abstract
We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in the transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range in length from a few minutes to more than six hours. Lightly supervised decoding is performed on the audio, and the output text is aligned with the original text transcript. Reliable split points are found, and the resulting text chunks are force-aligned with the corresponding audio segments. Confidence scores are associated with the aligned data. Multiple refinements, including audio segmentation based on deep neural networks (DNNs) and the use of DNN-based acoustic models, were used to improve performance. The final MGB alignment system had the highest F-measure value on the evaluation data.
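The text-to-text alignment and split-point steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_split_points` and `split_captions`, the `min_run` threshold, and the use of Python's `difflib.SequenceMatcher` as the word aligner are all assumptions made for illustration. The idea is that a run of several consecutive words on which the caption and the decoder output agree is treated as a reliable anchor, and the caption is cut at the middle of each such run.

```python
from difflib import SequenceMatcher


def find_split_points(caption_words, decoded_words, min_run=3):
    """Return (caption_index, decoded_index) anchors.

    An anchor is placed at the midpoint of every run of at least
    `min_run` consecutive words shared by both sequences.
    `min_run` is a hypothetical threshold, not a value from the paper.
    """
    matcher = SequenceMatcher(a=caption_words, b=decoded_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:
            mid = block.size // 2
            anchors.append((block.a + mid, block.b + mid))
    return anchors


def split_captions(caption_words, anchors):
    """Cut the caption word list into chunks at the anchor positions."""
    chunks = []
    start = 0
    for caption_index, _ in anchors:
        if caption_index > start:
            chunks.append(caption_words[start:caption_index])
            start = caption_index
    chunks.append(caption_words[start:])
    return chunks


if __name__ == "__main__":
    caption = "good evening and welcome to the show tonight we have a special guest".split()
    decoded = "good evening and welcome to show tonight uh we have a special guest".split()
    anchors = find_split_points(caption, decoded)
    for chunk in split_captions(caption, anchors):
        print(" ".join(chunk))
```

In a real system the decoder output would also carry word timings, so the decoded index of each anchor could be mapped to a time in the audio; each caption chunk and its audio segment would then be passed to a forced aligner. That step is omitted here because it depends on the acoustic models.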
Keywords
Alignment, Lightly Supervised Training, Multi-Genre Broadcast Transcription