A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2018)
Abstract
In this study, we investigate the effects of deep-learning-based speech enhancement as a preprocessor for speaker diarization in challenging realistic environments involving background noise, reverberation, and overlapping speech. To improve generalization, we propose an advanced long short-term memory (LSTM) preprocessor with a novel design: its hidden layers use densely connected progressive learning, and its output layer uses multiple-target learning. The deep model is trained on synthesized data pairs generated from WSJ0 read speech and more than 100 noise types. Surprisingly, the proposed preprocessor generalizes well to speaker diarization of realistic noisy speech in highly mismatched conditions, in terms of speaking style, interference, and the interaction between them. On three challenging tasks, namely AMI, ADOS, and SeedLings, the state-of-the-art diarization system equipped with the novel LSTM-based preprocessor yields consistent and significant reductions in diarization error rate (DER) relative to systems using unprocessed noisy speech or traditional enhancement methods.
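The abstract reports results as diarization error rate (DER), which adds missed speech, false-alarm speech, and speaker-confusion time and divides the sum by the total reference speech time. The code below is a simplified frame-level version for illustration only, not the paper's scoring setup. It assumes equal-length frames, applies no forgiveness collar, counts every active speaker in a frame (so overlapping speech contributes more than once), and finds the one-to-one speaker mapping by brute force. The function names `best_mapping` and `frame_der` are hypothetical.

```python
from itertools import permutations
from typing import Dict, Optional, Sequence, Set


def best_mapping(ref: Sequence[Set[str]], hyp: Sequence[Set[str]]) -> Dict[str, Optional[str]]:
    """Brute-force the one-to-one hyp->ref speaker mapping that maximizes matched frames.

    Hypothesis speakers may map to None (no reference partner). This is only
    practical for a handful of speakers; real scorers typically solve this
    assignment with the Hungarian algorithm instead.
    """
    ref_spk = sorted(set().union(*ref))
    hyp_spk = sorted(set().union(*hyp))
    # Pad with None so every hypothesis speaker can remain unmapped.
    candidates = ref_spk + [None] * len(hyp_spk)

    def matched(mapping: Dict[str, Optional[str]]) -> int:
        # Number of (frame, speaker) pairs where the mapped hypothesis agrees with the reference.
        return sum(len(r & {mapping[s] for s in h}) for r, h in zip(ref, hyp))

    return max(
        (dict(zip(hyp_spk, perm)) for perm in permutations(candidates, len(hyp_spk))),
        key=matched,
    )


def frame_der(ref: Sequence[Set[str]], hyp: Sequence[Set[str]]) -> float:
    """Frame-level DER = (miss + false alarm + confusion) / total reference speaker-frames."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of frames")
    mapping = best_mapping(ref, hyp)
    miss = fa = conf = total = 0
    for r, h in zip(ref, hyp):
        n_ref, n_hyp = len(r), len(h)
        n_correct = len(r & {mapping[s] for s in h})
        miss += max(0, n_ref - n_hyp)         # reference speech left unlabeled
        fa += max(0, n_hyp - n_ref)           # hypothesis speech with no reference speaker
        conf += min(n_ref, n_hyp) - n_correct  # speech attributed to the wrong speaker
        total += n_ref
    if total == 0:
        raise ValueError("reference contains no speech")
    return (miss + fa + conf) / total


if __name__ == "__main__":
    # Frame 3 has overlapping reference speech; frame 4 is reference silence.
    ref = [{"A"}, {"A"}, {"B"}, {"A", "B"}, set()]
    hyp = [{"s1"}, {"s1"}, {"s2"}, {"s1"}, {"s2"}]
    print(frame_der(ref, hyp))  # 1 missed + 1 false alarm over 5 speaker-frames -> 0.4
```

In this toy example the best mapping is `s1 -> A`, `s2 -> B`. The overlapped frame contributes one missed speaker, and the silent frame contributes one false alarm. This illustrates why overlapping speech, one of the conditions the abstract targets, is costly for single-speaker-per-frame diarization output.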
Keywords
Speaker diarization, deep-learning-based speech enhancement, densely connected progressive learning, multiple-target learning, highly mismatched conditions