Residual plus Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition

INTERSPEECH (2019)

Abstract
Overlapped speech poses a significant problem in a variety of speech processing applications, including speaker identification, speaker diarization, and speech recognition. To address it, existing systems combine source separation with algorithms for processing non-overlapped speech (e.g., source separation followed by speech recognition). In this paper, we propose a modified network architecture that simultaneously recognizes keywords from overlapped speech without explicitly performing source separation. We build our network by adding capsule layers to a ResNet architecture that has shown state-of-the-art performance on a traditional keyword recognition task. We evaluate the model on a series of 10-word overlapped keyword recognition experiments, using speaker-dependent and speaker-independent training. Results indicate that the Residual + Capsule (ResCap) network shows marked improvement in recognizing overlapped speech, especially in experiments where the number of overlapped speakers differs between the training set and the test set.
Keywords
speech recognition, keyword spotting, recognition, overlapped speech, capsule networks, residual networks, ResNet
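
The abstract does not specify how the capsule layers report several keywords at once. As a rough illustration only, the sketch below assumes the standard capsule formulation, in which each output capsule is a vector passed through a "squash" nonlinearity and its length is read as the probability that its class is present. Because each length is judged independently, more than one keyword can be detected in the same input. The keyword list, the raw capsule vectors, and the 0.5 threshold are hypothetical values chosen for the example, not taken from the paper.

```python
import math


def squash(s):
    """Standard capsule squash: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    # Length after squashing is norm_sq / (1 + norm_sq).
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def capsule_length(v):
    """Euclidean length of a capsule vector."""
    return math.sqrt(sum(x * x for x in v))


def detect_keywords(raw_capsules, keywords, threshold=0.5):
    """Return every keyword whose squashed capsule length exceeds the threshold."""
    detected = []
    for word, s in zip(keywords, raw_capsules):
        if capsule_length(squash(s)) > threshold:
            detected.append(word)
    return detected


# Hypothetical pre-squash capsule outputs, one vector per keyword.
keywords = ["yes", "no", "stop"]
raw = [[3.0, 4.0], [0.1, 0.2], [2.0, 0.0]]

print(detect_keywords(raw, keywords))  # ['yes', 'stop']
```

Here two capsules exceed the threshold, so two keywords are reported for a single input. A softmax output layer, by contrast, would force the classes to compete for a single label.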