softSAD: Integrated Frame-Based Speech Confidence for Speaker Recognition

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

Abstract
In this paper, we propose softSAD: the direct integration of speech posteriors into a speaker recognition system as an alternative to speech activity detection (SAD). Motivated by the need to use audio from short recordings more efficiently, softSAD removes the need to discard audio through thresholded speech/non-speech decisions, as is done with SAD. Instead, softSAD explicitly integrates a speech posterior for each frame into the Baum-Welch statistics. We compare softSAD and SAD in mismatched conditions by evaluating a system developed for the National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation (SRE) on the short test conditions of the channel-degraded Robust Automatic Transcription of Speech (RATS) speaker identification task, and vice versa. We demonstrate that softSAD provides a benefit over SAD for short test audio in mismatched conditions.
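The abstract does not spell out how the speech posterior enters the Baum-Welch statistics. As a minimal sketch, assuming each frame's mixture-component posterior is simply multiplied by that frame's speech posterior before accumulation, the contrast with thresholded SAD might look like the following. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def weighted_baum_welch_stats(
    features: Sequence[Sequence[float]],
    component_posteriors: Sequence[Sequence[float]],
    frame_weights: Sequence[float],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics with per-frame weights.

    Assumed form (not confirmed by the abstract):
        N_c = sum_t w_t * gamma_t(c)
        F_c = sum_t w_t * gamma_t(c) * x_t
    where w_t is the frame weight and gamma_t(c) the component posterior.
    """
    num_components = len(component_posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_components
    first = [[0.0] * dim for _ in range(num_components)]
    for x, gamma, w_t in zip(features, component_posteriors, frame_weights):
        for c in range(num_components):
            w = w_t * gamma[c]
            zeroth[c] += w
            for d in range(dim):
                first[c][d] += w * x[d]
    return zeroth, first


def hard_sad_weights(speech_posteriors: Sequence[float],
                     threshold: float = 0.5) -> List[float]:
    """Conventional SAD as a special case: weights are 0 or 1 via a threshold."""
    return [1.0 if p >= threshold else 0.0 for p in speech_posteriors]


# Toy data: 3 frames, 2-dimensional features, 2 mixture components.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
gammas = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
speech = [1.0, 0.5, 0.25]  # hypothetical per-frame speech posteriors

# Soft weighting: every frame contributes in proportion to its speech posterior.
soft_n, soft_f = weighted_baum_welch_stats(features, gammas, speech)

# Hard SAD: the third frame falls below the threshold and is discarded entirely.
hard_n, hard_f = weighted_baum_welch_stats(
    features, gammas, hard_sad_weights(speech)
)
```

In this sketch the low-confidence third frame still contributes a quarter of its weight under soft weighting, whereas the thresholded variant drops it, which is the kind of audio loss the abstract says matters for short recordings.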
Keywords
Speech activity detection, speaker identification, unseen conditions, mismatched conditions