Modeling duration patterns for speaker recognition

INTERSPEECH (2003)

Abstract
We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word, or the durations of the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker-specific models are obtained through adaptation of a "background" model that is trained on a large pool of speakers. Speaker models are then used to score the test data; the scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields a significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test sample durations, beyond the test duration at which the standard system's accuracy levels off.
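The scoring scheme summarized above (per-word duration vectors, a background mixture of Gaussians, speaker models adapted from it, and scores normalized by subtracting the background-model score) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the diagonal covariances, the mean-only MAP adaptation with a relevance factor of 16, the four mixture components, the hypothetical three-phone word, the helper names, and the synthetic durations (in frames) are all choices made here for illustration, since the abstract does not specify them.

```python
import numpy as np

# Illustrative sketch only: diagonal-covariance GMMs, mean-only MAP
# adaptation and the relevance factor are assumptions, not the paper's setup.

def log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian: (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def weighted_log_probs(X, weights, means, variances):
    return log_gauss_diag(X, means, variances) + np.log(weights)

def gmm_loglik(X, weights, means, variances):
    """Per-token log p(x) under the mixture, via a stable log-sum-exp."""
    lp = weighted_log_probs(X, weights, means, variances)
    m = lp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lp - m).sum(axis=1, keepdims=True))).ravel()

def responsibilities(X, weights, means, variances):
    """Posterior probability of each component for each token: (N, K)."""
    lp = weighted_log_probs(X, weights, means, variances)
    lp -= lp.max(axis=1, keepdims=True)
    r = np.exp(lp)
    return r / r.sum(axis=1, keepdims=True)

def train_background(X, n_components=4, iters=50, floor=1e-3, seed=0):
    """Fit the background GMM on pooled tokens with plain EM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(iters):
        r = responsibilities(X, weights, means, variances)
        nk = r.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (r.T @ X) / nk[:, None]
        variances = np.maximum((r.T @ X ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, variances

def adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the background model to one speaker (assumed scheme)."""
    weights, means, variances = ubm
    r = responsibilities(X, weights, means, variances)
    nk = r.sum(axis=0)
    data_means = (r.T @ X) / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]
    return alpha * data_means + (1 - alpha) * means

def normalized_score(X_test, ubm, speaker_means):
    """Average of speaker log-likelihood minus background log-likelihood."""
    weights, means, variances = ubm
    spk = gmm_loglik(X_test, weights, speaker_means, variances)
    bkg = gmm_loglik(X_test, weights, means, variances)
    return float(np.mean(spk - bkg))

# Synthetic phone durations (in frames) for a hypothetical three-phone word.
rng = np.random.default_rng(1)

def tokens(center, spread, n):
    return np.maximum(rng.normal(center, spread, size=(n, 3)), 1.0)

background = tokens([8, 12, 6], 3.0, 2000)   # stand-in for a large speaker pool
enroll = tokens([6, 15, 5], 1.5, 200)        # target speaker's training tokens
target_test = tokens([6, 15, 5], 1.5, 50)
impostor_test = tokens([10, 9, 8], 1.5, 50)

ubm = train_background(background)
speaker_means = adapt_means(enroll, ubm)
target_score = normalized_score(target_test, ubm, speaker_means)
impostor_score = normalized_score(impostor_test, ubm, speaker_means)
print(f"target {target_score:.2f}  impostor {impostor_score:.2f}")
```

Averaging the per-token log-likelihood differences is one common way to turn background-normalized scores into a single trial score. Adapting only the means keeps the speaker model tied to the background model: components that the enrollment data barely touches stay essentially unchanged, so test tokens that fall mainly in those components score close to zero.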
Keywords
mixture of Gaussians, speaker recognition, feature vector