Exploiting Sequence Information For Text-Dependent Speaker Verification

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)

Abstract
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vectors and relevance Maximum-a-Posteriori (MAP) adaptation, have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further improved by estimating posteriors with a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM). Although the posteriors from both DNNs and GMMs are meant to capture the phonetic information of the phrase, model-based SV approaches ignore the sequence information, i.e. the order of the phonetic units in the phrase. In this paper, we tackle this issue by applying dynamic time warping (DTW) to speaker-informative features. We propose to use i-vectors computed from short segments of each speech utterance, also called online i-vectors, as the feature vectors. The proposed approach is evaluated on the RedDots database and yields a 75% relative improvement in equal error rate (EER) over the best model-based SV baseline system in a content-mismatch condition.
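To illustrate how DTW can compare two utterances represented as sequences of online i-vectors, here is a minimal sketch. The abstract does not specify the local distance, step pattern or normalization, so these are assumptions: cosine distance between vectors, the standard symmetric steps (1,0), (0,1), (1,1), and division of the accumulated cost by the total number of frames. The function names `cosine_distance` and `dtw_distance` are hypothetical, and online i-vector extraction is not shown; toy 2-D vectors stand in for real i-vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between two vectors (assumed local distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as orthogonal to everything
    return 1.0 - dot / (norm_a * norm_b)


def dtw_distance(x: List[Vector], y: List[Vector]) -> float:
    """Classic DTW over two vector sequences.

    Uses the symmetric step pattern (1,0), (0,1), (1,1) and returns the
    accumulated path cost divided by len(x) + len(y) (assumed normalization).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # advance in x only
                                   acc[i][j - 1],      # advance in y only
                                   acc[i - 1][j - 1])  # advance in both
    return acc[n][m] / (n + m)


# Toy "online i-vector" sequences (hypothetical data).
enroll = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_same_order = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_reversed = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

print(dtw_distance(enroll, test_same_order))
print(dtw_distance(enroll, test_reversed))
```

The same-order test sequence aligns with the enrollment sequence at near-zero cost even though it is longer, while the reversed sequence, which contains similar vectors in a different order, scores worse. This is the kind of sequence information that the abstract says model-based approaches discard.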
Keywords
Text-dependent speaker verification, DNN posteriors, Dynamic Time Warping