Content Normalization For Text-Dependent Speaker Verification

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION(2017)

引用 8|浏览52
暂无评分
摘要
Subspace based techniques, such as i-vector and Joint Factor Analysis (JFA) have shown to provide state-of-the-art performance for fixed phrase based text-dependent speaker verification. However, the error rates of such systems on the random digit task of RSR dataset are higher than that of Gaussian Mixture Model-Universal Background Model (GMM-UBM). In this paper, we aim at improving i-vector system by normalizing the content of the enrollment data to match the test data. We estimate i-vectors for each frames of a speech utterance (also called online i-vectors). The largest similarity scores across frames between enrollment and test are taken using these online i-vectors to obtain speaker verification scores. Experiments on Part3 of RSR corpora show that the proposed approach achieves 12% relative improvement in equal error rate over a GMM-UBM based baseline system.
更多
查看译文
关键词
speaker verification, i-vectors, content matching
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要