Using prosodic and lexical information for speaker identification.

ICASSP (2002)

Abstract
We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker ID evaluation, which provides much more test and training data than has traditionally been available to similar speaker ID investigations. In addition, we have had access to a detailed prosodic feature database of Switchboard-I conversations, including data not previously applied to speaker ID. We describe two baseline acoustic systems: an approach using Gaussian Mixture Models, and an LVCSR-based speaker ID system. Their results are compared to and combined with two larger time-scale systems: a system based on an "idiolect" language model, and a system making use of the contents of the prosody database. We find that, with sufficient test and training data, suprasegmental information can significantly enhance the performance of traditional speaker ID systems.
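The abstract does not say how the Gaussian Mixture Model baseline is scored or how the acoustic and larger time-scale systems are combined. The following is only a minimal sketch under stated assumptions: a diagonal-covariance GMM scored by the average per-frame log-likelihood ratio against a background model, and a simple weighted-sum fusion of per-system scores. All function names, the toy models, and the "idiolect" and "prosody" scores in the usage lines are hypothetical placeholders, not values or methods taken from the paper.

```python
import math

# Assumption: a GMM is a list of (weight, means, variances) tuples with
# diagonal covariances. This is a hypothetical representation for illustration.

def gmm_log_likelihood(frame, gmm):
    """Log p(frame | GMM) for a diagonal-covariance GMM, using log-sum-exp."""
    comp_logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            # Log density of a 1-D Gaussian for each feature dimension.
            ll += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        comp_logs.append(ll)
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def gmm_llr(frames, target_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: target speaker vs. background."""
    total = sum(
        gmm_log_likelihood(f, target_gmm) - gmm_log_likelihood(f, background_gmm)
        for f in frames
    )
    return total / len(frames)

def fuse_scores(scores, weights, bias=0.0):
    """Hypothetical linear fusion; weights would normally be tuned on held-out trials."""
    return bias + sum(weights[name] * s for name, s in scores.items())

# Toy usage with made-up single-component models and 2-D feature frames.
target = [(1.0, [1.0, 0.0], [1.0, 1.0])]
background = [(1.0, [0.0, 0.0], [1.0, 1.0])]
frames = [[1.0, 0.0], [0.5, 0.0]]

llr = gmm_llr(frames, target, background)  # approximately 0.25
# Placeholder scores for the other systems; illustrative only.
fused = fuse_scores(
    {"gmm": llr, "idiolect": 1.0, "prosody": -0.5},
    {"gmm": 1.0, "idiolect": 0.5, "prosody": 0.2},
)  # approximately 0.65
```

A weighted sum is used here only because it is the simplest way to show scores from acoustic and suprasegmental systems being combined into one decision score; other fusion schemes are equally plausible.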
Keywords
accuracy,gaussian mixture model,robustness,language model,artificial intelligence,switches,nist,data models,computational modeling