Integrating Online I-Vector Extractor With Information Bottleneck Based Speaker Diarization System

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

引用 34|浏览21
暂无评分
摘要
Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Co-efficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial as it models speaker information explicitly. In this paper, the i-vector modelling technique is adapted to be used as short term features for diarization by estimating i-vectors over a short window of MFCCs. The Information Bottleneck (IB) approach provides a convenient platform to integrate multiple features together for fast and accurate diarization of speech. Speaker models are estimated over a window of 10 frames of speech and used as features in the IB system. Experiments on the NIST RT datasets show an absolute improvement of 3.9% in the best case when i-vectors are used as auxiliary features to MFCCs. Further, discriminative training algorithms such as LDA and PLDA are applied on the i-vectors. A best case performance improvement of 5% in absolute terms is obtained on the RT datasets.
更多
查看译文
关键词
speaker diarization, online i-vectors, Information Bottleneck
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要