Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge

19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies (2018)

Abstract
This paper focuses on estimating the number of speakers for diarization in the context of the DIHARD Challenge at Interspeech 2018. This evaluation seeks to improve diarization on challenging corpora (YouTube videos, meetings, courtroom recordings, etc.) that contain an undetermined number of speakers whose speech contributions differ in relevance. Our proposal for the challenge is a system based on the i-vector PLDA (probabilistic linear discriminant analysis) paradigm: given an initial segmentation of the input audio, we extract an i-vector representation for each acoustic fragment. These i-vectors are clustered with a fully Bayesian PLDA. This model, a generative model with the speaker labels as latent variables, produces the diarization labels by means of Variational Bayes iterations. The number of speakers is decided by comparing multiple hypotheses according to different information criteria. These criteria are built around the Evidence Lower Bound (ELBO) provided by our PLDA.
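The clustering and speaker-count selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: it swaps the fully Bayesian PLDA for a simplified two-covariance PLDA whose parameters (`mu`, the between-speaker precision `B`, and the within-speaker precision `W`) are assumed known and fixed, assumes a uniform prior over speaker labels, uses a hypothetical farthest-point initialization, and uses the raw ELBO as the only selection criterion, whereas the paper compares several information criteria built on the ELBO. All function names are hypothetical, and the sketch requires NumPy.

```python
import numpy as np


def farthest_point_init(X, K):
    """Hypothetical helper: hard initial labels from greedy farthest-point centres."""
    centres = [0]  # start from the first segment: an arbitrary, deterministic choice
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, K):
        nxt = int(np.argmax(dist))
        centres.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    d2 = ((X[:, None, :] - X[centres][None, :, :]) ** 2).sum(-1)
    R = np.zeros((X.shape[0], K))
    R[np.arange(X.shape[0]), d2.argmin(1)] = 1.0
    return R


def vb_plda_cluster(X, K, mu, B, W, n_iter=50):
    """Mean-field VB for a simplified two-covariance PLDA with K speakers.

    Assumed model (mu, B, W are fixed, not learned as in a fully Bayesian PLDA):
      y_k ~ N(mu, B^-1),  z_n ~ Cat(1/K),  x_n | z_n = k ~ N(y_k, W^-1)
    Returns soft labels R (N x K) and the ELBO of the final q(Z) q(Y).
    """
    N, D = X.shape
    _, logdet_B = np.linalg.slogdet(B)
    _, logdet_W = np.linalg.slogdet(W)
    R = farthest_point_init(X, K)
    for _ in range(n_iter):
        # Update q(y_k) = N(m_k, S_k) with posterior precision P_k = B + N_k W.
        Nk = R.sum(0)
        P = B[None] + Nk[:, None, None] * W[None]
        S = np.linalg.inv(P)
        m = np.einsum('kde,ke->kd', S, (B @ mu)[None] + (R.T @ X) @ W)
        # Expected log-likelihood E_q[log N(x_n | y_k, W^-1)], shape N x K.
        diff = X[:, None, :] - m[None]
        quad = np.einsum('nkd,de,nke->nk', diff, W, diff)
        trace = np.einsum('de,ked->k', W, S)
        exp_ll = 0.5 * (logdet_W - D * np.log(2 * np.pi) - quad - trace[None])
        # Update q(z_n): softmax over speakers of log(1/K) + expected log-likelihood.
        logits = exp_ll - np.log(K)
        R = np.exp(logits - logits.max(1, keepdims=True))
        R /= R.sum(1, keepdims=True)
    # ELBO = E[log p(X, Z | Y)] + H[q(Z)] - sum_k KL(q(y_k) || p(y_k)).
    entropy = -np.sum(R * np.log(np.clip(R, 1e-300, None)))
    dm = m - mu
    kl = 0.5 * (np.einsum('de,ked->k', B, S) + np.einsum('kd,de,ke->k', dm, B, dm)
                - D - logdet_B + np.linalg.slogdet(P)[1])
    elbo = np.sum(R * (exp_ll - np.log(K))) + entropy - kl.sum()
    return R, float(elbo)


def select_num_speakers(X, mu, B, W, max_speakers=6):
    """Run VB for each candidate K and keep the hypothesis with the highest ELBO."""
    elbos = {K: vb_plda_cluster(X, K, mu, B, W)[1] for K in range(1, max_speakers + 1)}
    return max(elbos, key=elbos.get), elbos
```

For example, with synthetic 5-dimensional "i-vectors" drawn around three well-separated centres, `select_num_speakers(X, np.zeros(5), np.eye(5) / 100.0, np.eye(5))` would be expected to prefer three speakers. Each extra hypothesized speaker costs every segment log((K+1)/K) in the label prior and adds a KL term, so the ELBO only accepts it when the likelihood gain justifies it. Penalized variants of the ELBO could be substituted in `select_num_speakers` without changing the VB loop.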
Keywords
DIHARD Challenge, Diarization, i-vectors, PLDA, Variational Bayes, number of speakers