NIST RT'05S evaluation: pre-processing techniques and speaker diarization on multiple microphone meetings

Machine Learning for Multimodal Interaction (2005)

Abstract
This paper presents several pre-processing techniques coupled with three speaker diarization systems, evaluated in the framework of the NIST 2005 Spring Rich Transcription campaign (RT'05S). The pre-processing techniques compute a signal quality index used to build a single “virtual” signal from all the microphone recordings available for a meeting. This virtual signal is a weighted sum of the individual microphone signals, and the signal quality index is derived from a signal-to-noise ratio. Two methods are used to compute the instantaneous signal-to-noise ratio: a speech activity detection based approach and a noise spectrum estimate. The speaker diarization task is performed using systems developed by three labs: LIA, LIUM and CLIPS. Among the system submissions made by these labs, the best system obtained a speaker diarization error of 24.5% on the conference subdomain and 18.4% on the lecture subdomain.
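The weighted-sum idea can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the code below assumes, purely for illustration, that each channel's per-frame weight is its linear signal-to-noise ratio normalized across channels, and that a noise power estimate per channel is already available. The names `frame_snr`, `virtual_signal`, `noise_powers` and `frame_len` are hypothetical and do not come from the paper.

```python
def frame_snr(frame, noise_power, eps=1e-12):
    """Linear SNR estimate for one frame (assumed form, not the paper's)."""
    power = sum(x * x for x in frame) / len(frame)
    # Subtract the noise floor and clamp at zero so silent frames get SNR 0.
    return max(power - noise_power, 0.0) / (noise_power + eps)


def virtual_signal(channels, noise_powers, frame_len=4):
    """Combine equal-length channels into one signal, frame by frame.

    channels:     list of sample lists, one per microphone
    noise_powers: assumed noise power estimate for each microphone
    """
    n = len(channels[0])
    out = []
    for start in range(0, n, frame_len):
        frames = [ch[start:start + frame_len] for ch in channels]
        snrs = [frame_snr(f, p) for f, p in zip(frames, noise_powers)]
        total = sum(snrs)
        if total > 0:
            # Weights are proportional to SNR and sum to 1.
            weights = [s / total for s in snrs]
        else:
            # No channel rises above its noise floor: fall back to a plain average.
            weights = [1.0 / len(channels)] * len(channels)
        for i in range(len(frames[0])):
            out.append(sum(w * f[i] for w, f in zip(weights, frames)))
    return out
```

With one channel carrying signal and another silent, the silent channel receives zero weight and the output follows the active channel. A real system would estimate the noise power itself, for example from non-speech frames found by speech activity detection or from a noise spectrum estimate, which are the two options the abstract names.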
Keywords
multiple microphone meeting, different microphone signal, different system, signal quality index, different pre-processing technique, different lab, noise spectrum estimate, unique virtual signal, instantaneous signal, noise ratio, NIST RT, speaker diarization error, speaker diarization, spectrum, signal to noise ratio, speech activity detection