A Paralinguistic Approach To Speaker Diarisation: Using Age, Gender, Voice Likability and Personality Traits.

MM '17: ACM Multimedia Conference Mountain View California USA October, 2017(2017)

引用 12|浏览208
暂无评分
摘要
In this work, we present a new view on automatic speaker diarisation, i.e., assessing "who speaks when", based on the recognition of speaker traits such as age, gender, voice likability, and personality. Traditionally, speaker diarisation is accomplished using low-level audio descriptors (e.g., cepstral or spectral features), neglecting the fact that speakers can be well discriminated by humans according to various perceived characteristics. Thus, we advocate a novel paralinguistic approach that combines speaker diarisation with speaker characterisation by automatically identifying the speakers according to their individual traits. In a three-tier processing flow, speaker segmentation by voice activity detection (VAD) is initially performed to detect speaker turns. Next, speaker attributes are predicted using pre-trained paralinguistic models. To tag the speakers, clustering algorithms are applied to the predicted traits. We evaluate our methods against state-of-the-art open source and commercial systems on a corpus of realistic, spontaneous dyadic conversations recorded in the wild from three different cultures (Chinese, English, German). Our results provide clear evidence that using paralinguistic features for speaker diarisation is a promising avenue of research.
更多
查看译文
关键词
Speaker diarisation, speaker identification, speaker characteristics, computational paralinguistics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要