Exploring the Intersection Between Speaker Verification and Emotion Recognition

2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2019

Abstract
Many practical applications require speaker verification systems to operate on audio with high emotional content (e.g., 911 calls, forensic analysis of threatening recordings). For these cases, it is important to explore the intersection between speaker and emotion recognition tasks. A key challenge in addressing this problem is the lack of resources, since current emotional databases are commonly limited in size and number of speakers. This paper (1) creates the infrastructure to study this challenging problem, and (2) presents an exploratory analysis evaluating the accuracy of state-of-the-art speaker and emotion recognition systems in automatically retrieving specific emotional behaviors from target speakers. We collected a pool of sentences from multiple speakers (132,930 segments), where some of these speaking turns belong to 146 speakers in the MSP-Podcast database. Our framework trains speaker verification models, which are used to retrieve candidate speaking turns from the pool of sentences. The emotional content in these sentences is detected using state-of-the-art emotion recognition algorithms. The experimental evaluation provides promising results, where most of the retrieved sentences belong to the target speakers and have the target emotion. The results highlight the need for emotional compensation in speaker recognition systems, especially if these models are intended for commercial applications.
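The retrieval framework described above can be sketched as a two-stage filter: a speaker verification stage that keeps segments whose embedding is close to the target speaker's, followed by an emotion recognition stage that keeps only segments predicted to carry the target emotion. The following is a minimal illustrative sketch, not the paper's implementation; the function names, the cosine-similarity scoring, and the threshold value are assumptions.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker-embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve_emotional_turns(pool, target_embedding, emotion_of,
                             target_emotion, threshold=0.7):
    """Return ids of segments verified as the target speaker AND
    predicted to carry the target emotion.

    pool             -- iterable of (segment_id, embedding) pairs
    target_embedding -- enrollment embedding of the target speaker
    emotion_of       -- callable mapping segment_id -> emotion label
                        (stands in for a trained emotion recognizer)
    threshold        -- hypothetical verification decision threshold
    """
    # Stage 1: speaker verification filter on embedding similarity.
    verified = [
        seg_id for seg_id, emb in pool
        if cosine_similarity(emb, target_embedding) >= threshold
    ]
    # Stage 2: emotion recognition filter on the surviving candidates.
    return [s for s in verified if emotion_of(s) == target_emotion]
```

In this sketch the emotion recognizer is passed in as a plain callable so the two stages stay decoupled; in practice both stages would be learned models scoring raw audio.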
Keywords
Speech emotion recognition, speaker verification, computational paralinguistics