Monologue versus Conversation: Differences in Emotion Perception and Acoustic Expressivity

2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII)

Abstract
Advancing speech emotion recognition (SER) depends heavily on the source used to train the model, i.e., the emotional speech corpora. By varying design parameters, researchers have released versions of corpora that attempt to provide a better-quality source for training SER models. In this work, we focus on the communication mode used during data collection. In particular, we analyze the patterns of emotional speech collected during interpersonal conversations or monologues. While it is well known that conversation provides a better protocol for eliciting authentic emotion expressions, there is a lack of systematic analyses to determine whether conversational speech provides a “better-quality” source. Specifically, we examine this research question from three perspectives: perceptual differences, acoustic variability, and SER model learning. Our analyses on the MSP-Podcast corpus show that: 1) raters' consistency for conversation recordings is higher when evaluating categorical emotions, 2) the perceptions and acoustic patterns observed in conversations have properties that are better aligned with expected trends discussed in the emotion literature, and 3) a more robust SER model can be trained from conversational data. This work provides initial evidence that conversational samples may offer a better-quality source than monologue samples for building a SER model.
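As a rough illustration of what "rater consistency for categorical emotions" can look like in practice, the sketch below computes Fleiss' kappa over toy annotations. The metric, the statsmodels helper, and the toy labels are illustrative assumptions; the abstract does not specify the consistency measure actually used on MSP-Podcast.

```python
# Hypothetical sketch: measuring inter-rater consistency for categorical
# emotion labels with Fleiss' kappa. This is an illustration of the concept,
# not the paper's actual evaluation protocol.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy data: 6 recordings, each labeled by 5 raters with one of
# 4 categorical emotions (0=neutral, 1=happy, 2=sad, 3=angry).
labels = np.array([
    [1, 1, 1, 2, 1],
    [0, 0, 0, 0, 0],
    [3, 3, 2, 3, 3],
    [2, 2, 2, 2, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
])

# aggregate_raters converts the (recordings x raters) label matrix into
# per-recording category counts, the input format fleiss_kappa expects.
counts, _ = aggregate_raters(labels)
print(f"Fleiss' kappa: {fleiss_kappa(counts, method='fleiss'):.3f}")
```

Under such a setup, computing the statistic separately for conversation and monologue subsets would allow the kind of comparison the abstract describes, with a higher value indicating stronger agreement among raters.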
Keywords
speech emotion recognition,emotion perception,acoustic expression,conversation,monologue