Speech emotion recognition using amplitude modulation parameters

Semantic Scholar (2014)

Abstract
In the Human-Computer Interaction (HCI) community, researchers have worked for several years to emulate human communication using innovative technologies and methodologies based on emotion recognition from facial expressions and speech [1-3]. Speech emotion recognition (SER) [4] is a challenging task in demanding human-machine interaction systems. Standard approaches based on the categorical model of emotions achieve low performance, probably because they model emotions as distinct and independent affective states. Starting from the recently investigated dimensional circumplex model of emotions [5,6], SER systems are instead structured as the prediction of valence and arousal on a continuous scale in a two-dimensional domain. In this study, we propose a PLS regression model, optimized according to specific feature selection procedures and trained on the Italian speech corpus EMOVO, and suggest a way to automatically label the corpus in terms of arousal and valence. To label the dataset according to valence and arousal, the circumplex diagram was divided into angular sectors, each with a specific angular range and a central angle (a_k), as shown in Fig. 1. Each emotion was also identified by the coordinates (y_ve, y_ae), e = 1, ..., Ne, with Ne the number of emotions.
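The sector-based labeling idea can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it assumes equal-width sectors, a unit-radius circumplex, and an illustrative emotion list; the actual sector widths, ordering, and radius used with EMOVO are not given in the abstract.

```python
import math

def circumplex_labels(emotions, radius=1.0):
    """Hypothetical sketch: divide the valence-arousal circumplex into
    equal angular sectors, one per emotion, and label each emotion with
    the (valence, arousal) coordinates of its sector's central angle a_k.
    Equal sector widths and the unit radius are assumptions."""
    n = len(emotions)                    # Ne, the number of emotions
    width = 2 * math.pi / n              # angular range of each sector
    labels = {}
    for k, emotion in enumerate(emotions):
        a_k = k * width + width / 2      # central angle of sector k
        y_v = radius * math.cos(a_k)     # valence coordinate (y_ve)
        y_a = radius * math.sin(a_k)     # arousal coordinate (y_ae)
        labels[emotion] = (y_v, y_a)
    return labels

# Illustrative emotion list only; see the EMOVO corpus for its actual
# emotional states.
labels = circumplex_labels(
    ["joy", "surprise", "fear", "anger", "disgust", "sadness", "neutral"]
)
```

Each (y_v, y_a) pair then serves as the continuous regression target for that emotion's utterances, in place of a discrete categorical label.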