Speech Emotion Recognition with Acoustic and Lexical Features

Qin Jin, Chengxin Li, Shizhe Chen, Huimin Wu

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)

Abstract

In this paper, we explore one of the key aspects of building an emotion recognition system: generating suitable feature representations. We generate feature representations at both the acoustic and the lexical level. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then derive several acoustic feature representations from these low-level features: statistics over the features, a new representation based on a set of low-level acoustic codewords, and a new representation based on Gaussian supervectors. At the lexical level, we propose a new feature representation named the emotion vector (eVector), and we also use the traditional Bag-of-Words (BoW) feature. We apply these feature representations to emotion recognition and compare their performance on the USC-IEMOCAP database. We also combine the different representations via early fusion and late fusion. Our experimental results show that late fusion of acoustic and lexical features achieves a four-class emotion recognition accuracy of 69.2%.
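The following is a minimal sketch, not the authors' implementation, of two ideas the abstract names: utterance-level statistics computed over a frame-level low-level contour, and the difference between early and late fusion. The chosen statistics (mean, standard deviation, min, max, range), the four emotion labels, the per-class score dictionaries, and the fusion weight are illustrative assumptions. The classifiers that would produce the scores (SVMs, per the keywords) are not reproduced, and the codeword, supervector, and eVector representations are not modeled here.

```python
import math
from typing import Dict, List, Sequence

# Assumed four-class label set (a common IEMOCAP setup; not stated in the abstract).
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def functionals(contour: Sequence[float]) -> List[float]:
    """Summarize one frame-level contour (e.g. F0) as fixed-length statistics."""
    n = len(contour)
    mean = sum(contour) / n
    # Population standard deviation over the frames.
    std = math.sqrt(sum((x - mean) ** 2 for x in contour) / n)
    lo, hi = min(contour), max(contour)
    return [mean, std, lo, hi, hi - lo]


def early_fusion(acoustic: Sequence[float], lexical: Sequence[float]) -> List[float]:
    """Early fusion: concatenate feature vectors so one classifier sees both."""
    return list(acoustic) + list(lexical)


def late_fusion(
    acoustic_scores: Dict[str, float],
    lexical_scores: Dict[str, float],
    weight: float = 0.5,
) -> str:
    """Late fusion: weighted sum of per-class scores from two separate classifiers.

    `weight` is a hypothetical tuning parameter for the acoustic stream.
    """
    fused = {
        e: weight * acoustic_scores[e] + (1.0 - weight) * lexical_scores[e]
        for e in EMOTIONS
    }
    # Predict the class with the highest fused score.
    return max(fused, key=lambda e: fused[e])


if __name__ == "__main__":
    print(functionals([200.0, 220.0, 180.0, 200.0]))
    acoustic = {"angry": 0.1, "happy": 0.6, "neutral": 0.2, "sad": 0.1}
    lexical = {"angry": 0.1, "happy": 0.2, "neutral": 0.6, "sad": 0.1}
    print(late_fusion(acoustic, lexical, weight=0.7))
```

Early fusion trains a single model on the concatenated vector, while late fusion keeps the acoustic and lexical models separate and only combines their outputs; in the toy example, shifting `weight` toward the lexical stream changes the predicted class.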

Keywords

Emotion recognition, Acoustic features, Emotion lexicon, Lexical features, Support vector machine