Emotions Understanding Model from Spoken Language using Deep Neural Networks and Mel-Frequency Cepstral Coefficients

2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), 2020

Abstract
The ability to understand people through spoken language is a skill that many human beings take for granted. The same task is far harder for machines, as a consequence of the large number of variables that shape the speech sound wave while people talk to each other. One sub-task of speech understanding is the detection of the emotions expressed by the speaker, and this is the main focus of our contribution. In particular, we present a model that classifies the emotions conveyed by speech, based on a deep convolutional neural network (CNN). For this purpose, we focused on the audio recordings available in the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The model has been trained to classify eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise), which correspond to those proposed by Ekman plus the neutral and calm ones. We adopted the F1 score as evaluation metric, obtaining a weighted average of 0.91 on the test set, with the best performance on the "angry" class (0.95). Our worst result was observed for the "sad" class, with a score of 0.87 that nevertheless improves on the state of the art. To support future development and the replicability of the results, the source code of the proposed model is available in the following GitHub repository: https://github.com/marcogdepinto/Emotion-Classification-Ravdess.
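As the title and keywords indicate, the classifier is fed Mel-frequency cepstral coefficients (MFCCs) extracted from the RAVDESS recordings. The sketch below illustrates that general approach, not the authors' exact pipeline (see the linked repository for that): the library choices (librosa, Keras), the number of coefficients, and the layer sizes are all illustrative assumptions.

```python
# Minimal sketch of MFCC extraction plus a small 1-D CNN over the eight
# RAVDESS emotion labels. Hyperparameters and architecture are assumptions,
# not values taken from the paper.
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprise"]

def extract_mfcc(path, n_mfcc=40):
    """Load one .wav file and return its time-averaged MFCC vector."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)  # shape: (n_mfcc,)

def build_model(n_mfcc=40, n_classes=len(EMOTIONS)):
    """A small 1-D CNN over the MFCC vector (illustrative architecture)."""
    model = keras.Sequential([
        layers.Input(shape=(n_mfcc, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch: wav_paths is a list of RAVDESS files, y the integer labels 0..7.
# X = np.stack([extract_mfcc(p) for p in wav_paths])[..., np.newaxis]
# model = build_model()
# model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)
```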
Keywords
emotion detection,natural language understanding,sentiment analysis,deep learning,machine learning,classification,mel-frequency cepstral coefficients,cnn,ravdess