Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
CoRR (2024)
Abstract
Emotional Voice Messages (EMOVOME) is a spontaneous speech dataset containing
999 audio messages from real conversations on a messaging app from 100 Spanish
speakers, balanced by gender. The voice messages were produced under in-the-wild
conditions before participants were recruited, avoiding any conscious bias due
to a laboratory environment. Each audio was labeled on the valence and arousal
dimensions by three non-experts and two experts, and these annotations were
then combined to obtain a final label per dimension. The experts also provided
an extra label
corresponding to seven emotion categories. To set a baseline for future
investigations using EMOVOME, we implemented emotion recognition models using
both speech and audio transcriptions. For speech, we used the standard eGeMAPS
feature set and support vector machines, obtaining 49.27
accuracy for valence and arousal respectively. For text, we fine-tuned a
multilingual BERT model and achieved 61.15
valence and arousal respectively. This database will significantly contribute
to research on emotion recognition in the wild, while also providing a unique
natural and freely accessible resource for Spanish.
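
The abstract does not state how the five annotations per audio were combined
into a final label. As a minimal sketch, assuming a simple majority vote with a
fallback label on ties, the combination step could look like the following.
The function name `combine_labels`, the `tie_label` parameter, and the label
names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_labels(annotations, tie_label="neutral"):
    """Combine one audio's annotations into a single label by majority vote.

    Assumption: the paper's actual combination rule is not given in the
    abstract; majority voting is used here only for illustration.
    """
    counts = Counter(annotations).most_common()
    if not counts:
        raise ValueError("no annotations given")
    # If the two most frequent labels are equally common, use the fallback.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


# Hypothetical valence ratings from three non-experts and two experts.
valence_votes = ["positive", "positive", "neutral", "positive", "negative"]
final_valence = combine_labels(valence_votes)
```

The same function would be applied once per dimension (valence, then arousal)
for each audio. Other rules, such as weighting expert annotations more
heavily, are equally plausible under the abstract's description.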