Breaking Barriers: Can Multilingual Foundation Models Bridge the Gap in Cross-Language Speech Emotion Recognition?

2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS)

Abstract
Speech emotion recognition (SER) faces challenges in cross-language scenarios due to differences in the linguistic and cultural expression of emotions across languages. Recently, large multilingual foundation models pre-trained on massive corpora have achieved strong performance on natural language understanding tasks by learning cross-lingual representations. Their ability to capture relationships between languages without direct translation opens up possibilities for more broadly applicable multilingual models. In this paper, we evaluate the capabilities of foundation models (Wav2Vec2, XLSR, Whisper and MMS) to bridge the gap in cross-language SER. Specifically, we analyse their performance on benchmark cross-language SER datasets involving four languages for emotion classification. Our experiments show that these foundation models outperform CNN-LSTM baselines, establishing their superiority in cross-lingual transfer learning for emotion recognition. However, self-supervised pre-training plays a key role: architectural inductive biases alone are insufficient for high cross-lingual generalisability. Foundation models also demonstrate gains over the baselines when target-language data is limited, and perform better on noisy data. Our findings indicate that while foundation models hold promise, pre-training remains vital for handling linguistic variations across languages for SER.
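The paper's own code is not included on this page. As a minimal sketch of how the evaluation setup described in the abstract is commonly realised (probing a multilingual speech foundation model with a lightweight emotion classifier), the snippet below uses the Hugging Face `transformers` library with the XLS-R 300M checkpoint; the checkpoint choice, mean-pooling strategy, linear head, and four-way label set are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed checkpoint: XLS-R (300M), a multilingual Wav2Vec2 variant.
CHECKPOINT = "facebook/wav2vec2-xls-r-300m"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2Model.from_pretrained(CHECKPOINT)
model.eval()

def utterance_embedding(waveform: np.ndarray) -> torch.Tensor:
    """Mean-pool frame-level hidden states into one utterance-level vector."""
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        frames = model(**inputs).last_hidden_state  # (1, T, 1024)
    return frames.mean(dim=1).squeeze(0)            # (1024,)

# A lightweight head trained on source-language emotion labels and tested on a
# different target language approximates the cross-lingual transfer setting.
classifier = torch.nn.Linear(1024, 4)  # e.g. angry / happy / neutral / sad

dummy_wav = np.random.randn(16000).astype(np.float32)  # 1 s of fake 16 kHz audio
logits = classifier(utterance_embedding(dummy_wav))
print(logits.shape)  # torch.Size([4])
```

In this setup, cross-lingual generalisability can be measured by training the classifier head on one language's labelled emotion data and evaluating it, frozen, on another language, optionally adding a small amount of target-language data to mirror the limited-data experiments the abstract mentions.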
Keywords
cross-language, speech emotion recognition, foundation models, transformers, multilingual data, self-supervised learning