Segment-Based Speech Emotion Recognition Using Recurrent Neural Networks
2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII) (2017)
Abstract
Recently, Recurrent Neural Networks (RNNs) have produced state-of-the-art results for Speech Emotion Recognition (SER). The choice of the appropriate time-scale for Low Level Descriptors (LLDs) (local features) and statistical functionals (global features) is key for a high-performing SER system. In this paper, we investigate both local and global features and evaluate the performance at various time-scales (frame, phoneme, word or utterance). We show that for RNN models, extracting statistical functionals over speech segments that roughly correspond to the duration of a couple of words yields the best accuracy. We report state-of-the-art SER performance on the IEMOCAP corpus with significantly lower model and computational complexity.
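The segment-level idea in the abstract (frame-level LLDs summarized by statistical functionals over segments roughly a couple of words long, then fed to an RNN as a sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `segment_functionals` is hypothetical, and the 10 ms frame shift, 1 s segment length, 0.5 s hop and the four functionals (mean, standard deviation, min, max) are assumed placeholders, since the abstract does not give the exact values. LLD extraction and the RNN itself are not shown.

```python
import statistics
from typing import List, Sequence

# One LLD vector per frame (e.g. pitch, energy, spectral coefficients).
Frame = Sequence[float]


def segment_functionals(
    frames: Sequence[Frame],
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift
    segment_s: float = 1.0,       # assumed stand-in for "a couple of words"
    hop_s: float = 0.5,           # assumed overlap between segments
) -> List[List[float]]:
    """Turn frame-level LLDs into a sequence of segment-level functionals.

    For every LLD dimension, each segment vector holds
    [mean, population std, min, max] over the frames in that segment.
    The last segment may be shorter if the utterance does not divide evenly.
    """
    if not frames:
        return []
    seg_len = max(1, round(segment_s / frame_shift_s))  # frames per segment
    hop = max(1, round(hop_s / frame_shift_s))          # frames between starts
    n_frames = len(frames)
    n_dims = len(frames[0])

    segments: List[List[float]] = []
    start = 0
    while True:
        chunk = frames[start:start + seg_len]
        feats: List[float] = []
        for d in range(n_dims):
            values = [f[d] for f in chunk]
            feats.extend([
                statistics.fmean(values),
                statistics.pstdev(values),
                min(values),
                max(values),
            ])
        segments.append(feats)
        # Stop once this segment reaches the end of the utterance.
        if start + seg_len >= n_frames:
            break
        start += hop
    return segments


if __name__ == "__main__":
    # Synthetic 2.5 s utterance with 2 LLD dimensions per frame.
    frames = [[float(i), 1.0] for i in range(250)]
    segs = segment_functionals(frames)
    print(len(segs), "segments of", len(segs[0]), "features each")
```

Each returned vector becomes one RNN time step, so the sequence length is set by the segment hop rather than the frame rate, which is one plausible reason segment-level inputs reduce computational cost compared with frame-level inputs.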
Keywords
Segment-based speech emotion recognition, high-performing SER system, Low Level Descriptors, recurrent neural networks, speech segments, statistical functionals, global features