An Image-based Deep Spectrum Feature Representation for the Recognition of Emotional Speech.

Nicholas Cummins, Shahin Amiriparian, Gerhard Hagerer, Anton Batliner, Stefan Steidl, Björn W. Schuller

MM '17: ACM Multimedia Conference, Mountain View, California, USA, October 2017

Abstract

The outputs of the higher layers of deep pre-trained convolutional neural networks (CNNs) have consistently been shown to provide a rich representation of an image for use in recognition tasks. This study explores the suitability of such an approach for speech-based emotion recognition tasks. First, we detail a new acoustic feature representation, denoted as deep spectrum features, derived from feeding spectrograms through a very deep image classification CNN and forming a feature vector from the activations of the last fully connected layer. We then compare the performance of our novel features with standardised brute-force and bag-of-audio-words (BoAW) acoustic feature representations for 2- and 5-class speech-based emotion recognition in clean, noisy and denoised conditions. The presented results show that image-based approaches are a promising avenue of research for speech-based recognition tasks. Key results indicate that deep spectrum features are comparable in performance with the other tested acoustic feature representations when the training and test conditions are matched for noise type; however, the BoAW paradigm is better suited to train-test conditions that differ in noise type.
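
As a rough illustration of the first stage described above, the sketch below turns an audio signal into a spectrogram image that an image classification CNN could take as input. It is not the authors' implementation: the function names, frame length, hop size, 80 dB dynamic range, grey-scale channel replication and synthetic test tone are all assumptions made for this example, since the abstract does not specify them. The CNN stage (passing the image through a pre-trained network and reading the activations of the last fully connected layer) is left out because it depends on pre-trained weights.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-power spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Result has shape (frequency bins, time frames); row 0 is the lowest frequency.
    return 10.0 * np.log10(power.T + 1e-10)


def to_rgb_image(spec_db, dynamic_range_db=80.0):
    """Map a dB spectrogram to an 8-bit image array of shape (height, width, 3).

    Assumption: the grey-scale plot is simply copied into three channels so
    that it matches the input layout of an RGB image network.
    """
    top = spec_db.max()
    floor = top - dynamic_range_db
    scaled = (np.clip(spec_db, floor, top) - floor) / dynamic_range_db  # in [0, 1]
    grey = np.round(scaled * 255).astype(np.uint8)
    # Flip vertically so low frequencies sit at the bottom, as in a plotted spectrogram.
    return np.repeat(grey[::-1, :, np.newaxis], 3, axis=2)


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

image = to_rgb_image(log_spectrogram(tone))
```

In a full pipeline, `image` would be resized to the input resolution of the chosen network before the forward pass, and the resulting fully connected layer activations would serve as the feature vector for an emotion classifier.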

Keywords

convolutional neural networks, image recognition, spectral features, computational paralinguistics, emotions, realism
