Robust Semisupervised Generative Adversarial Networks For Speech Emotion Recognition Via Distribution Smoothness

IEEE Access (2020)

Abstract

Despite the recent great achievements in speech emotion recognition (SER) with the development of deep learning, the performance of SER systems depends strongly on the amount of labeled data available for training. Obtaining sufficient annotated data, however, is often extremely time-consuming and costly, and sometimes even prohibitive because of privacy and ethical concerns. To address this issue, this article proposes the semisupervised generative adversarial network (SSGAN) for SER to capture underlying knowledge from both labeled and unlabeled data. The SSGAN is derived from a GAN, but the discriminator of the SSGAN not only classifies its input samples as real or fake but also distinguishes their emotional class if they are real. Thus, the distribution of realistic inputs can be learned to encourage label information sharing between labeled and unlabeled data. This article also proposes two advanced methods, i.e., the smoothed SSGAN (SSSGAN) and the virtual smoothed SSGAN (VSSSGAN), which smooth the data distribution of the SSGAN via adversarial training (AT) and virtual adversarial training (VAT), respectively. The SSSGAN smooths the conditional label distribution given inputs using labeled examples, while the VSSSGAN smooths the conditional label distribution without label information (“virtual” labels). To evaluate the effectiveness of the proposed methods, four publicly available and frequently used corpora are selected to conduct experiments in intradomain and interdomain situations. The results illustrate that the proposed methods are superior to the state-of-the-art methods. Specifically, in experimental settings with mismatched and semimismatched unlabeled training sets, the SSSGAN and VSSSGAN are more robust than the SSGAN because of the distributional smoothness.
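
The two ideas in the abstract, a discriminator that separates real from fake while also predicting an emotion class, and a smoothness term on the conditional label distribution, can be illustrated with a minimal sketch. It assumes a common semisupervised GAN convention in which the discriminator emits K+1 logits with the last one reserved for "fake"; the abstract does not confirm this layout. The function names, the logit values, and the perturbed logits are all hypothetical, and the search for the adversarial perturbation itself (the core of AT and VAT) is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_discriminator_output(logits):
    """Interpret K+1 logits (assumed layout: K emotion classes, then 'fake').

    Returns the probability that the input is real and the emotion-class
    distribution conditioned on the input being real.
    """
    probs = softmax(logits)
    p_fake = probs[-1]
    p_real = 1.0 - p_fake
    class_given_real = [p / p_real for p in probs[:-1]]
    return p_real, class_given_real


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical logits for one utterance: 4 emotion classes + 1 'fake' slot.
logits_clean = [2.0, 0.5, 0.1, -1.0, -3.0]
# Hypothetical logits for the same utterance after a small input perturbation.
logits_perturbed = [1.6, 0.9, 0.1, -1.0, -3.0]

p_real, class_probs = split_discriminator_output(logits_clean)
_, class_probs_perturbed = split_discriminator_output(logits_perturbed)

# Smoothness penalty: how much the predicted label distribution moves
# when the input is perturbed. A smoother model yields a smaller value.
smoothness_penalty = kl_divergence(class_probs, class_probs_perturbed)
```

In a VAT-style setup the reference distribution is the model's own prediction on the clean input, so the penalty needs no ground-truth label; an AT-style setup would instead compare the perturbed prediction against the true label.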

Keywords

Semisupervised learning, generative adversarial network, adversarial training, speech emotion recognition
