Affective Video Content Analysis via Multimodal Deep Quality Embedding Network

IEEE Transactions on Affective Computing (2022)

Abstract
The establishment of large video affective content analysis datasets, such as LIRIS-ACCEDE, opens up the possibility of utilizing the massive representation power of deep neural networks (DNNs) to model the complex process of eliciting affective responses from video viewers. However, label noise in these datasets poses a considerable challenge to both the training and evaluation of DNNs. DNNs are optimized with stochastic gradient descent (SGD), but label noise in the training set leads to inaccurate gradient estimates, which may cause the model to converge to a non-optimal solution. In addition, label noise in the test set renders the results of model evaluation untrustworthy. In this article, we propose a multimodal deep quality embedding network (MMDQEN) for affective video content analysis. Specifically, MMDQEN infers the latent label and label quality from noisy training samples so that cleaner supervision signals are provided to the DNN-based affective classifier; a tractable objective for MMDQEN is derived with variational inference and a conditional independence assumption. In addition, to avoid the model evaluation bias incurred by annotation noise in the test set, new test sets based on the original LIRIS-ACCEDE database, named LIRIS-ACCEDE-RANK, are established in which samples are ranked according to their label uncertainty level, and corresponding evaluation metrics are introduced to further reveal the performance of different models. Experiments conducted on both the LIRIS-ACCEDE and LIRIS-ACCEDE-RANK datasets demonstrate the effectiveness of the proposed method.
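As a rough illustration of the uncertainty-ranking idea behind LIRIS-ACCEDE-RANK, the sketch below ranks samples by the Shannon entropy of a per-sample label distribution (e.g., derived from annotator agreement) so that evaluation can focus on the cleanest labels first. This is a hypothetical scoring, not the paper's method: MMDQEN obtains its label-quality estimates from a variational model, and the sample names here are invented.

```python
import math

def label_uncertainty(probs):
    """Shannon entropy of a sample's label distribution (higher = noisier label)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_uncertainty(samples):
    """Sort samples from cleanest to noisiest label, mimicking an
    uncertainty-ranked test set (hypothetical entropy-based proxy)."""
    return sorted(samples, key=lambda s: label_uncertainty(s["label_probs"]))

# Toy samples: "label_probs" could come from annotator agreement rates.
samples = [
    {"id": "clip_a", "label_probs": [0.5, 0.5]},    # maximally uncertain label
    {"id": "clip_b", "label_probs": [0.95, 0.05]},  # near-clean label
    {"id": "clip_c", "label_probs": [0.7, 0.3]},
]

ranked = rank_by_uncertainty(samples)
print([s["id"] for s in ranked])  # → ['clip_b', 'clip_c', 'clip_a']
```

With such a ranking, evaluation metrics can be reported on progressively larger top-k slices, so a model's performance on low-noise samples is not masked by annotation noise in the rest of the test set.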
Keywords
Affective video content analysis, label noise, probabilistic graphical models, variational inference, deep neural networks