Counterfactual Scenario-relevant Knowledge-enriched Multi-modal Emotion Reasoning

ACM Transactions on Multimedia Computing, Communications, and Applications(2023)

Cited 0|Views173
No score
Multi-modal video emotion reasoning (MERV) has recently attracted increasing attention due to its potential application in human-computer interaction. This task needs to not only recognize utterance-level emotions for conspicuous speakers, but also perceive the emotions of non-speakers in videos. Existing methods focus on modeling multi-modal multi-level contexts to capture emotion-relevant clues from the complex scenarios in videos. However, the context information is far from enough to infer the emotion labels of non-speakers due to the large gap between the scenario situation and emotions labels. Inspired by the observation that humans can find solutions to complex problems with the leverage of experience and knowledge, we propose SK-MER , a Scenario-relevant Knowledge-enhanced Multi-modal Emotion Reasoning framework for MERV task, which can leverage external knowledge to enhance the video scenario understanding and emotion reasoning. Specifically, we use scenario concepts extracted from videos to build knowledge subgraphs from external knowledge bases. The knowledge subgraphs are then utilized to obtain scenario-relevant knowledge representations through dynamic knowledge graph attention. Next, we incorporate the knowledge representations into context modeling to enhance emotion reasoning with external scenario-relevant knowledge. In addition, we propose a counterfactual knowledge representation learning approach to obtain more effective scenario-relevant knowledge representations. Extensive experimental results on MEmoR dataset show that the proposed SK-MER framework achieves new state-of-the-art results.
Translated text
Key words
Neural networks, emotion reasoning, knowledge enhancement, counterfactual
AI Read Science
Must-Reading Tree
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined