Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection

MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

Abstract
Multi-modal utterance-level emotion detection has been a hot research topic in both the multi-modal analysis and natural language processing communities. Unlike traditional single-label multi-modal sentiment analysis, multi-modal emotion detection is naturally a multi-label problem, since an utterance often conveys multiple emotions. Existing studies normally focus on multi-modal fusion only and decompose multi-label emotion classification into multiple independent binary classification problems. As a result, they largely ignore two kinds of important dependency information: (1) modality-to-label dependency, where different emotions can be inferred from different modalities, that is, different modalities contribute differently to each potential emotion; and (2) label-to-label dependency, where some emotions are more likely to co-occur than conflicting ones. To model these two kinds of dependency simultaneously, we propose a unified approach, the multi-modal emotion set generation network (MESGN), which generates an emotion set for an utterance. Specifically, we first employ a cross-modal transformer encoder to capture cross-modal interactions among different modalities, and a standard transformer encoder to capture temporal information for each modality-specific sequence given the previous interactions. Then, we design a transformer-based discriminative decoding module equipped with modality-to-label attention to handle the modality-to-label dependency. Meanwhile, we employ a reinforced decoding algorithm with self-critic learning to handle the label-to-label dependency. Finally, we validate the proposed MESGN architecture on a word-level aligned and unaligned multi-modal dataset. Detailed experiments show that MESGN effectively improves the performance of multi-modal multi-label emotion detection.
Keywords
multi-modal, multi-label, emotion detection, transformer-based, label set generation
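
The abstract contrasts independent binary classifiers with generating an emotion set label by label, trained with a self-critic reward. The Python sketch below illustrates that general idea only; it is not the MESGN implementation. The names `toy_step_probs`, `decode`, `set_f1`, and `self_critical_advantage` are hypothetical, the hand-written probability table stands in for a trained decoder, and the choice of set-level F1 as the reward is an assumption made for illustration.

```python
import random

LABELS = ["happy", "sad", "surprise"]
EOS = "<eos>"


def toy_step_probs(prefix):
    """Hypothetical stand-in for a decoder: next-label probabilities given
    the labels emitted so far. Conditioning on the prefix is what lets a
    set generator express label-to-label dependency."""
    if not prefix:
        return {"happy": 0.6, "sad": 0.2, "surprise": 0.1, EOS: 0.1}
    if "happy" in prefix:
        # In this toy table, "happy" makes "surprise" likely and "sad" unlikely.
        return {"happy": 0.0, "sad": 0.05, "surprise": 0.55, EOS: 0.4}
    return {"happy": 0.05, "sad": 0.0, "surprise": 0.15, EOS: 0.8}


def decode(step_probs, labels, sample, rng):
    """Emit labels one at a time until EOS; already emitted labels are masked."""
    prefix = []
    while len(prefix) < len(labels):
        probs = step_probs(prefix)
        candidates = [l for l in labels + [EOS] if l not in prefix]
        if sample:
            weights = [probs[l] for l in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
        else:
            choice = max(candidates, key=lambda l: probs[l])
        if choice == EOS:
            break
        prefix.append(choice)
    return prefix


def set_f1(predicted, gold):
    """F1 between two label sets; the order of generation does not matter."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def self_critical_advantage(step_probs, labels, gold, rng):
    """Reward of a sampled set minus reward of the greedy set (the baseline).
    In self-critical training this difference scales the log-likelihood
    gradient of the sampled sequence."""
    sampled = decode(step_probs, labels, sample=True, rng=rng)
    greedy = decode(step_probs, labels, sample=False, rng=rng)
    return set_f1(sampled, gold) - set_f1(greedy, gold), sampled, greedy
```

Calling `self_critical_advantage(toy_step_probs, LABELS, {"happy", "surprise"}, random.Random(0))` returns the advantage together with the sampled and greedy label lists. A positive advantage would push the model toward the sampled set, a negative one away from it. Because the reward is computed on the whole set, it does not depend on the order in which labels were generated.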