TeFNA: Text-centered Fusion Network with crossmodal Attention for multimodal sentiment analysis

Knowledge-Based Systems (2023)

Abstract
Multimodal sentiment analysis (MSA), which goes beyond the analysis of text to include other modalities such as audio and visual data, has attracted significant attention. Effective fusion of sentiment information across multiple modalities is key to improving MSA performance. However, aligning multiple modalities during fusion poses challenges such as preserving modality-specific information. This paper proposes a Text-centered Fusion Network with crossmodal Attention (TeFNA), a multimodal fusion network that uses crossmodal attention to model unaligned multimodal temporal information. In particular, TeFNA employs a Text-Centered Aligned fusion method (TCA) that takes the text modality as the primary modality to improve the representation of fusion features. In addition, TeFNA maximizes the mutual information between modality pairs to retain task-related emotional information, thereby ensuring that the key information of each modality is preserved from input to fusion. The results of comprehensive experiments on the CMU-MOSI and CMU-MOSEI multimodal datasets show that the proposed model outperforms existing methods on most metrics.
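The text-centered crossmodal attention described in the abstract can be pictured with a short sketch: text features act as the attention query while audio or visual features act as keys and values, so the fused representation stays anchored to the text modality even when sequence lengths are unaligned. The code below is an illustrative approximation in PyTorch, not the authors' released implementation; the module name, feature dimensions, and the final concatenation step are assumptions made for the example.

```python
import torch
import torch.nn as nn


class TextCenteredCrossmodalAttention(nn.Module):
    """Illustrative sketch (not the TeFNA reference code): text queries attend
    over an auxiliary modality (audio or visual), keeping fusion text-centered."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # batch_first=True lets inputs be shaped (batch, sequence, feature)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # text:  (batch, T_text, d_model)  -- query, the primary modality
        # other: (batch, T_other, d_model) -- key/value, audio or visual
        fused, _ = self.attn(query=text, key=other, value=other)
        # residual connection helps retain text-specific information
        return self.norm(text + fused)


# Usage with unaligned sequence lengths (hypothetical shapes):
text = torch.randn(8, 50, 128)     # language features
audio = torch.randn(8, 375, 128)   # acoustic features, different length
visual = torch.randn(8, 500, 128)  # visual features, different length

text_audio = TextCenteredCrossmodalAttention()(text, audio)    # text enriched by audio
text_visual = TextCenteredCrossmodalAttention()(text, visual)  # text enriched by visual
fusion = torch.cat([text_audio, text_visual], dim=-1)          # simple concatenation for the sketch
```

Because the query always comes from the text stream, the output keeps the text sequence length regardless of how long the audio or visual streams are, which is one plausible way to realize the "text as primary modality" idea without pre-aligning the modalities; the mutual-information maximization between modality pairs would be an additional training objective on top of such fused representations.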
Keywords
Text-centered, Fusion network, Crossmodal attention, Multimodal sentiment analysis