Fact-sentiment incongruity combination network for multimodal sarcasm detection

Information Fusion (2024)

Abstract
Multimodal sarcasm detection aims to identify whether the literal expression in multimodal data is contrary to the authentic attitude. Sarcasm incongruity methods have been applied successfully to this task because they can flexibly capture the intrinsic differences between modalities. However, previous incongruity methods focused primarily on the semantic level and often overlooked more specific forms of sarcasm incongruity, namely fact incongruity, sentiment incongruity, and combination incongruity. We therefore propose a fact-sentiment incongruity combination network, which models multimodal sarcastic relations by exploring multimodal factual disparities, sentiment incongruity, and combination fusion. Specifically, we design a dynamic connecting component that computes dynamic routing probability weights via graph attention and mask routing matrices and selects the most suitable image-text pairs, capturing fact incongruity between images and text. We then retrieve sentiment relations between text tokens and image objects using external sentiment knowledge and use them to reconstruct the edge weights of the cross-modal graph matrix, capturing sentiment incongruity. Furthermore, we introduce a combination incongruity fusion layer and a cross-modal contrastive loss that fuse fact incongruity and sentiment incongruity to further enhance the incongruity representations. Extensive experiments and further analyses on publicly available datasets demonstrate the superiority of our proposed model.
Keywords
Multimodal sarcasm detection, Sarcasm incongruity, Dynamic connecting component, Cross-modal graph, Combination incongruity fusion
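
Illustrative sketch

The abstract describes two ideas that are easy to make concrete: reweighting cross-modal graph edges with external sentiment knowledge, and training with a cross-modal contrastive loss. The sketch below is a rough illustration only, not the authors' implementation. The toy lexicon, the reweighting rule `base * (1 + |p_token - p_object|)`, the function names, and the choice of a one-directional InfoNCE-style loss are all assumptions made for this example; the abstract does not specify any of them.

```python
import math

# Hypothetical toy lexicon with polarity scores in [-1, 1]. The abstract only
# mentions "external sentiment knowledge"; these words and values are made up.
TOY_LEXICON = {"love": 0.9, "great": 0.8, "rain": -0.4,
               "traffic": -0.6, "sunshine": 0.7}


def polarity(word, lexicon=TOY_LEXICON):
    """Return the polarity of a word, or 0.0 (neutral) if it is unknown."""
    return lexicon.get(word.lower(), 0.0)


def sentiment_edge_weights(tokens, objects, base=None, lexicon=TOY_LEXICON):
    """Reweight text-token / image-object edges by their sentiment gap.

    Assumed rule (not taken from the paper):
        weight = base * (1 + |p_token - p_object|)
    so edges joining opposite polarities are amplified.
    """
    weights = []
    for i, tok in enumerate(tokens):
        row = []
        for j, obj in enumerate(objects):
            b = 1.0 if base is None else base[i][j]
            gap = abs(polarity(tok, lexicon) - polarity(obj, lexicon))
            row.append(b * (1.0 + gap))
        weights.append(row)
    return weights


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return list(v) if norm == 0.0 else [a / norm for a in v]


def contrastive_loss(text_vecs, image_vecs, temperature=0.07):
    """Generic InfoNCE-style loss, text-to-image direction only.

    Row i of text_vecs is treated as the positive match of row i of
    image_vecs; every other row in the batch serves as a negative.
    """
    t = [_normalize(v) for v in text_vecs]
    m = [_normalize(v) for v in image_vecs]
    total = 0.0
    for i in range(len(t)):
        logits = [_dot(t[i], m_j) / temperature for m_j in m]
        mx = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
        total += log_denom - logits[i]
    return total / len(t)


if __name__ == "__main__":
    w = sentiment_edge_weights(["love", "this"], ["traffic", "sunshine"])
    print(w)  # the "love"/"traffic" edge receives the largest weight
    print(contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup, a positive word paired with a negative image object gets the largest edge weight, which is the kind of signal a sentiment-incongruity graph would be expected to emphasize. The loss is lowest when each text vector is closest to its own image vector.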