Scanning, attention, and reasoning multimodal content for sentiment analysis

Knowledge-Based Systems (2023)

Abstract
The rise of social networks has given people platforms to share their lives and emotions, often in multimodal forms such as images paired with descriptive texts. Capturing the sentiment embedded in this multimodal content poses significant research challenges and has clear practical value. Existing methods usually make sentiment predictions through a single round of reasoning with multimodal attention networks; however, this can be insufficient for tasks that require deep understanding and complex reasoning. To comprehend multimodal content effectively and predict the correct sentiment tendencies, we propose the Scanning, Attention, and Reasoning (SAR) model for multimodal sentiment analysis. Specifically, a perceptual scanning model is designed to roughly perceive the image and text content, as well as the intrinsic correlation between them. To deeply understand the complementary features between images and texts, an intensive attention model is proposed for cross-modal feature association learning. The multimodal joint features from the scanning and attention models are fused into the representation of a multimodal node in the social network. A heterogeneous reasoning model implemented with a graph neural network then captures the influence of network communication in social networks and makes the sentiment prediction. Extensive experiments on three benchmark datasets confirm the effectiveness and superiority of our model over state-of-the-art methods. (c) 2023 Elsevier B.V. All rights reserved.
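To make the three-stage pipeline concrete, below is a minimal PyTorch sketch of the scanning, attention, and reasoning flow as described in the abstract. The class name (SARSketch), all feature dimensions, the gated fusion used for the scanning stage, and the single mean-neighborhood propagation step standing in for the heterogeneous reasoning model are illustrative assumptions; the abstract does not specify the paper's exact architecture.

```python
import torch
import torch.nn as nn


class SARSketch(nn.Module):
    """Sketch of a Scanning -> Attention -> Reasoning pipeline.

    Assumptions (not from the paper): 2048-d CNN image regions,
    768-d text token embeddings, gated fusion for scanning, and a
    plain one-step mean-neighborhood GNN for the reasoning stage.
    """

    def __init__(self, dim=256, num_classes=3):
        super().__init__()
        # Scanning: coarse perception of each modality and their correlation.
        self.img_proj = nn.Linear(2048, dim)
        self.txt_proj = nn.Linear(768, dim)
        self.scan_gate = nn.Linear(2 * dim, dim)
        # Attention: cross-modal feature association (text attends to image regions).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Reasoning head applied to graph-propagated node features.
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, img_feats, txt_feats, adj):
        # img_feats: (N, R, 2048) image regions; txt_feats: (N, T, 768) tokens;
        # adj: (N, N) row-normalized adjacency of the social network.
        img = self.img_proj(img_feats)                       # (N, R, dim)
        txt = self.txt_proj(txt_feats)                       # (N, T, dim)

        # Scanning: rough joint perception via gated fusion of pooled features.
        pooled = torch.cat([img.mean(1), txt.mean(1)], dim=-1)
        scan = torch.tanh(self.scan_gate(pooled))            # (N, dim)

        # Attention: text queries attend over image regions.
        attended, _ = self.cross_attn(txt, img, img)         # (N, T, dim)
        attn = attended.mean(1)                              # (N, dim)

        # Fuse scanning and attention features into the multimodal node vector.
        node = torch.cat([scan, attn], dim=-1)               # (N, 2*dim)

        # Reasoning: one propagation step over the social graph, a stand-in
        # for the paper's heterogeneous graph reasoning model.
        node = node + adj @ node
        return self.classifier(node)                         # (N, num_classes)


# Example: 4 posts, 36 image regions and 20 text tokens each, trivial graph.
model = SARSketch()
logits = model(torch.randn(4, 36, 2048), torch.randn(4, 20, 768), torch.eye(4))
```

The sketch keeps the coarse scanning features and the cross-modal attention features separate and concatenates them into the node representation, mirroring the fusion the abstract describes; a faithful implementation would replace the single propagation step with the paper's heterogeneous reasoning model over the actual social network.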
Keywords
multimodal content, attention