Hybrid Representation and Decision Fusion towards Visual-textual Sentiment

ACM Transactions on Intelligent Systems and Technology (2023)

Abstract
The rising use of online media has changed the public's social customs. Users have gradually become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotions and attitudes provide significant decision support for numerous sentiment analysis tasks. Conventional sentiment analysis methods consider only the textual modality and are ill-suited to multimodal scenarios, while common multimodal approaches focus only on the interactions between modalities and ignore the information unique to each modality. This work proposes a hybrid fusion network that captures both inter-modal and intra-modal features. First, in the intermediate fusion stage, a multi-head visual attention mechanism uses visual features to help extract accurate semantic and sentiment information from textual embedding representations. Then, in the late fusion stage, multiple base classifiers are trained to learn independent and diverse discriminative information from the different modal representations. The final decision is obtained by combining the decision supports of the base classifiers with a decision fusion method. To improve the generalization of the hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empirical results on multimodal datasets demonstrate that the proposed model achieves higher accuracy and better generalization than the baselines for multimodal sentiment analysis.
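The abstract does not give the equations behind these stages, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes the image feature acts as a per-head query over text token embeddings, that decision fusion is a plain average of class probabilities, and that the similarity term is the mean pairwise cosine similarity between classifier outputs. All function names are hypothetical, and the learned projection matrices a real model would use are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def visual_guided_attention(text, visual, num_heads):
    """Weight text tokens by their similarity to an image feature.

    text:   (T, d) token embeddings
    visual: (d,)   image feature, assumed to share the text dimension
    Returns a (d,) text summary and the (num_heads, T) attention weights.
    """
    T, d = text.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    dh = d // num_heads
    keys = text.reshape(T, num_heads, dh)    # (T, H, dh)
    query = visual.reshape(num_heads, dh)    # (H, dh)
    # Scaled dot product between each head's visual query and every token.
    scores = np.einsum("thd,hd->ht", keys, query) / np.sqrt(dh)
    weights = softmax(scores, axis=-1)       # each head's weights sum to 1
    heads = np.einsum("ht,thd->hd", weights, keys)
    return heads.reshape(d), weights


def fuse_decisions(prob_list):
    """Assumed fusion rule: average the base classifiers' probabilities."""
    return np.mean(np.stack(prob_list), axis=0)


def similarity_penalty(prob_list):
    """Mean pairwise cosine similarity between classifier outputs.

    Needs at least two classifiers. Adding this term to a training loss
    would push the classifiers towards more diverse decisions.
    """
    sims = []
    for i in range(len(prob_list)):
        for j in range(i + 1, len(prob_list)):
            a, b = prob_list[i], prob_list[j]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))
```

In this sketch, a text-only classifier, an image-only classifier, and a classifier on the attended text summary would each emit a probability vector; `fuse_decisions` combines them, and `similarity_penalty` would be added to the training loss with a weighting coefficient chosen by validation.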
Keywords
Decision fusion, multimodal, representation fusion, social network