Relational Graph-Bridged Image-Text Interaction: A Novel Method for Multi-Modal Relation Extraction

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Multi-modal relation extraction (MRE) requires integrating multi-modal information to identify relationships between entities. Although fine-grained correlations between visual objects and textual words have the potential to improve cross-modal interaction, they are typically modeled implicitly and are hindered by the modality gap. This paper introduces a novel method called relational Graph-Bridged cross-modal InTeraction (GBIT). GBIT aims to incorporate fine-grained cross-modal correlations into the interaction process explicitly. It does so by constructing a fine-grained cross-modal relational graph, which acts as a bridge for effective cross-modal interaction across multiple layers. Within GBIT, a gated interaction strategy is proposed for irrelevance-filtered information exchange, and an adaptive integration module is proposed for final information collation. Through extensive experiments on the MRE benchmark, we demonstrate the superiority of our proposed method for MRE.
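The abstract does not give the formulas behind the graph-bridged, gated interaction, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that the cross-modal graph is a binary word-to-object adjacency matrix, that messages are aggregated with scaled dot-product attention restricted to graph neighbours, and that a scalar sigmoid gate filters the message. The function name `gated_graph_interaction` and all parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_graph_interaction(text, visual, adj):
    """One hypothetical interaction layer (an assumption, not GBIT itself).

    text:   list of word feature vectors, shape [n][d]
    visual: list of visual-object feature vectors, shape [m][d]
    adj:    binary matrix [n][m]; adj[i][j] is truthy when word i is
            linked to object j in the cross-modal graph
    """
    d = len(text[0])
    scale = math.sqrt(d)
    updated = []
    for i, t in enumerate(text):
        neighbours = [j for j, linked in enumerate(adj[i]) if linked]
        if not neighbours:
            # No linked object: the word feature passes through unchanged.
            updated.append(list(t))
            continue
        # Attention over graph neighbours only (numerically stable softmax).
        scores = [dot(t, visual[j]) / scale for j in neighbours]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        message = [
            sum(w * visual[j][k] for w, j in zip(weights, neighbours))
            for k in range(d)
        ]
        # Scalar gate in (0, 1) scales the visual message before the
        # residual update; low text-message agreement shrinks the message.
        gate = sigmoid(dot(t, message) / scale)
        updated.append([t_k + gate * m_k for t_k, m_k in zip(t, message)])
    return updated
```

In this sketch the graph acts as a hard mask on which word-object pairs may exchange information, while the gate softly scales what passes through. A learned, per-dimension gate would be a natural alternative, but the abstract does not say which form the paper uses.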
Keywords
Multimedia, Relation Extraction