DRAKE: Deep Pair-Wise Relation Alignment for Knowledge-Enhanced Multimodal Scene Graph Generation in Social Media Posts

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Scene Graph Generation (SGG) is a computer vision task that detects the objects in an image and the predicates that relate them. Existing SGG methods focus on modeling visual contexts to generate scene graphs and are evaluated on well-annotated datasets with high-quality images. However, image quality is not guaranteed in social media posts: some images may be incomplete or occluded, and hence might not provide sufficient visual context for SGG. As a result, previous methods might miss visual relationships or detect false ones. To effectively generate scene graphs in social media, we study multimodal scene graph generation (MSG) in this paper. MSG aims to generate visual scene graphs from images in social media posts with the support of text sentences. However, leveraging textual content through simple multimodal alignment, such as object-level alignment, neglects the inherent pair-wise mapping between multimodal object pairs. To address these limitations, we propose a method named Deep pair-wise Relation Alignment for Knowledge-Enhanced (DRAKE) multimodal scene graph generation. The model supplements the missing visual contexts with well-aligned textual knowledge. It first converts the textual information into an object-aware knowledge representation with the help of visual data. Furthermore, DRAKE facilitates the interaction of information between multimodal pair-wise representations. A multimodal context enhancement layer is devised to help the model generate the scene graph. To evaluate SGG performance on social media images, we propose a social media SGG dataset called MSG. We comprehensively analyze the effectiveness of our proposed method on the MSG dataset, and the experimental results indicate that our model outperforms previous methods. To fairly compare our method with other SGG models, we also conduct experiments on the Visual Genome dataset for further analysis. The MSG dataset is released at https://github.com/FuZe4ever/MSG.
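
The distinction the abstract draws between object-level and pair-wise alignment can be illustrated with a toy example. The following is a minimal sketch and not the DRAKE model: the embeddings, the concatenation-based pair representation, the cosine-similarity matching, and the names `pair_repr` and `align_pairs` are all assumptions made for illustration. It matches each ordered pair of visual objects to the most similar ordered pair of textual mentions, so that the subject and object roles are aligned jointly rather than one object at a time.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_repr(subject_vec, object_vec):
    """Hypothetical pair representation: concatenate subject and object."""
    return subject_vec + object_vec


def align_pairs(visual, textual):
    """Map each ordered visual object pair to its closest textual pair."""
    text_pairs = {
        (s, o): pair_repr(textual[s], textual[o])
        for s, o in permutations(textual, 2)
    }
    alignment = {}
    for s, o in permutations(visual, 2):
        v = pair_repr(visual[s], visual[o])
        alignment[(s, o)] = max(
            text_pairs, key=lambda key: cosine(v, text_pairs[key])
        )
    return alignment


# Toy embeddings (made up for illustration only).
visual = {"person": [1.0, 0.0], "horse": [0.0, 1.0]}
textual = {"man": [0.9, 0.1], "horse": [0.1, 0.9]}

alignment = align_pairs(visual, textual)
for visual_pair, text_pair in alignment.items():
    print(visual_pair, "->", text_pair)
```

Because the pair representation is order-sensitive, the visual pair (person, horse) aligns with the textual pair (man, horse) and not with (horse, man), which is the kind of directional information that matching objects one at a time would not capture.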
Keywords
Scene graph generation, social media posts, knowledge enhancement, pair-wise alignment