External Knowledge Dynamic Modeling for Image-text Retrieval

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Abstract
Image-text retrieval is a fundamental branch of cross-modal retrieval. Its core challenge is to explore the semantic correspondence needed to align relevant image-text pairs. Some existing methods rely on global semantics and co-occurrence frequency to design knowledge-introduction patterns that yield consistent representations. However, these methods lack flexibility because they are limited to fixed information and empirical feedback. To address these issues, we develop an External Knowledge Dynamic Modeling (EKDM) architecture based on a filtering mechanism, which dynamically explores different knowledge for different image-text pairs. Specifically, we first capture abundant concepts and relationships from external knowledge to construct visual and textual corpus sets. Then, we progressively explore the concepts related to images and texts using dynamic global representations. To give the model the ability to make relationship decisions, we integrate the variable spatial locations between objects for association exploration. Since the filtering mechanism is conditioned on dynamic semantics and variable spatial locations, our model can dynamically model different knowledge for different image-text pairs. Extensive experimental results on two benchmark datasets demonstrate the effectiveness of the proposed method.
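The abstract describes two conditioning signals for the filter: a dynamic global representation that selects relevant corpus concepts, and the spatial layout of object pairs that decides which relationships to keep. The sketch below illustrates that idea under stated assumptions; it is not the authors' EKDM implementation. The cosine-similarity gate, the `temperature` and `margin` parameters, the hand-crafted box encoding (centre offsets, log size ratios, IoU), and the logistic `relation_gate` with given `weights` are all hypothetical stand-ins chosen for illustration. In a trained model these gates would be learned layers operating on real feature vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def filter_concepts(global_repr: Vector, concepts: List[Vector],
                    temperature: float = 0.1,
                    margin: float = 0.2) -> List[Tuple[int, float]]:
    """Hypothetical concept filter: gate each corpus concept by its similarity
    to the current global representation. A concept is kept when its gate
    exceeds 0.5, i.e. when its cosine similarity exceeds `margin`.
    Returns (concept index, gate value) pairs."""
    kept = []
    for i, concept in enumerate(concepts):
        gate = sigmoid((cosine(global_repr, concept) - margin) / temperature)
        if gate > 0.5:
            kept.append((i, gate))
    return kept


def spatial_encoding(a: Box, b: Box) -> Tuple[float, float, float, float, float]:
    """Relative geometry of box b with respect to box a: centre offsets
    normalised by a's size, log width/height ratios, and IoU (an assumed,
    commonly used hand-crafted encoding)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1
    dx = ((bx1 + bx2) / 2 - (ax1 + ax2) / 2) / aw
    dy = ((by1 + by2) / 2 - (ay1 + ay2) / 2) / ah
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return dx, dy, math.log(bw / aw), math.log(bh / ah), inter / union


def relation_gate(a: Box, b: Box, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical relationship gate: a logistic score over the spatial
    encoding. Here `weights` and `bias` are supplied; a real model would learn them."""
    feats = spatial_encoding(a, b)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


if __name__ == "__main__":
    # Only the concept aligned with the global representation passes the filter.
    print(filter_concepts([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
    # Two overlapping 2x2 boxes offset by one unit along each axis.
    print(spatial_encoding((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because both gates depend on their inputs (the current global representation, or the current box pair), the same corpus yields different kept concepts and relationships for different image-text pairs. That input dependence is the property the abstract emphasises over fixed, frequency-based knowledge patterns.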