You should know more: Learning external knowledge for visual dialog

Lei Zhao, Haonan Zhang, Xiangpeng Li, Sen Yang, Yuanfeng Song

Neurocomputing (2022)

Abstract

Visual dialog is a task in which two agents carry out a multi-round conversation grounded in an image, a caption, and the dialog history. Despite recent progress, existing methods still degrade in complex scenarios. Handling these scenarios depends on logical reasoning that requires commonsense priors. In this paper, we propose a novel visual dialog pipeline named Structured Knowledge-Aware Network (SKANet), consisting of an Image Knowledge-Aware Module and a Caption Knowledge-Aware Module. Specifically, the Image and Caption Knowledge-Aware Modules construct commonsense knowledge graphs from ConceptNet. We apply SKANet to two sub-tasks: conventional visual dialog and a goal-oriented visual dialog named ‘image guessing’. For conventional visual dialog, SKANet is combined with an additional Multi-Modality Fusion Module, which is designed to explore the visual content and the textual context of the dialog history. For goal-oriented visual dialog, we directly apply the Image and Caption Knowledge-Aware Modules to the two agents, respectively. Experimental results on the VisDial v0.9 and VisDial v1.0 datasets show that our proposed method outperforms the compared methods on both sub-tasks.

Keywords

External knowledge, Graph convolutional network, Structured knowledge graph, Attention mechanism, Visual dialog
