Weakly Supervised Neuro-Symbolic Image Manipulation via Multi-Hop Complex Instructions

ICLR 2023 (2023)

Abstract
We are interested in image manipulation via natural language text – a task that is extremely useful for multiple AI applications but requires complex reasoning over multi-modal spaces. Recent work on neuro-symbolic approaches (NSCL; Mao et al., 2019) has been quite effective for solving VQA, as such approaches offer better modularity, interpretability, and generalizability. We extend NSCL to the image manipulation task and propose a solution referred to as NeuroSIM. Previous work either requires supervised training data in the form of manipulated images or can only handle very simple reasoning instructions over single-object scenes. In contrast, NeuroSIM can perform complex multi-hop reasoning over multi-object scenes and requires only weak supervision in the form of annotated VQA data. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising object attributes and manipulation operations, that guides the manipulation. We design neural modules for manipulation, as well as novel loss functions that check the correctness of the manipulated object and scene graph representations via query networks trained purely on VQA data. An image decoder is trained to render the final image from the manipulated scene graph. Extensive experiments demonstrate that NeuroSIM, without using target images as supervision, is highly competitive with state-of-the-art baselines that make use of supervised manipulation data.
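To make the described pipeline concrete, below is a minimal, hypothetical sketch of the instruction-to-scene-graph flow the abstract outlines: an instruction is parsed into a symbolic program over a small DSL, the program edits a symbolic scene graph, and a decoder (not shown) would render the result. All names here (SceneGraph, ObjectNode, parse_instruction, MANIPULATION_OPS) are illustrative assumptions, not the authors' actual API or code.

```python
# Hypothetical sketch of a NeuroSIM-style pipeline, assuming a simple
# attribute-based DSL. Not the authors' implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ObjectNode:
    """A single object in the scene graph with symbolic attributes."""
    attributes: Dict[str, str]  # e.g. {"color": "red", "shape": "cube"}


@dataclass
class SceneGraph:
    """Symbolic scene representation: a list of objects (relations omitted)."""
    objects: List[ObjectNode] = field(default_factory=list)


def op_change_attribute(scene: SceneGraph, target_idx: int,
                        attribute: str, value: str) -> SceneGraph:
    """One DSL manipulation operation: rewrite an attribute of one object."""
    scene.objects[target_idx].attributes[attribute] = value
    return scene


# DSL manipulation operations keyed by name; a fuller DSL would also
# include operations such as adding or removing objects.
MANIPULATION_OPS: Dict[str, Callable[..., SceneGraph]] = {
    "change_attribute": op_change_attribute,
}


def parse_instruction(instruction: str) -> List[dict]:
    """Stand-in for the neural semantic parser: returns a symbolic program.
    Here the program a parser might emit for the example is hard-coded."""
    return [{"op": "change_attribute", "target_idx": 0,
             "attribute": "color", "value": "blue"}]


def execute_program(program: List[dict], scene: SceneGraph) -> SceneGraph:
    """Run each program step against the scene graph."""
    for step in program:
        op = MANIPULATION_OPS[step["op"]]
        scene = op(scene, step["target_idx"], step["attribute"], step["value"])
    return scene


if __name__ == "__main__":
    scene = SceneGraph(objects=[ObjectNode({"color": "red", "shape": "cube"})])
    program = parse_instruction("Make the red cube blue.")
    manipulated = execute_program(program, scene)
    # An image decoder (not shown) would render the final image from `manipulated`.
    print(manipulated.objects[0].attributes)  # {'color': 'blue', 'shape': 'cube'}
```

In the paper's setting the parser, manipulation modules, and decoder are learned with only VQA-style weak supervision; this sketch only illustrates the symbolic program execution step.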
Keywords
Neuro-Symbolic Reasoning, Natural Language Guided Image Manipulation, Visual Question Answering, Weakly Supervised Learning