PropTest: Automatic Property Testing for Improved Visual Programming
arXiv (2024)
Abstract
Visual Programming has emerged as an alternative to end-to-end black-box
visual reasoning models. Methods of this type leverage Large Language Models
(LLMs) to decompose a problem and generate the source code for an executable
computer program. This strategy offers an interpretable reasoning path and
does not require finetuning a model with task-specific data.
We propose PropTest, a general strategy that improves visual programming by
further using an LLM to generate code that tests for visual properties in an
initial round of proposed solutions. In particular, our method tests for
data-type consistency, as well as syntactic and semantic properties in the
generated solutions. Our proposed solution outperforms baselines and achieves
comparable results to state-of-the-art methods while using smaller and publicly
available LLMs (CodeLlama-7B and WizardCoder-15B). This is demonstrated across
different benchmarks on visual question answering and referring expression
comprehension, showing the efficacy of our approach in enhancing
performance and generalization on visual reasoning tasks. Specifically,
PropTest improves ViperGPT by obtaining 48.66
benchmark and 52.8
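
The idea of checking a generated program against generated property tests can be illustrated with a minimal sketch. This is not the authors' implementation: the names `check_color_answer`, `run_checked`, `execute_command`, the color vocabulary, and the fallback behavior are all assumptions made for illustration, and the "generated" programs are hard-coded strings standing in for LLM output.

```python
from typing import Any, Callable

# Hypothetical vocabulary used by the semantic check below.
COLORS = {"red", "green", "blue", "yellow", "black", "white"}


def check_color_answer(result: Any) -> None:
    """Stand-in for an LLM-generated property test for a query such as
    'What color is the car?'."""
    # Data-type consistency: a question-answering result should be a string.
    assert isinstance(result, str), "answer must be a string"
    # Semantic property: the answer to a color question should name a color.
    assert result.lower() in COLORS, "answer must be a color"


def run_checked(
    source: str,
    test: Callable[[Any], None],
    image: Any,
    fallback: Callable[[Any], Any],
) -> Any:
    """Run generated code and return its result only if it passes the test.

    Any failure (syntax error, runtime error, failed property) routes to
    the fallback, which is an assumed recovery strategy for this sketch.
    """
    namespace: dict = {}
    try:
        # Syntactic check: compile() raises SyntaxError on malformed code.
        exec(compile(source, "<generated>", "exec"), namespace)
        result = namespace["execute_command"](image)
        # Data-type and semantic checks.
        test(result)
        return result
    except Exception:
        return fallback(image)


# Hard-coded stand-ins for LLM-generated programs.
GOOD = "def execute_command(image):\n    return 'Red'\n"
BAD_TYPE = "def execute_command(image):\n    return 3\n"
BAD_SYNTAX = "def execute_command(image) return 'red'\n"


def unknown(image: Any) -> str:
    """Placeholder fallback; a real system would likely call another model."""
    return "unknown"
```

In this sketch, only `GOOD` passes all three kinds of checks; `BAD_TYPE` fails the data-type assertion and `BAD_SYNTAX` fails at compile time, so both are answered by the fallback.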