SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
CoRR (2024)
Abstract
Linguistic ambiguity is ubiquitous in our daily lives. Previous works have
adopted interaction between robots and humans for language disambiguation.
Nevertheless, when interactive robots are deployed in everyday environments,
natural human-robot interaction faces significant challenges stemming from
complex and unpredictable visual inputs, open-ended interaction, and diverse
user demands. In this paper, we present SInViG, a self-evolving interactive
visual agent for natural-language human-robot interaction that resolves
language ambiguity, when present, through multi-turn visual-language
dialogues. It continuously and automatically learns from unlabeled images and
large language models, without human intervention, becoming more robust to
visual and linguistic complexity. Benefiting from this self-evolution, it sets
a new state of the art on several interactive visual grounding benchmarks.
Moreover, our human-robot interaction experiments show that the evolved models
are consistently and increasingly preferred by human users. We also deployed
our model on a Franka robot for interactive manipulation tasks. Results
demonstrate that our model can follow diverse user instructions and interact
naturally with humans in natural language, despite the complexity of and
disturbances in the environment.