Towards Open-World Interactive Disambiguation for Robotic Grasping.

ICRA (2023)

Abstract
Language-based communication is essential in human-robot interaction, especially for the majority of non-expert users. In this paper, we present SeeAsk, an open-world interactive visual grounding system that grasps specified targets given ambiguous natural language instructions. The main contribution of SeeAsk is that it robustly handles open-world scenes in terms of both open-set objects and open-vocabulary interactions. Specifically, SeeAsk is built upon modern large-scale vision-language pre-trained models and a traditional decision-making process, and it shows promise for deployment in real-world scenarios. SeeAsk outperforms previous state-of-the-art algorithms by a clear margin, not only in success rate but also in asking smarter and more informative questions. User studies also demonstrate its advantages over previous work.
Keywords
ambiguous natural language instructions, human-robot interaction, language-based communications, large-scale vision-language pre-trained models, nonexpert users, open-vocabulary interactions, open-world interactive visual grounding system, open-world scenes, robotic grasping, SeeAsk, specified targets, towards open-world interactive disambiguation, traditional decision-making process