REVE-CE: Remote Embodied Visual Referring Expression in Continuous Environment

IEEE ROBOTICS AND AUTOMATION LETTERS (2022)

Abstract
Navigating the visual world by following natural language instructions has long been a major challenge for robots. Recently, several tasks, such as Vision-and-Language Navigation (VLN) and Remote Embodied Visual Referring Expression in Real Indoor Environments (REVERIE), have been proposed to address this challenge. The most significant difference between the two is that REVERIE uses higher-level guidance instructions. However, navigation in REVERIE takes place in a discrete environment, which is unrealistic for real-world scenarios. To bring the REVERIE task closer to the physical world, we develop a new task, Remote Embodied Visual Referring Expression in Continuous Environment (REVE-CE), in which the agent executes a much longer sequence of low-level actions given language instructions. Furthermore, we propose a multi-branch cross-modal attention (MBCMA) framework to solve the REVE-CE task. Extensive experiments demonstrate that the proposed framework greatly outperforms state-of-the-art VLN baselines and establish a new benchmark for the REVE-CE task.
Keywords
Deep learning for visual perception, embodied cognitive science, perception-action coupling, vision-based navigation