Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting
arXiv (2024)
Abstract
We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic
manipulation, which leverages the editability of Gaussian Splatting (GSplat)
scene representations to enable multi-stage manipulation tasks. Splat-MOVER
consists of: (i) ASK-Splat, a GSplat representation that distills latent codes
for language semantics and grasp affordance into the 3D scene. ASK-Splat
enables geometric, semantic, and affordance understanding of 3D scenes, which
is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing
module using 3D semantic masking and infilling to visualize the motions of
objects that result from robot interactions in the real world. SEE-Splat
creates a "digital twin" of the evolving environment throughout the
manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses
ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects.
ASK-Splat is trained in real time from RGB images in a brief scanning phase
prior to operation, while SEE-Splat and Grasp-Splat run in real time during
operation. In hardware experiments on a Kinova robot, Splat-MOVER outperforms
two recent baselines in four single-stage, open-vocabulary manipulation tasks
and in four multi-stage manipulation tasks. The multi-stage tasks use the
edited scene to reflect changes caused by prior manipulation stages, which is
not possible with the existing baselines. Code for this project and a link to
the project page will be made
available soon.
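
The scene-editing idea behind SEE-Splat (semantic masking followed by moving
the selected primitives) can be illustrated with a toy sketch. This is not the
authors' implementation: the dictionary layout, the names `semantic_mask` and
`move_masked`, the 2-D feature vectors, and the 0.8 similarity threshold are
all hypothetical, and a real pipeline would also handle rotations, covariances,
and infilling.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_mask(gaussians, query, threshold=0.8):
    """Select Gaussians whose (hypothetical) semantic feature matches the query."""
    return [cosine(g["feature"], query) >= threshold for g in gaussians]

def move_masked(gaussians, mask, translation):
    """Return an edited copy of the scene with the masked Gaussians translated."""
    edited = []
    for g, selected in zip(gaussians, mask):
        mean = g["mean"]
        if selected:
            mean = tuple(m + t for m, t in zip(mean, translation))
        edited.append({"mean": mean, "feature": g["feature"]})
    return edited

# Toy scene: one Gaussian per object, with made-up semantic features.
scene = [
    {"mean": (0.0, 0.0, 0.0), "feature": (1.0, 0.0)},  # stands in for a "mug"
    {"mean": (1.0, 0.0, 0.0), "feature": (0.0, 1.0)},  # stands in for a "table"
]
mask = semantic_mask(scene, query=(1.0, 0.0))
edited = move_masked(scene, mask, translation=(0.0, 0.0, 0.5))
```

Returning a new list rather than mutating `scene` keeps the pre-manipulation
state available, which is convenient when each stage of a multi-stage task
needs to be compared against the previous one.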