THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
CoRR(2024)
Abstract
To realize effective large-scale, real-world robotic applications, we must
evaluate how well our robot policies adapt to changes in environmental
conditions. Unfortunately, a majority of studies evaluate robot performance in
environments closely resembling or even identical to the training setup. We
present THE COLOSSEUM, a novel simulation benchmark, with 20 diverse
manipulation tasks, that enables systematic evaluation of models across 12
axes of environmental perturbations. These perturbations include changes in
color, texture, and size of objects, table-tops, and backgrounds; we also vary
lighting, distractors, and camera pose. Using THE COLOSSEUM, we compare 4
state-of-the-art manipulation models to reveal that their success rate degrades
between 30-50% across these perturbation factors. When multiple perturbations
are applied in unison, the success rate degrades ≥75%. We identify that
changing the number of distractor objects, target object color, or lighting
conditions are the perturbations that reduce model performance the most. To
verify the ecological validity of our results, we show that our results in
simulation are correlated (R̅^2 = 0.614) with similar perturbations in
real-world experiments. We open source code for others to use THE COLOSSEUM,
and also release code to 3D print the objects used to replicate the real-world
perturbations. Ultimately, we hope that THE COLOSSEUM will serve as a benchmark
to identify modeling decisions that systematically improve generalization for
manipulation. See https://robot-colosseum.github.io/ for more details.