Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation
CoRR(2024)
摘要
Deep Reinforcement Learning (DRL) has shown great potential in enabling
robots to find certain objects (e.g., `find a fridge') in environments like
homes or schools. This task is known as Object-Goal Navigation (ObjectNav). DRL
methods are predominantly trained and evaluated using environment simulators.
Although DRL has shown impressive results, the simulators may be biased or
limited. This creates a risk of shortcut learning, i.e., learning a policy
tailored to specific visual details of training environments. We aim to deepen
our understanding of shortcut learning in ObjectNav, its implications and
propose a solution. We design an experiment for inserting a shortcut bias in
the appearance of training environments. As a proof-of-concept, we associate
room types to specific wall colors (e.g., bedrooms with green walls), and
observe poor generalization of a state-of-the-art (SOTA) ObjectNav method to
environments where this is not the case (e.g., bedrooms with blue walls). We
find that shortcut learning is the root cause: the agent learns to navigate to
target objects, by simply searching for the associated wall color of the target
object's room. To solve this, we propose Language-Based (L-B) augmentation. Our
key insight is that we can leverage the multimodal feature space of a
Vision-Language Model (VLM) to augment visual representations directly at the
feature-level, requiring no changes to the simulator, and only an addition of
one layer to the model. Where the SOTA ObjectNav method's success rate drops
69
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要