Policy Shaping With Supervisory Attention Driven Exploration

2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS)(2018)

引用 13|浏览25
暂无评分
摘要
Robots deployed for long periods of time need to be able to explore and learn from their environment. One approach to this problem has been reinforcement learning (RL), in which robots receive rewards from the environment that allow them to choose optimal actions. To speed learning when human supervision is available, interactive reinforcement learning solicits feedback from a human teacher. However, this approach typically assumes that learning takes place under continuous supervision, which is unlikely to hold in long-term scenarios. We propose an extension to a method of interactive reinforcement learning, policy shaping, that takes into account human attention. Our approach enables better performance while unattended by favoring information-gathering actions when attended and actions that have received positive feedback when unattended. We test our approach in both simulation and on a robot, finding that our method learns faster than policy shaping and performs more safely than policy shaping while no one is paying attention to the robot.
更多
查看译文
关键词
policy shaping,robots,human supervision,human teacher,information-gathering actions,interactive reinforcement learning,interactive RL
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络