An algorithm and user study for teaching bilateral manipulation via iterated best response demonstrations

2017 13th IEEE Conference on Automation Science and Engineering (CASE) (2017)

Abstract
Human demonstrations can be valuable for teaching robots to perform manipulation and coordination tasks. However, human supervisors often find it difficult to provide demonstrations for multilateral (multi-arm) tasks, which require divided attention. In this paper, we propose a new algorithm called Bilateral Iterated Best Response (BIBR), which builds on the game-theoretic concept of Iterated Best Response. The algorithm lets a supervisor train each manipulator in turn, reducing supervisor burden and improving the quality of demonstrations. We present a web-based user study in which 51 participants controlled two agents in a GridWorld environment through a keyboard interface. Consistent with prior work, we find that when the task is asymmetric, bilateral demonstrations are noisier and longer than demonstrations provided separately for each manipulator. Because unilateral demonstrations lack coordination, we propose learning coordinated bilateral policies from unilateral demonstrations: an estimated robot policy is rolled out for one arm while the human demonstrates for the other, and the estimated policy is updated iteratively. Compared to a bilateral demonstration baseline, BIBR improves the success rate of the learned policy from 29.17% to 55.55% on the asymmetric task in the first full round of demonstrations. Furthermore, the learned policies produce trajectories that have 8.63% fewer steps and are smoother, with 44.15% fewer changes in direction.
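The alternating procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D track, the start and goal cells, the tabular behavior-cloning step, and the scripted `greedy_demo` function standing in for the human supervisor are all hypothetical choices made for illustration. In this toy the two arms move independently, so coordination arises only because each tabular policy is indexed by the joint state.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]                 # (cell of arm 0, cell of arm 1) on a 1-D track
Policy = Dict[State, int]               # tabular policy: joint state -> action in {-1, 0, +1}
Demonstrator = Callable[[int, State], int]

# Hypothetical toy task, not the GridWorld used in the paper.
SIZE, START, GOALS, MAX_STEPS = 5, (0, 4), (3, 1), 20


def clip(p: int) -> int:
    """Keep a position inside the track."""
    return max(0, min(SIZE - 1, p))


def rollout(arm: int, demo: Demonstrator, other: Policy) -> List[Tuple[State, int]]:
    """The demonstrator drives `arm`; the other arm follows its estimated policy."""
    state, pairs = START, []
    for _ in range(MAX_STEPS):
        if state == GOALS:
            break
        a_demo = demo(arm, state)
        a_other = other.get(state, 0)   # unseen state: the estimated arm stays put
        pairs.append((state, a_demo))
        acts = (a_demo, a_other) if arm == 0 else (a_other, a_demo)
        state = (clip(state[0] + acts[0]), clip(state[1] + acts[1]))
    return pairs


def iterated_best_response_sketch(demo: Demonstrator, rounds: int) -> List[Policy]:
    """Alternate arms; each demonstration responds to the other arm's current policy."""
    policies: List[Policy] = [{}, {}]
    for _ in range(rounds):
        for arm in (0, 1):
            pairs = rollout(arm, demo, policies[1 - arm])
            policies[arm].update(dict(pairs))   # tabular cloning of demonstrated actions
    return policies


def greedy_demo(arm: int, state: State) -> int:
    """Scripted stand-in for the human: step the controlled arm toward its goal."""
    diff = GOALS[arm] - state[arm]
    return (diff > 0) - (diff < 0)


def execute(policies: List[Policy]) -> bool:
    """Run both learned policies together and report whether both goals are reached."""
    state = START
    for _ in range(MAX_STEPS):
        if state == GOALS:
            return True
        a0, a1 = policies[0].get(state, 0), policies[1].get(state, 0)
        state = (clip(state[0] + a0), clip(state[1] + a1))
    return state == GOALS
```

In this toy, a single round leaves each arm's policy fitted to states that the other arm's updated policy no longer visits, so joint execution stalls; a second round lets each arm respond to its partner's current behavior, which is the motivation for iterating.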
Keywords
user study, iterated best response demonstrations, human demonstrations, human supervisors, Bilateral Iterated Best Response, manipulator, supervisor burden, bilateral demonstrations, unilateral demonstrations, coordinated bilateral policies, estimated robot policy, bilateral demonstration baseline, asymmetric task, bilateral manipulation, BIBR, game-theoretic concept, keyboard interface, learned policy, smoother trajectories