Towards Physically Safe Reinforcement Learning under Supervision
arXiv: Learning, Volume abs/1901.06576, 2019.
This paper addresses how a previously available control policy $\pi_s$ can be used as a supervisor to train a new learned control policy $\pi_L$ for a robot more quickly and safely. During trials, a weighted average of the supervisor and learned policies is used, with a heavier weight initially on the supervisor, in order to ...
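The weighted-average idea can be illustrated with a minimal Python sketch. The abstract only states that the supervisor carries the heavier weight initially, so everything else here is an assumption made for illustration: the linear decay in `supervisor_weight`, its parameters `w_start` and `w_end`, and the function names themselves. It should not be read as the paper's actual schedule or implementation.

```python
import numpy as np


def supervisor_weight(trial, num_trials, w_start=0.9, w_end=0.0):
    """Hypothetical schedule: weight on the supervisor decays linearly per trial.

    The linear form and the default endpoints are assumptions; the abstract
    only says the supervisor is weighted more heavily at the start.
    """
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(trial / max(num_trials - 1, 1), 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def blended_action(state, pi_s, pi_l, w):
    """Weighted average of the supervisor action and the learned action.

    pi_s and pi_l are callables mapping a state to an action vector;
    w is the weight placed on the supervisor, in [0, 1].
    """
    a_s = np.asarray(pi_s(state), dtype=float)
    a_l = np.asarray(pi_l(state), dtype=float)
    return w * a_s + (1.0 - w) * a_l


if __name__ == "__main__":
    # Toy stand-ins for the two policies (assumptions, not from the paper).
    pi_s = lambda s: -1.0 * s           # simple proportional supervisor
    pi_l = lambda s: np.zeros_like(s)   # untrained learner outputs zeros

    state = np.array([1.0, 2.0])
    for trial in (0, 5, 10):
        w = supervisor_weight(trial, num_trials=11)
        print(trial, w, blended_action(state, pi_s, pi_l, w))
```

With this kind of blend, early trials are dominated by the supervisor's action, and the learned policy gains influence only as the weight on the supervisor shrinks.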