Two-Stage Safe Reinforcement Learning for High-Speed Autonomous Racing

SMC 2020

Abstract
Decision making for autonomous driving is a safety-critical control problem. Prior work on safe reinforcement learning tackles the problem either by reward shaping or by modifying the exploration process. However, the former cannot guarantee safety during learning, while the latter relies heavily on expert knowledge to design an elaborate exploration policy. To date, only short-term decision making for low-speed driving has been achieved, in road scenes with simple geometries. In this paper, we propose a two-stage safe reinforcement learning algorithm that automatically learns a long-term policy for high-speed driving while guaranteeing safety throughout training. In the first learning stage, a model-free reinforcement learning policy is followed by a rule-based safeguard module that averts danger at low speed without expert fine-tuning. In the second learning stage, the rule-based module is replaced by a data-driven counterpart that yields a closed-form analytical safety solution for high-speed driving. Moreover, an adaptive reward function is designed to match the different objectives of the two learning stages, speeding convergence to an optimal policy. Experiments are conducted on the racing simulator TORCS, whose tracks feature complex geometry (e.g., sharp turns and hills). Compared with state-of-the-art baselines, our method achieves zero safety violations and quickly converges to a more efficient and stable policy, with an average speed of 127 km/h (3.3% higher than the best baseline) and an average swing of 3.96 degrees.
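The two-stage structure described above lends itself to a safety-filter wrapper around the policy's action. The Python sketch below is a hypothetical illustration of that structure only, not the authors' code: the class names, the linear safety-margin model standing in for the paper's closed-form analytical solution, and all thresholds and reward weights are assumptions introduced here for concreteness.

import numpy as np

class RuleBasedSafeguard:
    # Stage 1: hand-written rules clamp the agent's action at low speed.
    # The steering and speed limits below are illustrative, not the paper's.
    def __init__(self, max_steer=0.5, speed_limit_kmh=60.0):
        self.max_steer = max_steer
        self.speed_limit_kmh = speed_limit_kmh

    def filter(self, action, speed_kmh):
        steer, throttle = action
        steer = float(np.clip(steer, -self.max_steer, self.max_steer))
        if speed_kmh > self.speed_limit_kmh:
            throttle = min(throttle, 0.0)  # stop accelerating past the limit
        return np.array([steer, throttle])

class DataDrivenSafeguard:
    # Stage 2: a learned safety model replaces the rules. A linear margin
    # model (assumed here) stands in for the closed-form analytical solution.
    def __init__(self, weights, fallback=np.array([0.0, -0.2])):
        self.w = weights            # assumed to be fit on stage-1 rollouts
        self.fallback = fallback    # conservative action when flagged unsafe

    def filter(self, action, state):
        margin = np.concatenate([state, action]) @ self.w
        return action if margin >= 0.0 else self.fallback

def adaptive_reward(speed_kmh, swing_deg, stage):
    # Reward emphasis shifts with the learning stage: stage 1 favours
    # staying safe at low speed, stage 2 favours speed with a swing penalty.
    if stage == 1:
        return 1.0 - 0.1 * abs(swing_deg)
    return 0.05 * speed_kmh - 0.1 * abs(swing_deg)

In use, the agent's proposed action would pass through the active safeguard's filter each step (rule-based in stage 1, data-driven in stage 2), so unsafe actions never reach the simulator regardless of how immature the policy is.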
Keywords
safe reinforcement learning, autonomous racing