Prescribed Safety Performance Imitation Learning from a Single Expert Dataset

IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)

Existing safe imitation learning (safe IL) methods mainly focus on learning safe policies that resemble the expert's, but they may fail in applications that require different safety constraints. In this paper, we propose the Lagrangian Generative Adversarial Imitation Learning (LGAIL) algorithm, which adaptively learns safe policies from a single expert dataset under diverse prescribed safety constraints. To achieve this, we augment GAIL with safety constraints and then relax the resulting problem into an unconstrained optimization problem by using a Lagrange multiplier. The Lagrange multiplier enables explicit consideration of safety and is dynamically adjusted during training to balance imitation and safety performance. We then apply a two-stage optimization framework to solve LGAIL: (1) a discriminator is optimized to measure the similarity between the agent-generated data and the expert data; (2) forward reinforcement learning is employed to improve this similarity while accounting for safety through the Lagrange multiplier. Furthermore, theoretical analyses of the convergence and safety of LGAIL demonstrate its capability to adaptively learn a safe policy under prescribed safety constraints. Finally, extensive experiments in OpenAI Safety Gym demonstrate the effectiveness of our approach.
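
The role of the Lagrange multiplier described above can be illustrated with a minimal sketch. The abstract does not state the paper's exact update rule, so what follows is the generic projected dual-ascent pattern used in Lagrangian-constrained reinforcement learning, not the authors' implementation. The names `lagrangian_reward`, `update_multiplier`, `cost_limit`, and `lr` are hypothetical and chosen only for illustration.

```python
def lagrangian_reward(imitation_reward, cost, lam):
    """Shaped per-step reward for the policy update (assumed form).

    imitation_reward: signal derived from the discriminator (stage 1).
    cost: safety cost incurred at this step.
    lam: current Lagrange multiplier (non-negative).
    """
    return imitation_reward - lam * cost


def update_multiplier(lam, episode_cost, cost_limit, lr=0.05):
    """Projected dual ascent on the multiplier (assumed form).

    The multiplier grows when the measured cost exceeds the prescribed
    limit and shrinks otherwise; it is clipped so it never goes below 0.
    """
    return max(0.0, lam + lr * (episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    cost_limit = 25.0  # hypothetical prescribed safety constraint
    # Hypothetical episode costs observed over successive training iterations.
    for episode_cost in [40.0, 35.0, 28.0, 24.0, 20.0]:
        lam = update_multiplier(lam, episode_cost, cost_limit)
        print(f"cost={episode_cost:5.1f}  lambda={lam:.3f}")
```

In this sketch, changing `cost_limit` changes how strongly the safety cost is penalized without changing the expert data, which is one plausible reading of how a single expert dataset could serve several prescribed safety constraints.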