Stable Control Policy and Transferable Reward Function via Inverse Reinforcement Learning


Inverse reinforcement learning (IRL) can sidestep complex reward function shaping by learning from expert data. However, it is hard to train when expert data is insufficient, and its stability is difficult to guarantee. Moreover, the reward function learned by mainstream IRL only tolerates subtle environmental changes: it cannot be directly transferred to a similar task scenario, so its generalization ability still needs improvement. To address these issues, we propose an IRL algorithm that obtains a stable control policy and a transferable reward function (ST-IRL). First, by introducing the Wasserstein metric and adversarial training, we mitigate the difficulty of training IRL in a new environment with little expert data. Second, we add state marginal matching (SMM), hyperparameter comparison, and optimizer evaluation to improve the model's generalizability. As a result, the control policy obtained by ST-IRL achieves strong control performance on all four MuJoCo benchmarks. Furthermore, in both the custom Ant and PointMaze environments, the reward function obtained by our algorithm exhibits promising transferability.
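
The abstract does not include the paper's implementation, but the adversarial ingredient it names (a Wasserstein critic trained against policy rollouts, whose score serves as the learned, transferable reward) can be illustrated in a short sketch. The PyTorch code below is a minimal sketch under assumptions of our own: the network sizes, the gradient-penalty Lipschitz regularizer (standard in WGAN-GP), and every name (Critic, critic_step, gp_weight) are illustrative, not taken from ST-IRL.

# Minimal sketch of a Wasserstein-critic reward for adversarial IRL.
# The critic is trained to separate expert states from policy states
# under an approximate 1-Lipschitz constraint (gradient penalty); its
# score is then handed to any standard RL algorithm as the reward.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scores a state; higher means 'more expert-like'."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s)

def gradient_penalty(critic, expert_s, policy_s):
    """WGAN-GP penalty keeping the critic approximately 1-Lipschitz."""
    eps = torch.rand(expert_s.size(0), 1)
    mix = (eps * expert_s + (1 - eps) * policy_s).requires_grad_(True)
    grad, = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)
    return ((grad.norm(2, dim=1) - 1.0) ** 2).mean()

def critic_step(critic, opt, expert_s, policy_s, gp_weight=10.0):
    """One adversarial update: widen the Wasserstein gap between the
    expert and policy state batches, regularized by the penalty."""
    loss = (critic(policy_s).mean() - critic(expert_s).mean()
            + gp_weight * gradient_penalty(critic, expert_s, policy_s))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: after each batch of policy rollouts, update the critic and
# relabel the rollout states with its score as the learned reward.
state_dim = 11
critic = Critic(state_dim)
opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
expert_batch = torch.randn(256, state_dim)   # placeholder expert states
policy_batch = torch.randn(256, state_dim)   # placeholder rollout states
critic_step(critic, opt, expert_batch, policy_batch)
reward = critic(policy_batch).detach()       # learned reward per state

Because the critic scores states rather than environment-specific dynamics, a reward of this form is the kind of signal that can, in principle, be reused in a similar task scenario, which is the transferability property the abstract claims for ST-IRL.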