
Incompatibility between Deterministic Policy and Generative Adversarial Imitation Learning

ICLR 2023 (2023)

Abstract
Deterministic policies are widely applied in generative adversarial imitation learning (GAIL). When adopting these policies, some GAIL variants modify the reward function to avoid training instability. However, the mechanism behind this instability is still largely unknown. In this paper, we theoretically trace the instability to exploding gradients in the updating process. Our novelties lie in the following aspects: 1) By employing a multivariate Gaussian policy with small covariance to approximate the deterministic policy, we establish and prove a probabilistic lower bound for the exploding gradients, which universally describes the degree of instability, whereas a stochastic policy never suffers from this pathology. 2) We also prove that the modified reward function of adversarial inverse reinforcement learning (AIRL) can relieve the exploding gradients, but at the expense of "non-confidence". Experiments and a toy demo support our analysis.
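As a rough illustration of the mechanism the abstract describes (a minimal sketch under stated assumptions, not the paper's actual construction): for a Gaussian policy N(mu, sigma^2 I) used to approximate a deterministic policy, the score function grad_mu log pi(a|s) = (a - mu) / sigma^2 grows without bound as sigma shrinks, so the policy-gradient terms in GAIL-style updates can explode as the policy approaches determinism. The action and mean values below are hypothetical, chosen only to show the scaling.

```python
# Hypothetical sketch (not from the paper): the gradient of a Gaussian
# log-density with respect to its mean scales as 1/sigma^2, so it blows up
# as the policy's covariance shrinks toward a deterministic policy.
import numpy as np

def log_prob_grad_wrt_mean(action, mean, sigma):
    """Gradient of log N(action; mean, sigma^2 I) with respect to the mean."""
    return (action - mean) / sigma**2

action = np.array([0.1, -0.2])   # sampled action (illustrative values)
mean = np.array([0.0, 0.0])      # policy mean for the current state

for sigma in (1.0, 0.1, 0.01, 0.001):
    g = log_prob_grad_wrt_mean(action, mean, sigma)
    print(f"sigma={sigma:>6}: ||grad|| = {np.linalg.norm(g):.1f}")
```

For a fixed gap between the sampled action and the policy mean, the printed gradient norm grows by a factor of 100 each time sigma drops by a factor of 10, which is the 1/sigma^2 scaling behind the instability discussed above.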