
A Maximum Divergence Approach to Optimal Policy in Deep Reinforcement Learning.

IEEE Transactions on Cybernetics (2023)

Abstract
Model-free reinforcement learning algorithms based on entropy regularization have achieved good performance in control tasks. These algorithms add an entropy-regularized term to the objective so that the learned policy is stochastic. This work provides a new perspective that aims to explicitly learn a representation of the intrinsic information in state transitions to obtain a multimodal stochastic policy, addressing the tradeoff between exploration and exploitation. We study a class of Markov decision processes (MDPs) with divergence maximization, called divergence MDPs. The goal of a divergence MDP is to find an optimal stochastic policy that maximizes the sum of the expected discounted total reward and a divergence term, where the divergence function learns the implicit information of the state transition. It can therefore yield better stochastic policies, improving both robustness and performance in high-dimensional continuous settings. Under this framework, the optimality equations are obtained, and a divergence actor–critic algorithm is then developed, based on the divergence policy iteration method, to address large-scale continuous problems. The experimental results show that, compared with other methods, our approach achieves better performance and robustness, particularly in complex environments. The code of DivAC can be found at https://github.com/yzyvl/DivAC .
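The objective described in the abstract — maximizing the expected discounted total reward plus a divergence term — can be sketched as follows. This is a minimal illustration, not the authors' DivAC implementation: the function name, the weight `alpha`, and the per-step divergence values are assumptions, and the paper's actual divergence function over state transitions is not reproduced here.

```python
def regularized_return(rewards, divergences, gamma=0.99, alpha=0.2):
    """Discounted sum of reward plus a weighted divergence bonus.

    rewards     : per-step rewards r_t
    divergences : per-step divergence values D_t (placeholder for the
                  paper's divergence term on the state transition)
    gamma       : discount factor
    alpha       : weight trading off reward against the divergence term

    Returns sum_t gamma**t * (r_t + alpha * D_t), the divergence-regularized
    analogue of the usual discounted return.
    """
    total = 0.0
    for t, (r, d) in enumerate(zip(rewards, divergences)):
        total += gamma ** t * (r + alpha * d)
    return total
```

Setting `alpha = 0` recovers the standard discounted return, which is the usual way to see a regularized objective as a strict generalization of the reward-only one.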
Keywords
Entropy, Reinforcement learning, Task analysis, Robustness, Robots, Predictive models, Markov processes, Actor–critic algorithm, continuous domains, divergence Markov decision processes (MDPs), optimality conditions, reinforcement learning (RL)