
A Maximum Divergence Approach to Optimal Policy in Deep Reinforcement Learning.

IEEE Transactions on Cybernetics (2023)

Abstract
Model-free reinforcement learning algorithms based on entropy regularization have achieved good performance in control tasks. These algorithms add an entropy-regularized term to the objective so that the learned policy is stochastic. This work provides a new perspective that aims to explicitly learn a representation of the intrinsic information in state transitions to obtain a multimodal stochastic policy, addressing the tradeoff between exploration and exploitation. We study a class of Markov decision processes (MDPs) with divergence maximization, called divergence MDPs. The goal of a divergence MDP is to find an optimal stochastic policy that maximizes the sum of the expected discounted total reward and a divergence term, where the divergence function learns the implicit information of the state transition. It can therefore yield better stochastic policies, improving both robustness and performance in high-dimensional continuous settings. Under this framework, the optimality equations are obtained, and a divergence actor–critic algorithm is then developed, based on the divergence policy iteration method, to address large-scale continuous problems. The experimental results show that, compared with other methods, our approach achieves better performance and robustness, particularly in complex environments. The code of DivAC can be found at https://github.com/yzyvl/DivAC .
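The objective described in the abstract — maximizing the expected discounted total reward plus a divergence term — can be sketched as follows. This is a minimal illustration, not the authors' DivAC implementation: the function name, the weight `alpha`, and the per-step divergence values are assumptions, and the paper's actual divergence function over state transitions is not reproduced here.

```python
def regularized_return(rewards, divergences, gamma=0.99, alpha=0.2):
    """Discounted sum of reward plus a weighted divergence bonus.

    rewards     : per-step rewards r_t
    divergences : per-step divergence values D_t (placeholder for the
                  paper's divergence term on the state transition)
    gamma       : discount factor
    alpha       : weight trading off reward against the divergence term

    Returns sum_t gamma**t * (r_t + alpha * D_t), the divergence-regularized
    analogue of the usual discounted return.
    """
    total = 0.0
    for t, (r, d) in enumerate(zip(rewards, divergences)):
        total += gamma ** t * (r + alpha * d)
    return total
```

Setting `alpha = 0` recovers the standard discounted return, which is the usual way to see a regularized objective as a strict generalization of the reward-only one.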
Keywords
Entropy, Reinforcement learning, Task analysis, Robustness, Robots, Predictive models, Markov processes, Actor–critic algorithm, continuous domains, divergence Markov decision processes (MDPs), optimality conditions, reinforcement learning (RL)