Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Scott M. Jordan
Yash Chandak
Chris Nota
James Kostas

CoRR, 2019.

Other Links: dblp.uni-trier.de|arxiv.org

Abstract:

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
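The paper's full text is not reproduced on this page, so the exact objective is not shown here. For context, the classical baseline the abstract builds on is the standard Monte Carlo policy gradient (REINFORCE) for a finite-horizon episodic MDP, in which each step's score function is weighted by the reward-to-go from that step — the quantity Bellman's principle decomposes recursively. The sketch below is illustrative only; the toy MDP, horizon, and all names are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy finite-horizon episodic MDP (not from the paper):
# 2 states, 2 actions, horizon H = 3.
H, n_states, n_actions = 3, 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[s, a] expected rewards

def softmax_policy(theta, s):
    """Action probabilities for a tabular softmax policy at state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def episode(theta):
    """Roll out one H-step episode; return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(H):
        p = softmax_policy(theta, s)
        a = rng.choice(n_actions, p=p)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj

def reinforce_grad(theta, n_episodes=500):
    """Monte Carlo policy-gradient estimate (REINFORCE with rewards-to-go)."""
    grad = np.zeros_like(theta)
    for _ in range(n_episodes):
        traj = episode(theta)
        rewards = [r for (_, _, r) in traj]
        for t, (s, a, _) in enumerate(traj):
            G_t = sum(rewards[t:])          # return from time t onward
            p = softmax_policy(theta, s)
            dlogp = -p                      # grad of log-softmax wrt theta[s]
            dlogp[a] += 1.0
            grad[s] += G_t * dlogp
    return grad / n_episodes

theta = np.zeros((n_states, n_actions))
g = reinforce_grad(theta)
```

Weighting each step by the return-from-t (rather than the whole-episode return) is the variance-reduction step that the principle of optimality licenses: actions cannot affect rewards already received.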
