Explaining Fast Improvement in Online Policy Optimization

Other Links: arxiv.org

Abstract:

Online policy optimization (OPO) views policy optimization for sequential decision making as an online learning problem. In this framework, the algorithm designer defines a sequence of online loss functions such that the regret rate in online learning implies the policy convergence rate and the minimal loss witnessed by the policy class...
