Optimizing over a Restricted Policy Class in MDPs
International Conference on Artificial Intelligence and Statistics, 2019.
We address the problem of finding an optimal policy in a Markov decision process (MDP) under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications in which a number of reasonably good (or safe) policies are already known and we are interested in optimizing in thei...