Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
NeurIPS 2020.
The standard formulation for reinforcement learning with partial observability is the Partially Observable Markov Decision Process (POMDP), in which an agent operating on noisy observations makes decisions that influence the evolution of a latent state.
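As an illustration of that interaction loop, the following is a minimal sketch of a tabular POMDP simulator. It is not taken from the paper: the state, action, and observation counts, all probabilities, and the names `TRANSITION`, `EMISSION`, `step`, `rollout`, and `react_to_last` are hypothetical choices made for the example. The toy model uses more observations (3) than latent states (2), which is the regime usually meant by "undercomplete".

```python
import random

# Hypothetical toy model: 2 latent states, 2 actions, 3 observations.
# Every probability below is a made-up illustrative number.
TRANSITION = {  # TRANSITION[action][state] -> distribution over next states
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.3, 0.7], [0.6, 0.4]],
}
EMISSION = [  # EMISSION[state] -> distribution over observations
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]


def sample(dist, rng):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def step(state, action, rng):
    """Advance the latent state, then emit a noisy observation of it."""
    next_state = sample(TRANSITION[action][state], rng)
    observation = sample(EMISSION[next_state], rng)
    return next_state, observation


def react_to_last(history):
    """Hypothetical policy: choose an action from the latest observation only."""
    return 0 if history[-1] == 0 else 1


def rollout(policy, horizon, seed=0):
    """Run one episode; the policy never sees the latent state."""
    rng = random.Random(seed)
    state = 0  # assumed fixed initial latent state
    history = [sample(EMISSION[state], rng)]
    actions = []
    for _ in range(horizon):
        action = policy(history)  # decision depends on observations only
        state, observation = step(state, action, rng)
        actions.append(action)
        history.append(observation)
    return history, actions
```

Calling `rollout(react_to_last, horizon=5)` returns the observation history and the actions taken. The latent `state` variable stays internal to the simulator, which is what forces the agent to rely on memory of past observations.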
Partial observability is a common challenge in many reinforcement learning applications: it requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number of computational and statistical hardness results for learning general Partially Observable Marko...