Context-Adapted Multi-policy Ensemble Method for Generalization in Reinforcement Learning.

Tingting Xu,Fengge Wu,Junsuo Zhao

ICONIP (1)(2022)

Cited 0|Views10
No score
Generalizability is a formidable challenge in applying reinforcement learning to the real world. The root cause of poor generalization performance in reinforcement learning is that generalization from a limited number of training conditions to unseen test conditions results in implicit partial observability, effectively transforming even fully observed Markov Decision Process (MDP) into Partially Observable Markov Decision Process (POMDP). To address such issues, we propose a novel structure, namely Context-adapted Multi-policy Ensemble Method (CAMPE), which enables the model to adapt to changes in the environment and efficiently solve implicit partial observability during generalization. The method captures local dynamic changes by learning contextual environment latent variables to equip the model with the ability of environment adaption. The latent variables and samples with contextual information are used as the input of the policy. Multiple policies are trained, combined in an integrated way to obtain a single policy to approximately solve the problem of partial observability. We demonstrate our method on various simulated robotics and control tasks. Experimental results show that our method achieves superior generalization ability.
Translated text
Key words
ensemble method,reinforcement learning,generalization,context-adapted,multi-policy
AI Read Science
Must-Reading Tree
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined