Fast Adaptation via Policy-Dynamics Value Functions

Max Goldstein
Arthur Szlam

Abstract

Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward ...
