Reinforcement Learning: Theory and Algorithms

user-5ebe3c75d0b15254d6c50b36(2019)

Abstract
• It is helpful to overload notation and let $P$ also refer to a matrix of size $(S \cdot A) \times S$ whose entry $P_{(s,a),s'}$ is equal to $P(s' \mid s, a)$. We also define $P^\pi$ to be the transition matrix on state-action pairs induced by a deterministic policy $\pi$. In particular, $P^\pi_{(s,a),(s',a')} = P(s' \mid s, a)$ if $a' = \pi(s')$ and $P^\pi_{(s,a),(s',a')} = 0$ if $a' \neq \pi(s')$. With this notation,
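
The two matrices defined above can be illustrated with a minimal sketch in plain Python. It is not taken from the source: the function names, the random example MDP, and the flattening convention (the pair (s, a) is stored at index s * A + a) are all assumptions made here for illustration.

```python
import random


def random_transition_matrix(num_states, num_actions, seed=0):
    """Return P as a list of S*A rows; row (s, a) is a distribution over next states s'."""
    rng = random.Random(seed)
    P = []
    for _ in range(num_states * num_actions):
        weights = [rng.random() for _ in range(num_states)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P


def policy_transition_matrix(P, policy, num_states, num_actions):
    """Build P^pi of size (S*A) x (S*A) for a deterministic policy.

    Entry ((s, a), (s', a')) equals P(s' | s, a) when a' == policy[s'],
    and stays 0 otherwise. Index convention (an assumption): (s, a) -> s * A + a.
    """
    n = num_states * num_actions
    P_pi = [[0.0] * n for _ in range(n)]
    for row in range(n):
        for s_next in range(num_states):
            # Only the action the policy picks in s' receives probability mass.
            col = s_next * num_actions + policy[s_next]
            P_pi[row][col] = P[row][s_next]
    return P_pi


# Hypothetical example: 3 states, 2 actions, policy given as a list of actions.
S, A = 3, 2
P = random_transition_matrix(S, A)
pi = [0, 1, 0]
P_pi = policy_transition_matrix(P, pi, S, A)
```

Because a deterministic policy puts all next-step mass on a single action per state, each row of `P_pi` is still a probability distribution, and every column (s', a') with a' different from `pi[s']` is identically zero.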