Reinforcement Learning: Theory and Algorithms

user-5ebe3c75d0b15254d6c50b36 (2019)

Abstract

• It is helpful to overload notation and let $P$ also refer to a matrix of size $(S \cdot A) \times S$ where the entry $P_{(s,a),s'}$ is equal to $P(s' \mid s, a)$. We will also define $P^\pi$ to be the transition matrix on state-action pairs induced by a deterministic policy $\pi$. In particular, $P^\pi_{(s,a),(s',a')} = P(s' \mid s, a)$ if $a' = \pi(s')$ and $P^\pi_{(s,a),(s',a')} = 0$ if $a' \neq \pi(s')$. With this notation,
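
As a rough illustration of the definition of $P^\pi$ above, the sketch below builds the state-action transition matrix from $P$ and a deterministic policy in plain Python. The function name `build_p_pi`, the toy two-state MDP, and the flattening convention (pair $(s, a)$ maps to index `s * num_actions + a`) are assumptions made for this example only; they do not come from the source.

```python
def build_p_pi(P, policy, num_actions):
    """Build the (S*A) x (S*A) matrix P^pi from P, given as (S*A) x S nested lists.

    Assumed layout (hypothetical): the pair (s, a) has index s * num_actions + a.
    policy[s] is the action a deterministic policy takes in state s.
    """
    num_states = len(policy)
    n = num_states * num_actions
    P_pi = [[0.0] * n for _ in range(n)]
    for row in range(n):                 # row indexes the current pair (s, a)
        for s_next in range(num_states):
            # Only the column (s', pi(s')) is nonzero; it carries P(s' | s, a).
            col = s_next * num_actions + policy[s_next]
            P_pi[row][col] = P[row][s_next]
    return P_pi


# Toy MDP with 2 states and 2 actions (made-up numbers).
P = [
    [0.9, 0.1],  # (s=0, a=0)
    [0.2, 0.8],  # (s=0, a=1)
    [0.5, 0.5],  # (s=1, a=0)
    [0.0, 1.0],  # (s=1, a=1)
]
policy = [1, 0]  # pi(0) = 1, pi(1) = 0
P_pi = build_p_pi(P, policy, num_actions=2)
```

Each row of `P_pi` still sums to one, because the probability mass of $P(\cdot \mid s, a)$ is moved onto the columns $(s', \pi(s'))$ and every other column is zero.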
