Steady State Analysis of Episodic Reinforcement Learning

NIPS 2020 (2020)

Abstract
This paper proves that the episodic learning environment of every finite-horizon decision task has a unique steady state under any behavior policy, and that the marginal distribution of the agent's input indeed approaches this steady-state distribution in essentially all episodic learning processes. This observation supports a mindset that interestingly reverses conventional wisdom: while steady states are usually presumed to exist in continual learning and considered less relevant in episodic learning, it turns out they are guaranteed to exist for the latter. Based on this insight, the paper further develops connections between episodic and continual RL for several important concepts that have so far been treated separately in the two RL formalisms. Practically, the existence of a unique and approachable steady state enables a general, reliable, and efficient way to collect data in episodic RL tasks, which the paper applies to policy gradient algorithms as a demonstration, based on a new steady-state policy gradient theorem. The paper also proposes and empirically evaluates a perturbation method that facilitates rapid mixing in real-world tasks.
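
The abstract's practical claim is that, because the marginal state distribution of an episodic process converges to a unique steady state, data can be gathered by running one long behavior chain with automatic episode resets and sampling after a burn-in period, rather than by rolling out and storing complete episodes. The sketch below illustrates only this general idea under a Gymnasium-style environment API; the names `env`, `policy`, `burn_in`, and `batch_size` are illustrative assumptions, and the paper's actual steady-state policy gradient estimator and perturbation method are not reproduced here.

```python
def collect_steady_state_batch(env, policy, burn_in=1000, batch_size=256):
    """Collect transitions whose marginal distribution approximates the
    steady state of the episodic process.

    The environment is run as a single continual chain: episode termination
    triggers an immediate reset, which acts like just another transition of
    the chain. After `burn_in` steps, the visited states are treated as
    (approximate) samples from the steady-state distribution.
    """
    obs, _ = env.reset()
    samples = []
    t = 0
    while len(samples) < batch_size:
        action = policy.sample(obs)          # hypothetical stochastic policy interface
        next_obs, reward, terminated, truncated, _ = env.step(action)
        if t >= burn_in:
            samples.append((obs, action, reward))
        if terminated or truncated:
            next_obs, _ = env.reset()        # reset folds episode boundaries into the chain
        obs = next_obs
        t += 1
    return samples
```

Samples collected this way can be plugged into expectation-based estimators (e.g., a REINFORCE-style gradient average over the batch), mirroring how continual RL methods use steady-state samples; how closely this matches the paper's own estimator is not specified by the abstract.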
Keywords
episodic reinforcement learning, reinforcement learning, steady state analysis, state analysis