Towards Generalization and Efficiency in Reinforcement Learning

(2019)

Abstract
Unlike classic supervised learning, reinforcement learning (RL) is fundamentally interactive: an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment by actively interacting with that environment to collect useful feedback and improve its sequential decision-making ability. The RL agent also intervenes in the environment: the decisions it makes in turn affect the environment's further evolution.

Because of its generality (most machine learning problems can be viewed as special cases), RL is hard. As there is no direct supervision, one central challenge in RL is how to explore an unknown environment and collect useful feedback efficiently. In recent RL success stories (e.g., super-human performance on video games [Mnih et al., 2015]), we notice that most rely on random exploration strategies such as ϵ-greedy. Similarly, policy gradient methods such as REINFORCE [Williams, 1992] perform exploration by injecting randomness into the action space, hoping that the randomness leads to a good sequence of actions that achieves high total reward. The theoretical RL literature has developed more sophisticated algorithms for efficient exploration (e.g., [Azar et al., 2017]); however, the sample complexity of these near-optimal algorithms has to scale exponentially with respect to key parameters of the underlying system, such as the dimensions of the state and action spaces. Such exponential dependence prohibits a direct application of these theoretically elegant RL algorithms to large-scale problems. In summary, without any further assumptions, RL is hard, both in practice and in theory.
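The two baseline exploration strategies named above, ϵ-greedy action selection and REINFORCE-style policy gradients, can be illustrated with a minimal sketch. The Python snippet below is not from the thesis; the function names (epsilon_greedy_action, reinforce_gradient) and the toy numbers are illustrative assumptions, meant only to show how randomness enters action selection and how the score-function gradient estimate is formed.

import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng):
    # With probability epsilon pick a uniformly random action,
    # otherwise act greedily with respect to the current value estimates.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def reinforce_gradient(logits, action, total_return):
    # Score-function (REINFORCE) estimate: total_return * grad log pi(action | state)
    # for a softmax policy over discrete actions parameterized by `logits`.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0          # d log softmax / d logits at the chosen action
    return total_return * grad_log_pi   # unbiased (high-variance) policy-gradient estimate

# Toy usage with made-up numbers.
rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])
a = epsilon_greedy_action(q, epsilon=0.1, rng=rng)
g = reinforce_gradient(np.zeros(3), action=a, total_return=1.0)

Both mechanisms rely purely on injected randomness for exploration, which is exactly the limitation the abstract contrasts with theoretically grounded exploration algorithms.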