Reinforcement Learning with Value Advice.

Mayank Daswani, Peter Sunehag, Marcus Hutter

Asian Conference on Machine Learning (2014)

Abstract

The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
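The abstract does not spell out RLAdvice, so the following is only a rough sketch of the general idea it names: a DAgger-style loop in which a learner's own rollouts decide which states get labelled, and the labels are oracle values for state-action pairs rather than expert actions. Everything here is an assumption made for illustration: the six-state chain environment, the exact `oracle_q` function standing in for UCT value estimates, the tabular averaging learner, and the mixing schedule `beta`. Unlike the setting in the abstract, this sketch also places no limit on oracle queries.

```python
import random

# Hypothetical toy problem (not from the paper): a 6-state chain.
N_STATES = 6          # states 0..5; state 5 is the terminal goal
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9


def step(state, action):
    """Deterministic chain dynamics: reward 1 on reaching the goal."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def optimal_value(state):
    """V* of the toy chain; the goal is terminal, so its value is 0."""
    if state == N_STATES - 1:
        return 0.0
    return GAMMA ** (N_STATES - 2 - state)


def oracle_q(state, action):
    """Stand-in for the value oracle: exact Q*(s, a) on the toy chain."""
    nxt, reward, _ = step(state, action)
    return reward + GAMMA * optimal_value(nxt)


def greedy(q_fn, state):
    """Action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q_fn(state, a))


def fit(dataset):
    """Tabular 'regression': average the advice seen for each (s, a)."""
    sums, counts = {}, {}
    for s, a, q in dataset:
        sums[(s, a)] = sums.get((s, a), 0.0) + q
        counts[(s, a)] = counts.get((s, a), 0) + 1
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda s, a: table.get((s, a), 0.0)


def train(iterations=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []                      # aggregated across all iterations
    learner_q = lambda s, a: 0.0      # untrained learner
    for i in range(iterations):
        # Assumed schedule: follow the oracle fully at first, then less.
        beta = 1.0 if i == 0 else 0.5 ** i
        state = 0
        for _ in range(horizon):
            # Ask the oracle for advice on every action in the visited state.
            for a in ACTIONS:
                dataset.append((state, a, oracle_q(state, a)))
            q_fn = oracle_q if rng.random() < beta else learner_q
            state, _, done = step(state, greedy(q_fn, state))
            if done:
                break
        learner_q = fit(dataset)      # refit on the aggregated data
    return learner_q


learned_q = train()
policy = [greedy(learned_q, s) for s in range(N_STATES - 1)]
print(policy)
```

In this toy setting the learned greedy policy moves right in every non-terminal state. In a setting like the Arcade Learning Environment the tabular average would presumably be replaced by a function approximator over features, and the oracle values would be noisy estimates rather than exact ones.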
