Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes.

ICAPS'12: Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling (2012)

Citations: 37 | Views: 35
Abstract
Recent research leverages results from the continuous-armed bandit literature to create a reinforcement-learning algorithm for continuous state and action spaces. The algorithm was initially proposed in a purely theoretical setting; we provide the first examination of its empirical properties. Through experimentation, we demonstrate the effectiveness of this planning method when coupled with exploration and model learning, and show that, in addition to its formal guarantees, the approach is highly competitive with other continuous-action reinforcement learners.
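To make the continuous-armed bandit idea the abstract alludes to concrete, below is a minimal illustrative sketch: UCB1 run over a uniform discretization of a one-dimensional action interval. This is not the paper's algorithm; the function name, grid size, and reward function are all hypothetical choices for illustration, and real continuous-armed bandit methods refine the action space adaptively rather than using a fixed grid.

```python
import math
import random

def ucb_continuous_bandit(reward_fn, n_arms=20, horizon=2000, seed=0):
    """UCB1 over a uniform discretization of the action interval [0, 1].

    A crude stand-in for continuous-armed bandit methods: a finer grid
    trades approximation error against exploration cost. (Illustrative
    sketch only, not the algorithm studied in the paper.)
    """
    rng = random.Random(seed)
    arms = [(i + 0.5) / n_arms for i in range(n_arms)]  # grid midpoints
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1  # pull each arm once to initialize
        else:
            # UCB1: empirical mean plus exploration bonus
            i = max(range(n_arms),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = reward_fn(arms[i]) + rng.gauss(0, 0.1)  # noisy reward sample
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
    best = max(range(n_arms), key=lambda j: means[j])
    return arms[best]

# Smooth hypothetical reward peaked at a = 0.7
best_action = ucb_continuous_bandit(lambda a: 1.0 - (a - 0.7) ** 2)
```

Embedding such a bandit at each decision point, together with a learned transition model, yields a planner for continuous-action MDPs of the kind the abstract describes.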