Online learning with expert advice and finite-horizon constraints

Branislav Kveton, Jia Yuan Yu, Georgios Theocharous, Shie Mannor

AAAI (2008)

Abstract

In this paper, we study a sequential decision making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novelty of our setup is that the rewards and constraints are controlled by an adverse opponent. To solve our problem in a practical way, we propose an expert algorithm that guarantees both a vanishing regret and a sublinear number of violated constraints. The quality of this solution is demonstrated on a real-world power management problem. Our results support the hypothesis that online learning with convex cost constraints can be performed successfully in practice.

Keywords

temporal cost constraint, average reward, adverse opponent, real-world power management problem, sequential decision, sublinear number, finite-horizon constraint, expert algorithm, expert advice, time subject, convex cost constraint
