Minimax Rates in Contextual Partial Monitoring

Neural Information Processing Systems (2018)

Abstract
We generalize the finite partial monitoring problem to the contextual setting. Partial monitoring allows learning even when the loss of the chosen action is not observed. In the non-contextual problem, the minimax regret is known to be $\Theta(T^{2/3})$ if a global observability condition is satisfied, and it improves to $\Theta(\sqrt{T})$ under a stronger local observability condition. Perhaps surprisingly, we show that the same characterization does not hold in the contextual case: a stronger notion of pairwise observability is necessary for $O(\sqrt{T})$ minimax regret. In particular, we provide a lower bound of $\Omega(T^{2/3})$ for any game that is not pairwise observable, including locally observable ones. We also propose two algorithms for the adversarial setting. The first requires a finite policy class but allows arbitrary contexts, and it can be tuned to obtain the optimal $O(\sqrt{T})$ rate in pairwise observable settings or the optimal $O(T^{2/3})$ rate otherwise. The second allows arbitrary policy classes accessed through an empirical risk minimization oracle but requires i.i.d. contexts; for it we show an optimal $O(T^{2/3})$ upper bound and an efficient implementation that uses only a constant number of oracle calls per round.
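To make the global observability condition concrete, here is a minimal sketch assuming the standard finite partial monitoring setup: a loss matrix L and a feedback matrix H, both of shape actions × outcomes. For each action i, a 0/1 signal matrix S_i has one row per distinct feedback symbol of that action, marking the outcomes that produce it. The game is globally observable when every loss difference ℓ_i − ℓ_j lies in the span of the rows of all signal matrices. The function names `signal_matrices` and `is_globally_observable` are hypothetical, and this check covers only global observability over all action pairs; it does not test the local or pairwise conditions that the abstract says matter in the contextual case.

```python
import itertools

import numpy as np


def signal_matrices(H):
    """Build one 0/1 signal matrix per action from a feedback matrix H (actions x outcomes).

    Row k of S_i marks which outcomes produce the k-th distinct symbol observed by action i.
    """
    mats = []
    for row in H:
        row = list(row)
        symbols = sorted(set(row))
        mats.append(np.array([[1.0 if h == s else 0.0 for h in row] for s in symbols]))
    return mats


def is_globally_observable(L, H):
    """Return True if every loss difference l_i - l_j lies in the span of all signal rows."""
    L = np.asarray(L, dtype=float)
    # Stacking every S_k gives a matrix whose row space is the sum of Im S_k^T over all actions.
    S_all = np.vstack(signal_matrices(H))
    base_rank = np.linalg.matrix_rank(S_all)
    for i, j in itertools.combinations(range(L.shape[0]), 2):
        diff = L[i] - L[j]
        # Appending diff leaves the rank unchanged exactly when diff is already in the row space.
        if np.linalg.matrix_rank(np.vstack([S_all, diff])) > base_rank:
            return False
    return True
```

With L = [[0, 1], [1, 0]], full-information feedback H = [[0, 1], [0, 1]] is globally observable, while uninformative feedback H = [[0, 0], [0, 0]] is not. A label-efficient-prediction-style game, where only a costly third action reveals the outcome (L = [[0, 1], [1, 0], [1, 1]], H = [[0, 0], [0, 0], [0, 1]]), also passes the check.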