Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
JMLR Workshop and Conference Proceedings, pp. 535-543, 2015.
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in st...
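The interaction described in the abstract can be illustrated with a minimal sketch of a single round. It is not the paper's algorithm: it only shows the feedback model, in which the agent sees one weight per chosen item and is paid their sum. The Bernoulli weight distribution, the "pick exactly two items" constraint, the mean values, and the name `play_round` are all assumptions made for illustration.

```python
import random


def play_round(means, chosen, rng):
    """Simulate one round of a stochastic combinatorial semi-bandit.

    means  : per-item mean weights (hidden from the agent in a real problem)
    chosen : the feasible subset of item indices the agent plays
    rng    : a random.Random instance used to draw the stochastic weights

    Assumption: weights are Bernoulli draws; the abstract only says "stochastic".
    """
    # Semi-bandit feedback: one observed weight for each chosen item only.
    observed = {i: 1.0 if rng.random() < means[i] else 0.0 for i in chosen}
    # The payoff is the sum of the observed weights.
    payoff = sum(observed.values())
    return observed, payoff


rng = random.Random(0)
means = [0.9, 0.2, 0.7, 0.4]  # hypothetical mean weights of four ground items
chosen = (0, 2)               # hypothetical feasible set: exactly two items
observed, payoff = play_round(means, chosen, rng)
```

A learning agent would repeat this round many times, using the per-item observations in `observed` to refine its estimate of each item's mean weight before choosing the next subset.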