Efficient Contextual Bandits in Non-stationary Worlds
Conference on Learning Theory (COLT), pp. 1739-1776, 2018.
Most contextual bandit algorithms minimize regret against the best fixed policy, which is a questionable benchmark for the non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated …
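A toy sketch can show why the best fixed policy can be a weak benchmark when the environment changes. It assumes no contexts (so a "policy" reduces to a single action), two actions, and rewards that swap halfway through the horizon. The helper names and numbers are hypothetical illustrations, not taken from the paper, and this is not the paper's algorithm. It compares the fixed benchmark with one common alternative, a benchmark that may switch its choice every round.

```python
# Hypothetical toy environment (not from the paper): no contexts, two actions,
# and the better action swaps at the midpoint of the horizon.
T = 100
rewards = [(1.0, 0.0) if t < T // 2 else (0.0, 1.0) for t in range(T)]


def best_fixed_total(rewards):
    """Cumulative reward of the single best action held for every round."""
    n_actions = len(rewards[0])
    return max(sum(r[a] for r in rewards) for a in range(n_actions))


def best_switching_total(rewards):
    """Cumulative reward of the per-round best action (a switching benchmark)."""
    return sum(max(r) for r in rewards)


def cumulative_reward(rewards, actions):
    """Cumulative reward collected by a given sequence of chosen actions."""
    return sum(r[a] for r, a in zip(rewards, actions))


# A learner that never adapts and always plays action 0.
stuck = [0] * T

static = best_fixed_total(rewards)
dynamic = best_switching_total(rewards)
static_regret = static - cumulative_reward(rewards, stuck)
dynamic_regret = dynamic - cumulative_reward(rewards, stuck)

print(static, dynamic)                # 50.0 100.0
print(static_regret, dynamic_regret)  # 0.0 50.0
```

The learner that ignores the change has zero regret against the best fixed action, yet it forgoes half of the achievable reward. A switching benchmark exposes that gap, which is the kind of shortcoming the abstract attributes to the fixed-policy benchmark.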