Exploiting the Natural Exploration In Contextual Bandits
arXiv: Machine Learning, abs/1704.09011, 2017.
The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation trade-off. In particular, greedy policies that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy policies are desirable in many practical settings where explorat…
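To make the notion of an exploration-free greedy policy concrete, here is a minimal simulation sketch: each arm keeps a ridge-regression estimate of a linear reward model, and the policy always pulls the arm with the highest predicted reward for the current context, with no exploration bonus. All quantities (number of arms, dimension, noise level, ground-truth parameters) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d, horizon = 3, 5, 2000

# Hypothetical ground-truth arm parameters (used only to simulate rewards).
theta_true = rng.normal(size=(n_arms, d))

# Per-arm ridge statistics: A_k = lam*I + sum x x^T, b_k = sum r x.
lam = 1.0
A = np.stack([lam * np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

total_reward = 0.0
for t in range(horizon):
    x = rng.normal(size=d)  # observed context
    # Current least-squares estimate for every arm.
    theta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(n_arms)])
    # Purely greedy choice: exploit current estimates, no exploration term.
    arm = int(np.argmax(theta_hat @ x))
    r = theta_true[arm] @ x + 0.1 * rng.normal()  # noisy linear reward
    A[arm] += np.outer(x, x)
    b[arm] += r * x
    total_reward += r
```

An exploration-based alternative (e.g. LinUCB) would add a confidence bonus to `theta_hat @ x` before the argmax; the greedy policy above relies entirely on whatever diversity the contexts themselves provide.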