Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

International Conference on Machine Learning, Volume abs/1901.00301, 2019, Pages 7335-7344.

Abstract:

We investigate the feasibility of learning from both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may be different between these two data sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to divergences between ...
