Dynamic Ensemble of Contextual Bandits to Satisfy Users' Changing Interests
WWW '19: The World Wide Web Conference (2019)
Abstract
Recommender systems have to handle a highly non-stationary environment because users' interests change quickly over time. Traditional solutions periodically rebuild their models, despite the high computational cost. Even so, periodic rebuilding does not let them adjust automatically to abrupt changes in trends caused by timely information. It is important to note that the changes in reward distributions caused by a non-stationary environment can also be context dependent. When the change is orthogonal to the given context, previously maintained models should be reused for better recommendation prediction.
In this work, we focus on contextual bandit algorithms for making adaptive recommendations. We capitalize on the unique context-dependent property of reward changes to handle model updates in a challenging non-stationary environment. In particular, we maintain a dynamic ensemble of contextual bandit models, where each bandit model's reward estimation quality is monitored with respect to the given context and possible environment changes. Only the models that are admissible to the current environment are used for recommendation. We provide a rigorous regret upper-bound analysis of our proposed algorithm. Extensive empirical evaluations on synthetic data and three real-world datasets confirm the algorithm's advantage over existing non-stationary solutions that simply create a new model whenever an environment change is detected.
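To make the ensemble idea in the abstract concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes linear rewards, a LinUCB-style confidence width, and a simple sliding-window error rate as the admissibility test. The names `LinearModel` and `DynamicEnsemble` and the parameters `alpha`, `window`, `slack`, and `max_error_rate` are all hypothetical choices made for illustration.

```python
import math
from collections import deque


class LinearModel:
    """Ridge-regression reward model with a LinUCB-style confidence width."""

    def __init__(self, dim, alpha=1.0, window=20):
        self.dim = dim
        self.alpha = alpha  # assumed exploration scale
        # Inverse of the regularised Gram matrix A, starting from the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim
        # Sliding window of flags: 1 means the observation fell outside the bound.
        self.errors = deque(maxlen=window)

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(self.dim)) for row in self.a_inv]

    def predict(self, x):
        theta = self._a_inv_times(self.b)  # theta = A^{-1} b
        return sum(t * xi for t, xi in zip(theta, x))

    def width(self, x):
        v = self._a_inv_times(x)
        return self.alpha * math.sqrt(max(sum(vi * xi for vi, xi in zip(v, x)), 0.0))

    def out_of_bound(self, x, reward, slack=0.1):
        # Context dependent: a model can be wrong for one context and fine for another.
        return abs(reward - self.predict(x)) > self.width(x) + slack

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} (A is symmetric).
        v = self._a_inv_times(x)
        denom = 1.0 + sum(vi * xi for vi, xi in zip(v, x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= v[i] * v[j] / denom
            self.b[i] += reward * x[i]

    def error_rate(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0


class DynamicEnsemble:
    """Keeps several models and recommends only from the admissible ones."""

    def __init__(self, dim, max_error_rate=0.5):
        self.dim = dim
        self.max_error_rate = max_error_rate  # assumed admissibility threshold
        self.models = [LinearModel(dim)]

    def admissible(self):
        return [m for m in self.models if m.error_rate() <= self.max_error_rate]

    def score(self, x):
        # Upper confidence bound from the admissible model that is most certain about x.
        candidates = self.admissible() or self.models
        best = min(candidates, key=lambda m: m.width(x))
        return best.predict(x) + best.width(x)

    def observe(self, x, reward):
        for m in self.models:
            bad = m.out_of_bound(x, reward)
            m.errors.append(1 if bad else 0)
            if not bad:
                m.update(x, reward)  # learn only from consistent observations
        if not self.admissible():
            # No existing model explains the recent data: start a new one.
            fresh = LinearModel(self.dim)
            fresh.update(x, reward)
            self.models.append(fresh)
```

In this sketch, an older model is kept rather than discarded after a change. If the change does not affect a given context, the old model's prediction for that context stays within its bound and it can keep learning from those observations. Note that the admissibility check here uses one overall error rate per model, which is a simplification of the context-aware monitoring the abstract describes.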
Keywords
Non-stationary bandits, recommender systems, regret analysis