A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Reda Alami, Mohammed Mahfoud, Mastane Achab

2023 23rd IEEE International Conference on Data Mining Workshops, ICDMW 2023 (2023)

Abstract
In a typical stochastic multi-armed bandit problem, the objective is to maximize the expected sum of rewards over a time horizon $T$. Without further information, a strategy that achieves this is optimal; once additional, environment-specific knowledge is available, this is no longer the case. In particular, in highly volatile domains such as healthcare or finance, a naive reward-maximization approach often fails to capture the complexity of the learning problem and yields unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of $\tilde{O}(\sqrt{K_T T})$ up to time horizon $T$, where $K_T$ is the total number of change-points. In practice, our framework compares favorably to the state of the art in both synthetic and real-world environments and performs efficiently with respect to both risk sensitivity and non-stationarity.
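
The combination described above (a risk measure replacing the mean, forced exploration, and per-arm restarts on detected change-points) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: empirical CVaR at level `alpha` stands in for "various risk measures", a UCB-style bonus stands in for the bandit family, forced exploration is a deterministic round-robin slot every `n_arms / gamma` rounds, and, importantly, change detection is a simple two-window mean-shift test swapped in for R-BOCPD, which is not implemented here. All names (`empirical_cvar`, `window_change_detected`, `RiskAwareRestartBandit`, `alpha`, `gamma`, `w`, `threshold`) are hypothetical.

```python
import math
import random


def empirical_cvar(samples, alpha):
    """Empirical CVaR of rewards at level alpha: mean of the worst ceil(alpha * n) samples."""
    if not samples:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(alpha * len(samples)))
    return sum(sorted(samples)[:k]) / k


def window_change_detected(history, w, threshold):
    """Stand-in detector (NOT R-BOCPD): compare means of the two halves of the last w samples."""
    if len(history) < w:
        return False
    recent = history[-w:]
    half = w // 2
    first, second = recent[:half], recent[half:]
    return abs(sum(second) / len(second) - sum(first) / len(first)) > threshold


class RiskAwareRestartBandit:
    """Hypothetical sketch: CVaR index + UCB-style bonus, forced exploration, per-arm restarts."""

    def __init__(self, n_arms, alpha=0.25, gamma=0.05, w=40, threshold=0.4):
        self.n_arms, self.alpha, self.gamma = n_arms, alpha, gamma
        self.w, self.threshold = w, threshold
        self.history = [[] for _ in range(n_arms)]  # rewards observed since each arm's last restart
        self.t = 0
        self.restarts = 0

    def select(self):
        self.t += 1
        # Assumed forced exploration: each arm gets one reserved slot per n_arms / gamma rounds.
        period = max(self.n_arms, int(self.n_arms / self.gamma))
        slot = self.t % period
        if slot < self.n_arms:
            return slot
        for arm, h in enumerate(self.history):
            if not h:
                return arm  # play every arm at least once since its last restart

        def index(arm):
            h = self.history[arm]
            bonus = math.sqrt(2 * math.log(self.t) / len(h))
            return empirical_cvar(h, self.alpha) + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history[arm].append(reward)
        # Local (per-arm) restart: only the arm whose distribution shifted forgets its past.
        if window_change_detected(self.history[arm], self.w, self.threshold):
            self.history[arm] = []
            self.restarts += 1


if __name__ == "__main__":
    random.seed(0)
    bandit = RiskAwareRestartBandit(n_arms=2)
    for t in range(2000):
        # Piecewise-stationary Bernoulli arms: arm 0 degrades abruptly at t = 1000.
        means = [0.8, 0.5] if t < 1000 else [0.2, 0.5]
        arm = bandit.select()
        bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)
    print("per-arm restarts:", bandit.restarts)
```

Resetting only the affected arm's history, rather than every arm, is one way to reflect the "local (per-arm) switches" mentioned in the abstract; the forced-exploration slots are what keep an arm's statistics fresh enough for its detector to notice a change even when the arm is not currently favored by the risk-aware index.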
Keywords
Non-stationary environments,risk averse bandits,change point detection