A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity


We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^T$, an algorithm aims to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^T \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting, where $\theta_t = \theta_1$ for all $t$, and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary, and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta_{(1)}^2/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2/d \ll 1/\rho^*$, we are motivated to propose a novel algorithm, P1-RAGE, that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
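The robust baseline mentioned in the abstract, non-adaptive sampling from a G-optimal design, can be illustrated with a minimal sketch. This is not the P1-RAGE algorithm. The Frank-Wolfe solver, the inverse-propensity-style estimator of the average parameter, the Gaussian noise model, and the function names `g_optimal_design` and `recommend_with_static_design` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def g_optimal_design(X, iters=2000):
    """Approximate a G-optimal design over the rows of X with Frank-Wolfe steps.

    Assumes the arms (rows of X) span R^d so that A(lam) is invertible.
    """
    n, _ = X.shape
    lam = np.full(n, 1.0 / n)                      # start from the uniform design
    for k in range(iters):
        A = X.T @ (lam[:, None] * X)               # A(lam) = sum_x lam_x x x^T
        A_inv = np.linalg.pinv(A)
        lev = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x^T A(lam)^{-1} x per arm
        j = int(np.argmax(lev))                    # arm with the largest leverage
        step = 1.0 / (k + 2)
        lam = (1.0 - step) * lam                   # shrink all weights ...
        lam[j] += step                             # ... and move mass to arm j
    return lam


def recommend_with_static_design(X, thetas, lam, rng, noise_sd=1.0):
    """Pull arms i.i.d. from lam for T rounds, then recommend one arm.

    thetas has shape (T, d); row t is the (possibly changing) parameter theta_t.
    """
    T = thetas.shape[0]
    A_inv = np.linalg.pinv(X.T @ (lam[:, None] * X))
    idx = rng.choice(len(X), size=T, p=lam)        # non-adaptive arm choices
    pulled = X[idx]
    rewards = np.einsum("td,td->t", pulled, thetas) + noise_sd * rng.standard_normal(T)
    # E[x_t r_t] = A(lam) theta_t, so this average is unbiased for mean(theta_t).
    theta_bar_hat = A_inv @ (pulled * rewards[:, None]).mean(axis=0)
    return int(np.argmax(X @ theta_bar_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.eye(3)
    # Non-stationary environment: arm 1 looks best early, arm 0 is best on average.
    thetas = np.vstack([
        np.tile([0.0, 1.0, 0.0], (5000, 1)),
        np.tile([2.0, 0.0, 0.0], (5000, 1)),
    ])
    lam = g_optimal_design(X)
    print("design:", lam)
    print("recommended arm index:", recommend_with_static_design(X, thetas, lam, rng))
```

Because the sampling distribution never reacts to observed rewards, the estimate targets the time-averaged parameter regardless of how $\theta_t$ drifts, which is the source of the robustness the abstract describes.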
Keywords: linear bandits, robustness, best-arm, non-stationarity