
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit.

Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 13 (2024)

Abstract
We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given d stochastic arms, and the reward of each arm s∈{1, …, d} follows an unknown distribution with mean μ_s. In each time step, the player pulls a single arm and observes its reward. The player's goal is to identify the optimal action π^* = arg max_{π∈𝒜} μ^⊤π from a finite-sized real-valued action set 𝒜⊂ℝ^d with as few arm pulls as possible. Previous methods for R-CPE-MAB assume that the size of the action set 𝒜 is polynomial in d. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in d. We also introduce a novel problem-dependent sample complexity lower bound for the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.
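To make the setting concrete, here is a minimal illustrative sketch of a Thompson-sampling-style pure-exploration loop for the goal stated in the abstract: identify arg max_{π∈𝒜} μ^⊤π by sampling from a posterior over the mean vector and pulling arms where candidate actions disagree. This is not the authors' GenTS-Explore algorithm; the toy instance, Gaussian posterior, stopping rule, and arm-selection rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (not from the paper): d = 4 Gaussian arms with
# unit-variance noise, and a small 0/1 action set A ⊂ R^d.
true_mu = np.array([0.9, 0.7, 0.2, 0.1])
d = true_mu.size
actions = np.array([[1, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 1, 0],
                    [0, 0, 1, 1]], dtype=float)

def best_action(mu):
    # argmax_{pi in A} mu^T pi, by enumerating the action set.
    return int(np.argmax(actions @ mu))

def identify(max_pulls=100_000, streak_needed=100):
    counts = np.ones(d)                     # one warm-up pull per arm
    sums = true_mu + rng.normal(size=d)
    streak = 0
    for _ in range(max_pulls):
        mu_hat = sums / counts
        # Posterior sample of the mean vector (flat prior, unit variance).
        theta = mu_hat + rng.normal(size=d) / np.sqrt(counts)
        a_ts, a_emp = best_action(theta), best_action(mu_hat)
        if a_ts == a_emp:
            streak += 1
            if streak >= streak_needed:     # posterior repeatedly agrees: stop
                return a_emp
        else:
            streak = 0
            # Pull the least-explored arm on which the two candidate
            # actions disagree, to shrink the relevant uncertainty.
            diff = np.flatnonzero(actions[a_ts] != actions[a_emp])
            s = diff[np.argmin(counts[diff])]
            sums[s] += true_mu[s] + rng.normal()
            counts[s] += 1
    return best_action(sums / counts)

result = identify()
```

The key contrast with the paper: this sketch enumerates 𝒜 explicitly in `best_action`, which is only feasible when |𝒜| is small; GenTS-Explore is designed to remain tractable even when 𝒜 is exponentially large in d.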
Keywords
Thompson Sampling,Adversarial Multi-Armed Bandits,Bandit Optimization,Contextual Bandits,Gathering Algorithms