Combinatorial Gaussian Process Bandits in Bayesian Settings: Theory and Application for Energy-Efficient Navigation
CoRR(2023)
摘要
We consider a combinatorial Gaussian process semi-bandit problem with
time-varying arm availability. Each round, an agent is provided a set of
available base arms and must select a subset of them to maximize the long-term
cumulative reward. Assuming the expected rewards are sampled from a Gaussian
process (GP) over the arm space, the agent can efficiently learn. We study the
Bayesian setting and provide novel Bayesian regret bounds for three GP-based
algorithms: GP-UCB, Bayes-GP-UCB and GP-TS. Our bounds extend previous results
for GP-UCB and GP-TS to a combinatorial setting with varying arm availability
and to the best of our knowledge, we provide the first Bayesian regret bound
for Bayes-GP-UCB. Time-varying arm availability encompasses other widely
considered bandit problems such as contextual bandits. We formulate the online
energy-efficient navigation problem as a combinatorial and contextual bandit
and provide a comprehensive experimental study on synthetic and real-world road
networks with detailed simulations. The contextual GP model obtains lower
regret and is less dependent on the informativeness of the prior compared to
the non-contextual Bayesian inference model. In addition, Thompson sampling
obtains lower regret than Bayes-UCB for both the contextual and non-contextual
model.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要