Second Order Methods for Bandit Optimization and Control
CoRR (2024)
Abstract
Bandit convex optimization (BCO) is a general framework for online decision
making under uncertainty. While tight regret bounds for general convex losses
have been established, existing algorithms achieving these bounds have
prohibitive computational costs for high-dimensional data.
In this paper, we propose a simple and practical BCO algorithm inspired by
the online Newton step algorithm. We show that our algorithm achieves optimal
(in terms of horizon) regret bounds for a large class of convex functions that
we call κ-convex. This class contains a wide range of practically
relevant loss functions including linear, quadratic, and generalized linear
models. In addition to optimal regret, this method is the most efficient known
algorithm for several well-studied applications including bandit logistic
regression.
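For orientation, the following is a minimal sketch of the classical full-information online Newton step update that the abstract cites as inspiration. It is not the bandit algorithm proposed in the paper, which only observes loss values rather than gradients. The names `ons_step` and `run_ons`, the parameters `gamma`, `radius`, and `eps`, and the use of a plain Euclidean projection in place of the usual projection in the norm induced by `A` are all assumptions made for illustration.

```python
import numpy as np

def ons_step(x, grad, A, gamma=1.0, radius=1.0):
    """One simplified online Newton step (hypothetical helper).

    A accumulates outer products of observed gradients and acts as a
    second-order preconditioner for the gradient step.
    """
    A = A + np.outer(grad, grad)
    # Preconditioned step: x - (1/gamma) * A^{-1} grad.
    x_new = x - np.linalg.solve(A, grad) / gamma
    # Assumption: Euclidean projection onto a ball, a simplification of
    # the A-norm projection used in the classical algorithm.
    norm = np.linalg.norm(x_new)
    if norm > radius:
        x_new = x_new * (radius / norm)
    return x_new, A

def run_ons(grad_fn, dim, steps, eps=1.0):
    """Run the simplified update from the origin with A_0 = eps * I."""
    x = np.zeros(dim)
    A = eps * np.eye(dim)
    for _ in range(steps):
        x, A = ons_step(x, grad_fn(x), A)
    return x
```

As a usage example, calling `run_ons(lambda x: x - c, dim=2, steps=50)` with a fixed target `c` inside the unit ball runs the update on the quadratic loss `0.5 * ||x - c||^2`, and the iterates approach `c`.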
Furthermore, we investigate the adaptation of our second-order bandit
algorithm to online convex optimization with memory. We show that for loss
functions with a certain affine structure, the extended algorithm attains
optimal regret. This leads to an algorithm with optimal regret for bandit
LQR/LQG problems under a fully adversarial noise model, thereby resolving an
open question posed in prior work.
Finally, we show that the more general problem of BCO with (non-affine)
memory is harder. We derive an Ω̃(T^{2/3}) regret lower bound,
even under the assumption of smooth and quadratic losses.