Learning the Pareto Front Using Bootstrapped Observation Samples
arxiv(2023)
摘要
We consider Pareto front identification (PFI) for linear bandits (PFILin),
i.e., the goal is to identify a set of arms with undominated mean reward
vectors when the mean reward vector is a linear function of the context. PFILin
includes the best arm identification problem and multi-objective active
learning as special cases. The sample complexity of our proposed algorithm is
optimal up to a logarithmic factor. In addition, the regret incurred by our
algorithm during the estimation is within a logarithmic factor of the optimal
regret among all algorithms that identify the Pareto front. Our key
contribution is a new estimator that in every round updates the estimate for
the unknown parameter along multiple context directions – in contrast to the
conventional estimator that only updates the parameter estimate along the
chosen context. This allows us to use low-regret arms to collect information
about Pareto optimal arms. Our key innovation is to reuse the exploration
samples multiple times; in contrast to conventional estimators that use each
sample only once. Numerical experiments demonstrate that the proposed algorithm
successfully identifies the Pareto front while controlling the regret.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要