Combinatorial Pure Exploration with Partial or Full-Bandit Linear Feedback

arxiv（2020）

引用 0|浏览6

暂无评分

摘要

In this paper, we propose the novel model of combinatorial pure exploration with partial linear feedback (CPE-PL). In CPE-PL, given a combinatorial action space $\mathcal{X} \subseteq \{0,1\}^d$, in each round a learner chooses one action $x \in \mathcal{X}$ to play, obtains a random (possibly nonlinear) reward related to $x$ and an unknown latent vector $\theta \in \mathbb{R}^d$, and observes a partial linear feedback $M_{x} (\theta + \eta)$, where $\eta$ is a zero-mean noise vector and $M_x$ is a transformation matrix for $x$. The objective is to identify the optimal action with the maximum expected reward using as few rounds as possible. We also study the important subproblem of CPE-PL, i.e., combinatorial pure exploration with full-bandit feedback (CPE-BL), in which the learner observes full-bandit feedback (i.e. $M_x = x^{\top}$) and gains linear expected reward $x^{\top} \theta$ after each play. In this paper, we first propose a polynomial-time algorithmic framework for the general CPE-PL problem with novel sample complexity analysis. Then, we propose an adaptive algorithm dedicated to the subproblem CPE-BL with better sample complexity. Our work provides a novel polynomial-time solution to simultaneously address limited feedback, general reward function and combinatorial action space including matroids, matchings, and $s$-$t$ paths.

查看译文

关键词

combinatorial pure exploration,partial,feedback,full-bandit

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要