Combinatorial Pure Exploration with Partial or Full-Bandit Linear Feedback

arxiv(2020)

引用 0|浏览6
暂无评分
摘要
In this paper, we propose the novel model of combinatorial pure exploration with partial linear feedback (CPE-PL). In CPE-PL, given a combinatorial action space $\mathcal{X} \subseteq \{0,1\}^d$, in each round a learner chooses one action $x \in \mathcal{X}$ to play, obtains a random (possibly nonlinear) reward related to $x$ and an unknown latent vector $\theta \in \mathbb{R}^d$, and observes a partial linear feedback $M_{x} (\theta + \eta)$, where $\eta$ is a zero-mean noise vector and $M_x$ is a transformation matrix for $x$. The objective is to identify the optimal action with the maximum expected reward using as few rounds as possible. We also study the important subproblem of CPE-PL, i.e., combinatorial pure exploration with full-bandit feedback (CPE-BL), in which the learner observes full-bandit feedback (i.e. $M_x = x^{\top}$) and gains linear expected reward $x^{\top} \theta$ after each play. In this paper, we first propose a polynomial-time algorithmic framework for the general CPE-PL problem with novel sample complexity analysis. Then, we propose an adaptive algorithm dedicated to the subproblem CPE-BL with better sample complexity. Our work provides a novel polynomial-time solution to simultaneously address limited feedback, general reward function and combinatorial action space including matroids, matchings, and $s$-$t$ paths.
更多
查看译文
关键词
combinatorial pure exploration,partial,feedback,full-bandit
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要