Efficient Multivariate Bandit Algorithm with Path Planning

Nie Keyu, Zhang Zezhong, Yuan Ted Tao, Song Rong, Burke Pauline Berry

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) (2020)

Abstract
We address the exponential growth in the number of arms in the multivariate multi-armed bandit (MAB) problem when the arm dimension hierarchy is considered. We propose a framework called path planning, which uses paths in a graph to model the reward success rate with m-way dimension interactions and adopts Thompson Sampling (TS) for heuristic search. The framework combats the curse of dimensionality with a serial process that handles one dimension at a time. The proposed method, which uses tree models, has advantages over traditional models such as general linear regression. Real-data and simulation studies support this claim, showing faster convergence, more efficient optimal-arm allocation, and lower cumulative regret.
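The idea of choosing one dimension at a time with Thompson Sampling can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `PathThompsonSampler`, the Beta(1, 1) priors, the fixed dimension order, and the rule of crediting every node along the chosen path are all assumptions made for illustration. In this sketch each round draws one posterior sample per candidate value at each level, so the number of draws per round is the sum of the dimension sizes rather than their product.

```python
import random


class PathThompsonSampler:
    """Illustrative sketch: Thompson Sampling along a path in a tree of arm dimensions.

    A node is a prefix of dimension values, e.g. (1,) or (1, 2).
    Each node keeps [successes, failures] for a Beta posterior (assumed Beta(1, 1) prior).
    """

    def __init__(self, dim_sizes, seed=None):
        self.dim_sizes = list(dim_sizes)   # number of values in each dimension
        self.rng = random.Random(seed)
        self.counts = {}                   # prefix tuple -> [successes, failures]

    def select_arm(self):
        """Build an arm one dimension at a time, sampling only the current level."""
        path = ()
        for size in self.dim_sizes:
            best_value, best_draw = 0, -1.0
            for value in range(size):
                successes, failures = self.counts.get(path + (value,), (0, 0))
                draw = self.rng.betavariate(successes + 1, failures + 1)
                if draw > best_draw:
                    best_value, best_draw = value, draw
            path += (best_value,)
        return path

    def update(self, arm, reward):
        """Credit the binary reward to every node on the path that produced the arm."""
        for depth in range(1, len(arm) + 1):
            node = self.counts.setdefault(arm[:depth], [0, 0])
            node[0 if reward else 1] += 1


def simulate(true_rates, dim_sizes, rounds, seed=0):
    """Run the sampler against hypothetical Bernoulli success rates keyed by arm tuple."""
    sampler = PathThompsonSampler(dim_sizes, seed=seed)
    env = random.Random(seed + 1)
    pulls = {}
    for _ in range(rounds):
        arm = sampler.select_arm()
        reward = env.random() < true_rates[arm]
        sampler.update(arm, reward)
        pulls[arm] = pulls.get(arm, 0) + 1
    return sampler, pulls
```

A call such as `simulate(rates, [2, 3], rounds=1000)`, where `rates` maps each of the six arm tuples to a made-up success probability, returns the sampler and a count of how often each arm was pulled. Crediting every prefix node is one simple way to share information across arms that agree on their leading dimensions; the paper's graph model with m-way interactions may differ.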
Keywords
Multi-Armed Bandit, Monte Carlo Tree Search, Combinatorial Optimization, Heuristic Algorithm