An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
Operations Research Letters, Volume 43, Issue 6, 2015, Pages 558-563.
Abstract:
We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph K_{M,N} and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previo...
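To make the setting concrete, the following is a minimal sketch of one way such a matching-selection bandit could be simulated. It is not taken from the paper: the Bernoulli reward model, the function names (`best_matching`, `run_ucb_matching`), and the exploration bonus sqrt((L+1) ln t / n_ij), written in the style commonly associated with LLR, are all assumptions made for illustration. The matching step is brute force and only suitable for small M and N with M <= N.

```python
import itertools
import math
import random


def best_matching(index, M, N):
    """Brute-force search for the maximal matching of K_{M,N} (assumes M <= N)
    with the largest total index. Returns a tuple cols where left vertex i
    is matched to right vertex cols[i]."""
    best, best_val = None, -math.inf
    for cols in itertools.permutations(range(N), M):
        val = sum(index[i][cols[i]] for i in range(M))
        if val > best_val:
            best, best_val = cols, val
    return best


def run_ucb_matching(means, rounds, seed=0):
    """Simulate a UCB-type policy on edges of K_{M,N}.

    means[i][j] is the (hypothetical) Bernoulli mean reward of edge (i, j).
    Returns (counts, total_reward)."""
    rng = random.Random(seed)
    M, N = len(means), len(means[0])
    L = min(M, N)  # number of edges in a maximal matching
    counts = [[0] * N for _ in range(M)]
    avg = [[0.0] * N for _ in range(M)]
    total = 0.0
    for t in range(1, rounds + 1):
        # Assumed index: sample mean plus exploration bonus;
        # unplayed edges get an infinite index so they are tried first.
        index = [
            [
                math.inf
                if counts[i][j] == 0
                else avg[i][j] + math.sqrt((L + 1) * math.log(t) / counts[i][j])
                for j in range(N)
            ]
            for i in range(M)
        ]
        cols = best_matching(index, M, N)
        # Observe one reward per edge of the selected matching.
        for i, j in enumerate(cols):
            reward = 1.0 if rng.random() < means[i][j] else 0.0
            counts[i][j] += 1
            avg[i][j] += (reward - avg[i][j]) / counts[i][j]
            total += reward
    return counts, total
```

For example, `run_ucb_matching([[0.9, 0.1], [0.2, 0.8]], 500)` plays 500 rounds on K_{2,2}; each round adds exactly M plays to `counts`, one per edge of the chosen matching.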