Transfer Learning in Multi-Armed Bandit: A Causal Approach
AAMAS (2017)
Abstract
We leverage causal inference tools to support a principled and more robust transfer of knowledge in reinforcement learning (RL) settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by Pearl's do-calculus nor standard off-policy learning techniques. Our new identification strategy combines two steps -- first, deriving bounds over the arm's distribution based on structural knowledge; second, incorporating these bounds in a novel bandit algorithm, B-kl-UCB. Simulations demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
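The two-step strategy described in the abstract can be illustrated with a minimal sketch: a kl-UCB index for Bernoulli arms whose optimistic estimate is truncated at an externally supplied upper bound on the arm's mean (the role played by the causally derived bounds). This is our own simplified illustration, not the paper's implementation; the function names and the bandit loop are assumptions for demonstration only.

```python
import math
import random


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(p_hat, pulls, t, upper_bound=1.0, tol=1e-6):
    """kl-UCB index: largest q with pulls * KL(p_hat, q) <= log(t),
    searched by bisection and truncated at an (assumed) upper bound
    on the arm's mean -- the bound-clipping idea behind B-kl-UCB."""
    target = math.log(max(t, 2)) / max(pulls, 1)
    lo, hi = min(p_hat, upper_bound), upper_bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def run_bandit(true_means, upper_bounds, horizon, seed=0):
    """Toy loop: pull each arm once, then always play the arm with
    the largest bound-clipped kl-UCB index."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls, rewards = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization round
        else:
            a = max(range(k), key=lambda i: klucb_index(
                rewards[i] / pulls[i], pulls[i], t, upper_bounds[i]))
        r = 1.0 if rng.random() < true_means[a] else 0.0
        pulls[a] += 1
        rewards[a] += r
    return pulls
```

With informative bounds (e.g. a transferred upper bound of 0.6 on a suboptimal arm), the clipped index never exceeds the bound, so the learner wastes fewer pulls ruling that arm out than plain kl-UCB would.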
Keywords
Causal Inference, Transfer Learning, Reinforcement Learning