Towards a Fast Detection of Opponents in Repeated Stochastic Games.

AAMAS Workshops (Selected Papers) (2017)

Abstract
Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counterparties. This article presents a formal model of such sequential interactions, and a corresponding algorithm that combines two established frameworks: Pepper and Bayesian policy reuse. In each interaction, the algorithm faces a repeated stochastic game with an unknown (small) number of repetitions against a random opponent drawn from a population, without observing the opponent's identity. The algorithm consists of two main steps: first, it draws on multi-agent learning algorithms to obtain acting policies in stochastic games; second, it maintains a belief over the possible opponents that is updated as the interaction unfolds. This allows the agent to quickly select the appropriate policy against the opponent. Our results show fast detection of the opponent from its behavior, yielding higher average rewards than the state-of-the-art baseline Pepper in repeated stochastic games.
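The second step, maintaining and updating a belief over opponent types, follows the generic Bayesian policy reuse scheme. The following is a minimal Python sketch of that loop under stated assumptions, not the paper's implementation: the opponent types, policies, expected-reward table U, and observation model obs_model are all illustrative stand-ins.

```python
import numpy as np

# Minimal sketch of a Bayesian policy reuse loop. All names here
# (OPPONENTS, POLICIES, U, obs_model) are illustrative assumptions,
# not the paper's components: the update is plain Bayes rule over
# opponent types, and the policies stand in for learned ones.

OPPONENTS = ["bully", "tit_for_tat", "random"]  # hypothetical opponent types
POLICIES = ["defend", "cooperate", "exploit"]   # hypothetical reusable policies

rng = np.random.default_rng(0)

# Assumed offline models: U[pi, tau] is the expected reward of policy pi
# against opponent tau; obs_model[tau, pi] is a distribution over 4
# discretized observation signals (e.g., binned episode returns).
U = rng.uniform(0.0, 1.0, size=(len(POLICIES), len(OPPONENTS)))
obs_model = rng.dirichlet(np.ones(4), size=(len(OPPONENTS), len(POLICIES)))

def select_policy(belief):
    """Greedy step: pick the policy with the highest expected reward."""
    return int(np.argmax(U @ belief))

def update_belief(belief, policy, signal):
    """Bayes rule: beta'(tau) is proportional to P(signal | tau, policy) * beta(tau)."""
    posterior = obs_model[:, policy, signal] * belief
    return posterior / posterior.sum()

belief = np.full(len(OPPONENTS), 1.0 / len(OPPONENTS))  # uniform prior
for episode in range(5):
    pi = select_policy(belief)
    signal = int(rng.integers(4))  # stand-in for the observed interaction
    belief = update_belief(belief, pi, signal)
    print(f"episode {episode}: play {POLICIES[pi]}, belief={belief.round(2)}")
```

In the paper, the reusable policies would come from learning in the underlying stochastic game (the Pepper side of the combination); the sketch replaces them with fixed labels to isolate the belief-update step.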
Keywords
Stochastic games, Reinforcement learning, Multi-agent learning, Policy reuse