Optimal rewards in multiagent teams

ICDL-EPIROB (2012)

Cited by 8
Abstract
Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in common-payoff (or team) settings. Solving this new problem yields individual agent reward functions that guide agents to better overall team performance than teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's learning problem is nonstationary and coupled to the other agents' concurrently changing behavior. We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of the reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.
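
To make the architecture concrete, the sketch below illustrates the idea in a deliberately tiny common-payoff game: each agent plans with respect to its own learned internal reward function, and the internal reward parameters are adapted by a gradient-based update so that the given team reward improves. The two-site specialization task, the softmax planner, the finite-difference gradient estimator, and every name in the code are assumptions made for the sake of a small runnable example, not the paper's actual domains or algorithm.

```python
# Minimal, illustrative sketch (not the paper's algorithm): agents plan against
# learned internal rewards; those rewards are tuned by gradient ascent on the
# given team reward.

from itertools import product

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2
N_SITES = 2  # each agent picks one of two resource sites per episode


def team_reward(joint_action):
    """Given common-payoff (team) reward: best when the agents specialize and
    cover different sites, worst when they crowd the same site."""
    return 1.0 if len(set(joint_action)) == N_SITES else 0.2


def plan(theta):
    """'Planning' against an agent's learned internal reward: here simply a
    softmax policy over the internal reward assigned to each site."""
    z = theta - theta.max()
    p = np.exp(z)
    return p / p.sum()


def expected_team_return(thetas):
    """Exact expected team reward when every agent follows the policy planned
    from its own internal reward (the game is tiny, so we enumerate)."""
    policies = [plan(th) for th in thetas]
    value = 0.0
    for joint in product(range(N_SITES), repeat=N_AGENTS):
        prob = np.prod([policies[i][a] for i, a in enumerate(joint)])
        value += prob * team_reward(joint)
    return value


def reward_gradient(thetas, i, eps=0.1):
    """Finite-difference estimate of d(team return)/d(theta_i): a stand-in for
    the gradient-based reward-learning algorithm mentioned in the abstract."""
    grad = np.zeros_like(thetas[i])
    for k in range(len(thetas[i])):
        plus = [th.copy() for th in thetas]
        minus = [th.copy() for th in thetas]
        plus[i][k] += eps
        minus[i][k] -= eps
        grad[k] = (expected_team_return(plus) - expected_team_return(minus)) / (2 * eps)
    return grad


# Small random initial internal rewards break the symmetry between the agents.
thetas = [0.1 * rng.standard_normal(N_SITES) for _ in range(N_AGENTS)]
lr = 3.0

for step in range(200):
    # Simultaneous updates: each agent's gradient is taken against the other
    # agents' current, still-changing parameters -- the nonstationarity noted
    # in the abstract.
    grads = [reward_gradient(thetas, i) for i in range(N_AGENTS)]
    for i in range(N_AGENTS):
        thetas[i] = thetas[i] + lr * grads[i]

print("learned internal rewards:", [np.round(th, 2) for th in thetas])
print("expected team return:", round(expected_team_return(thetas), 3))
```

In this toy setting the simultaneous gradient updates typically drive the two agents toward different sites, i.e. toward distinct learned reward functions, even though both are evaluated only by the shared team reward; this mirrors, at a much smaller scale, the role specialization the abstract reports.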
Keywords
optimisation, nonstationarity, team-reward function, planning (artificial intelligence), learning (artificial intelligence), individual agent reward function, resource overhead, reward learning, multiagent optimal rewards problem (ORP), team performance, agent learning problem, multi-agent systems, role specialization, multiagent team, gradient-based algorithm, shared reward, optimization approach, multiagent architecture, decentralized planning approach, agent behavior, learning algorithm, common-payoff setting