Robust Multi-Agent Reinforcement Learning with Model Uncertainty

NeurIPS 2020

Abstract

In this work, we study the problem of multi-agent reinforcement learning (MARL) with model uncertainty, which is referred to as robust MARL. This is naturally motivated by some multi-agent applications where each agent may not have perfectly accurate knowledge of the model, e.g., all the reward functions of other agents. Little a priori w...

Introduction
  • Deep reinforcement learning (RL) has recently achieved tremendous successes in many sequential decision-making problems, varying from robotics [1, 2] and autonomous driving [3] to game playing [4, 5].
  • The solution obtained from simulation without uncertainty may perform poorly in practice, a phenomenon known as the sim-to-real gap.
  • Such an issue has been reported to be quite common in the autonomous-car racing application [17], which initially motivated the present work.
  • In single-agent RL, such uncertainty has been handled nicely through the lens of robust Markov decision processes (MDPs) [18, 19, 20] and robust RL [21, 22].
  • In comparison, such an uncertainty has not been fully explored in the multi-agent RL regime.
Highlights
  • Deep reinforcement learning (RL) has recently achieved tremendous successes in many sequential decision-making problems, varying from robotics [1, 2] and autonomous driving [3] to game playing [4, 5]
  • To adapt to the worst-case scenario due to uncertainty, one can view the uncertainty as the decision made by an implicit player, a “nature” player, who always plays against each agent
  • In order to test the robustness of the proposed algorithm, which is referred to as Robust-MADDPG, or R-MADDPG for brevity, we impose different levels of uncertainty on the rewards returned from each particle environment
  • We report statistics that are averaged across 5 runs for cooperative navigation, and 25 runs for the other scenarios, where each agent or adversary is trained five times
  • By viewing the uncertainty as the decision made by an implicit player, we introduce a “nature” agent, which always plays against each agent by selecting the worst-case data at every state (a simplified form of this worst-case objective is sketched after this list)
  • Our experiments in multiple benchmark environments have shown the effectiveness of Robust-MADDPG in addressing the uncertainty in multi-agent reinforcement learning (MARL), outperforming several MARL methods with no robustness concerns
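To make the worst-case viewpoint above concrete, one simplified way to write agent i's objective in a robust Markov game is as a max-min problem against the nature player. The following is a hedged sketch based only on the summary above; the uncertainty set \(\mathcal{R}_i(s)\) over reward functions is an assumed parameterization, and the paper's exact formulation (e.g., whether transitions are also uncertain) may differ.

```latex
% Sketch of a worst-case (robust) value for agent i, holding the other agents'
% joint policy \pi_{-i} fixed. The nature player picks the reward function
% \bar r_i from an assumed uncertainty set \mathcal{R}_i(s) at every state,
% always playing against agent i.
\[
  V_i(s) \;=\; \max_{\pi_i}\; \min_{\bar r_i \in \mathcal{R}_i(s)}\;
  \mathbb{E}_{a \sim (\pi_i,\, \pi_{-i})(\cdot \mid s)}
  \Big[\, \bar r_i(s, a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a)} \big[ V_i(s') \big] \Big]
\]
```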
Results
  • To demonstrate the effectiveness of the proposed algorithm, the authors provide experimental results in several benchmark competitive and cooperative MARL environments, based on the multi-agent particle environments developed in [13].
  • In order to test the robustness of the proposed algorithm, which is referred to as Robust-MADDPG, or R-MADDPG for brevity, the authors impose different levels of uncertainty on the rewards returned from each particle environment (a hypothetical illustration of such a reward perturbation is sketched after this list).
  • The authors evaluate the quality of the learned policies in a combinatorial fashion, where each agent and adversary can be selected from the models trained by any of the aforementioned algorithms.
  • The authors demonstrate how these combinations lead to performance discrepancies in environments with different levels of reward uncertainty.
  • The authors report statistics that are averaged across 5 runs for cooperative navigation, and 25 runs for the other scenarios, where each agent or adversary is trained five times.
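As a purely illustrative picture of what "different levels of reward uncertainty" could look like in a particle environment, the sketch below wraps an environment and shifts each agent's reward within an interval whose radius is the uncertainty level. The wrapper class, the uniform-noise scheme, and the `uncertainty_level` parameter are assumptions made here for illustration; the paper does not commit to this exact perturbation.

```python
import numpy as np

class RewardUncertaintyWrapper:
    """Hypothetical wrapper that perturbs per-agent rewards to emulate a given
    level of reward uncertainty (illustrative assumption, not the authors' code)."""

    def __init__(self, env, uncertainty_level=0.5, seed=0):
        self.env = env                      # e.g., a multi-agent particle environment
        self.level = uncertainty_level      # radius of the assumed reward uncertainty interval
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self.env.reset()

    def step(self, actions):
        obs, rewards, dones, info = self.env.step(actions)
        # Shift each agent's reward by a value drawn from [-level, +level].
        noisy_rewards = [r + self.rng.uniform(-self.level, self.level) for r in rewards]
        return obs, noisy_rewards, dones, info
```

Under this assumed scheme, training R-MADDPG and the non-robust baselines on copies of the same environment wrapped at several uncertainty levels would reproduce the kind of comparison described above.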
Conclusion
  • The authors have advocated the use of robust Markov games to capture the model uncertainty in MARL problems, motivated by the sim-to-real gap in the autonomous-car racing application [17].
  • The authors have proposed a multi-agent actor-critic method, Robust-MADDPG, that incorporates function approximation and handles large state-action spaces (a minimal sketch of the worst-case critic target such a method could rely on is given after this list).
  • The authors plan to apply the method to other MARL scenarios with model uncertainty, and to evaluate its sim-to-real performance in practical robotics applications, e.g., the multi-car racing platform [17].
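The sketch below shows, under stated assumptions, how a worst-case critic target could be formed once a nature agent proposes a reward inside an interval uncertainty set around the observed reward. The function name, the `level` radius, and the bounded `nature_offset` in [-1, 1] are illustrative assumptions rather than the authors' exact R-MADDPG update.

```python
def robust_td_target(reward, next_q, nature_offset, level, gamma=0.95):
    """Worst-case TD target sketch for a single agent (assumed parameterization).

    reward        : nominal reward observed from the environment
    next_q        : target centralized critic value at the next state and actions
    nature_offset : nature agent's output in [-1, 1] (e.g., tanh of a small network),
                    trained to minimize the agent's value so it drifts toward the worst case
    level         : radius of the assumed interval uncertainty set around the reward
    """
    worst_reward = reward + level * nature_offset   # reward chosen inside [reward - level, reward + level]
    return worst_reward + gamma * next_q            # the critic regresses toward this target
```

In a full actor-critic loop, each agent's critic would be fit to such a target while the nature network is updated in the opposite direction, so the agent's policy gradient sees a pessimistic value estimate.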
Objectives
  • The authors aim to develop such a robust MARL framework when model uncertainty is present.
Tables
  • Table1: Keep-away: average steps for occupying the target per episode. We report the mean and 95% confidence interval from 25 model comparisons. Each model comparison evaluates 1000 episodes
  • Table2: Physical deception: success rates of agents and adversary, and minimum distance of agents from the non-target landmark. The results are averaged across 25 runs
  • Table3: Predator-prey: total number of prey touches by predators per episode. For prey, the smaller the better. For predators, the larger the better. The results are averaged across 25 runs
Related work
  • Our work falls into the regime of MARL that originates from the seminal work [16], under the framework of Markov games [24]. Going beyond the zero-sum setting in [16], [25, 26, 27] have considered general-sum Markov games. Most of the later MARL works, either empirical or theoretical, have been built upon this Markov game model, e.g., [14, 13, 28, 29, 30, 31]. Despite the numerous recent advances in MARL, however, few works based on Markov games have handled uncertainty in the model, which is the focus of our work. The closest setting to ours is the recent work [32], which also considered robustness in MARL problems. Nonetheless, we highlight that the robustness there is with respect to changes in the opponents’ policies between the training and testing phases, instead of the robustness to model uncertainty that we consider here.
Funding
  • The research of K.Z. and T.B. was supported in part by the US Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196, and in part by the Office of Naval Research (ONR) MURI Grant N00014-16-1-2710.
References
  • Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013.
  • OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafał Józefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, and Wojciech Zaremba. Learning dexterous in-hand manipulation. CoRR, 2018.
  • Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76, 2017.
  • David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • OpenAI, Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, and Susan Zhang. Dota 2 with large scale deep reinforcement learning. 2019.
  • Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua. Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295, 2016.
  • Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
  • Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 2008.
  • Kaiqing Zhang, Zhuoran Yang, and Tamer Basar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv preprint arXiv:1911.10635, 2019.
  • Jakob Foerster, Yannis M Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pages 2137–2145, 2016.
  • Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multi-agent Systems, pages 66–83, 2017.
  • Gürdal Arslan and Serdar Yüksel. Decentralized Q-learning for stochastic teams and games. IEEE Transactions on Automatic Control, 62(4):1545–1558, 2017.
  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.
  • Jakob N Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Thirty-second AAAI conference on artificial intelligence, 2018.
  • Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Basar. Fully decentralized multi-agent reinforcement learning with networked agents. In International Conference on Machine Learning, pages 5867–5876, 2018.
  • Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In International Conference on Machine Learning, pages 157–163, 1994.
  • B. Balaji, S. Mallya, S. Genc, S. Gupta, L. Dirac, V. Khare, G. Roy, T. Sun, Y. Tao, B. Townsend, E. Calleja, S. Muralidhara, and D. Karuppasamy. Deepracer: Autonomous racing platform for experimentation with sim2real reinforcement learning. In IEEE International Conference on Robotics and Automation, pages 2746–2754, 2020.
  • Garud N Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005.
  • Arnab Nilim and Laurent El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.
  • Wolfram Wiesemann, Daniel Kuhn, and Berç Rustem. Robust Markov decision processes. Mathematics of Operations Research, 38(1):153–183, 2013.
  • Shiau Hong Lim, Huan Xu, and Shie Mannor. Reinforcement learning in robust Markov decision processes. In Advances in Neural Information Processing Systems, pages 701–709, 2013.
  • Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2817–2826. JMLR. org, 2017.
  • Erim Kardes, Fernando Ordóñez, and Randolph W Hall. Discounted robust stochastic games and an application to queueing control. Operations Research, 59(2):365–382, 2011.
  • Lloyd S Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
  • Junling Hu and Michael P Wellman. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4(Nov):1039–1069, 2003.
  • Michael L Littman. Friend-or-foe Q-learning in general-sum games. In International Conference on Machine Learning, volume 1, pages 322–328, 2001.
  • Amy Greenwald, Keith Hall, and Roberto Serrano. Correlated Q-learning. In International Conference on Machine Learning, volume 20, page 242, 2003.
  • Thomas Dueholm Hansen, Peter Bro Miltersen, and Uri Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. Journal of the ACM, 60(1):1–16, 2013.
  • Aaron Sidford, Mengdi Wang, Lin Yang, and Yinyu Ye. Solving discounted stochastic two-player games with near-optimal time and sample complexity. In International Conference on Artificial Intelligence and Statistics, pages 2992–3002, 2020.
  • Kaiqing Zhang, Sham M Kakade, Tamer Basar, and Lin F Yang. Model-based multiagent RL in zero-sum Markov games with near-optimal sample complexity. arXiv preprint arXiv:2007.07461, 2020.
  • Kaiqing Zhang, Zhuoran Yang, and Tamer Basar. Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games. In Advances in Neural Information Processing Systems, pages 11602–11614, 2019.
  • Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, and Stuart Russell. Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4213–4220, 2019.
  • Daniel J Mankowitz, Timothy A Mann, Pierre-Luc Bacon, Doina Precup, and Shie Mannor. Learning robust options. In AAAI Conference on Artificial Intelligence, 2018.
  • Esther Derman, Daniel J Mankowitz, Timothy A Mann, and Shie Mannor. Soft-robust actor-critic policy-gradient. arXiv preprint arXiv:1803.04848, 2018.
  • Daniel J Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, and Martin Riedmiller. Robust reinforcement learning for continuous control with model misspecification. arXiv preprint arXiv:1906.07516, 2019.
  • Jun Morimoto and Kenji Doya. Robust reinforcement learning. Neural Computation, 17(2):335–359, 2005.
  • Chen Tessler, Yonathan Efroni, and Shie Mannor. Action robust reinforcement learning and applications in continuous control. arXiv preprint arXiv:1901.09184, 2019.
  • Mohammed Amin Abdullah, Hang Ren, Haitham Bou Ammar, Vladimir Milenkovic, Rui Luo, Mingtian Zhang, and Jun Wang. Wasserstein robust reinforcement learning. arXiv preprint arXiv:1907.13196, 2019.
  • Huan Xu and Shie Mannor. Distributionally robust Markov decision processes. In Advances in Neural Information Processing Systems, pages 2505–2513, 2010.
  • Elena Smirnova, Elvis Dohmatob, and Jérémie Mary. Distributionally robust reinforcement learning. arXiv preprint arXiv:1902.08708, 2019.
  • Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
  • Michael L Littman and Csaba Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In International Conference on Machine Learning, volume 96, pages 310–318, 1996.
  • Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pages 1057–1063, 2000.
  • David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, pages 387–395, 2014.
  • Vijay R Konda and John N Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014, 2000.
  • Vijay R Konda and John N Tsitsiklis. On actor-critic algorithms. SIAM Journal on Control and Optimization, 42(4):1143–1166, 2003.
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
  • Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870, 2018.
  • Csaba Szepesvári and Michael L Littman. A unified analysis of value-function-based reinforcement-learning algorithms. Neural Computation, 11(8):2017–2060, 1999.
Authors
Kaiqing Zhang
Tao Sun
Yunzhe Tao
Sahika Genc
Sunil Mallya