Multi-agent active perception with prediction rewards

NeurIPS 2020 (2020)

Abstract

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The...
Introduction
  • Active perception, collecting observations to reduce uncertainty about a hidden variable, is one of the fundamental capabilities of an intelligent agent [2].
  • In multi-agent active perception a team of autonomous agents cooperatively gathers observations to infer the value of a hidden variable.
  • A multi-agent active perception task often has a finite duration: after observations have been gathered, they are collected into a central database for inference.
  • The key problem in multi-agent active perception is to determine how each agent should act during the decentralized phase to maximize the informativeness of the collected observations, evaluated afterwards during the centralized inference phase.
  • The agents should act so as to maximize the expected sum of shared rewards accumulated at each time step over a finite horizon (see the schematic objective below).
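
As a schematic summary of this objective (notation assumed for illustration; the belief-dependent final reward follows the Dec-ρPOMDP formulation the paper builds on), a joint policy π is chosen to maximize the expected per-step shared rewards over a finite horizon h plus a convex reward ρ applied to the joint belief b_h obtained once all observations are fused centrally:

    % Schematic objective (LaTeX); rho is convex in the final joint belief b_h.
    V(\pi) = \mathbb{E}\left[ \sum_{t=0}^{h-1} R(s_t, a_t) + \rho(b_h) \,\middle|\, \pi \right]
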
Highlights
  • Active perception, collecting observations to reduce uncertainty about a hidden variable, is one of the fundamental capabilities of an intelligent agent [2].
  • The problem can be formalized as a decentralized partially observable Markov decision process (Dec-POMDP) [3, 15], a general model of sequential multi-agent decision-making under uncertainty.
  • We show that the convex centralized prediction reward can be converted to a decentralized prediction reward that is a function of the hidden state and so-called individual prediction actions (see the numerical sketch after this list).
  • We demonstrate the empirical usefulness of our results by applying standard Dec-POMDP solution algorithms to active perception problems, showing improved scalability over the state of the art.
  • We show that the expected decentralized prediction reward is at most equal to the centralized prediction reward, and that a similar relation holds between the value functions of the respective Dec-POMDP and Dec-ρPOMDP.
  • We showed that multi-agent active perception modelled as a Dec-ρPOMDP can be reduced to a standard Dec-POMDP by introducing individual prediction actions.
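
To make the conversion and the inequality above concrete, the following minimal Python sketch (illustrative only) represents a convex centralized reward ρ(b) = max_α ⟨α, b⟩ by a set of hyperplanes, one per prediction action. In the decentralized variant each agent chooses its own prediction action from its local belief; averaging the chosen hyperplanes is an assumed illustrative definition of the decentralized prediction reward, not necessarily the paper's exact construction, but it exhibits the claimed property that its expectation never exceeds the centralized reward.

    import numpy as np

    # Hyperplanes alpha (one per prediction action) over two hidden states; values
    # are made up for illustration. rho(b) = max_alpha <alpha, b> is convex in b.
    alphas = np.array([[1.0, 0.0],   # "predict state 0"
                       [0.0, 1.0],   # "predict state 1"
                       [0.6, 0.6]])  # a flatter alternative

    def centralized_prediction_reward(b):
        # Best expected reward when a single prediction action is chosen with
        # full knowledge of the joint belief b.
        return float(np.max(alphas @ b))

    def expected_decentralized_prediction_reward(b, b_locals):
        # Each agent picks its own prediction action from its local belief only;
        # the joint reward (here: the average of the chosen hyperplanes, an
        # illustrative choice) is taken in expectation over the joint belief b.
        chosen = [alphas[np.argmax(alphas @ b_i)] for b_i in b_locals]
        return float(np.mean(chosen, axis=0) @ b)

    b = np.array([0.7, 0.3])                                    # joint belief
    b_locals = [np.array([0.4, 0.6]), np.array([0.9, 0.1])]     # per-agent beliefs
    print(centralized_prediction_reward(b))                     # 0.7
    print(expected_decentralized_prediction_reward(b, b_locals))  # 0.5 <= 0.7
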
Methods
  • The Dec-ρPOMDP the authors target is computationally more challenging (NEXP-complete [3]) than the centralized POMDP, POMDP-IR, or ρPOMDP models (PSPACE-complete [20]).
  • As the PLAN subroutine of APAS, the authors use the finite-horizon variant of the policy graph improvement method of [19].
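
The reduction itself relies on representing the convex final reward by tangent hyperplanes, each of which becomes an individual prediction action. The sketch below (Python; assumed interfaces, not the actual APAS code) shows the mathematically standard part of that construction, using negative entropy as an example convex reward: tangents are taken at beliefs sampled from the simplex, and their maximum gives a piecewise-linear lower bound on ρ(b) that a standard Dec-POMDP solver, such as the finite-horizon policy graph improvement used as PLAN, could then optimize against.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_beliefs(n_states, n_samples):
        # Sample beliefs uniformly from the probability simplex.
        x = rng.exponential(size=(n_samples, n_states))
        return x / x.sum(axis=1, keepdims=True)

    def tangent_hyperplanes_neg_entropy(beliefs):
        # For rho(b) = sum_s b(s) log b(s) (negative entropy, convex), the tangent
        # hyperplane at b0 is alpha(s) = log b0(s): <alpha, b> <= rho(b) for all b,
        # with equality at b = b0 (Gibbs' inequality).
        return np.log(np.clip(beliefs, 1e-12, None))

    beliefs = sample_beliefs(n_states=3, n_samples=5)
    alphas = tangent_hyperplanes_neg_entropy(beliefs)

    # Each row of alphas would define one individual prediction action of the
    # reduced model; PLAN (e.g. policy graph improvement) is then run on the
    # augmented model, which is a plain Dec-POMDP with state-action rewards only.
    b = np.array([0.2, 0.5, 0.3])
    lower_bound = float(np.max(alphas @ b))   # piecewise-linear lower bound on rho(b)
    rho = float(np.sum(b * np.log(b)))        # exact negative entropy
    print(lower_bound, rho)                   # lower_bound <= rho
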
Conclusion
  • The authors showed that multi-agent active perception modelled as a Dec-ρPOMDP can be reduced to a standard Dec-POMDP by introducing individual prediction actions.
  • The difference between the optimal solution of the standard Dec-POMDP and the Dec-ρPOMDP is bounded.
  • The authors' reduction enables application of any standard Dec-POMDP solver to multi-agent active perception problems, as demonstrated by the proposed APAS algorithm.
  • The authors' results allow transferring advances in scalability for standard Dec-POMDPs to multi-agent active perception tasks.
  • The authors' reduction result enables further investigation into learning for multi-agent active perception.
  • Investigating the conditions under which the loss due to decentralization is zero is another direction for future work.
Tables
  • Table 1: Average policy values ± standard error in the MAV (left) and Rovers (right) domains.
  • Table 2: Average total runtime of APAS ± standard deviation, in seconds, in the MAV domain (K = 2) and the Rovers domain (K = 5).
Related work
  • We briefly review possible formulations of multi-agent active perception problems, and then focus on the Dec-POMDP model that provides the most general formulation.

    Multi-agent active perception has been formulated as a distributed constraint optimization problem (DCOP), submodular maximization, or as a specialized variant of a partially observable Markov decision process (POMDP). Probabilistic DCOPs with partial agent knowledge have been applied to signal source localization [9, 27]. DCOPs with Markovian dynamics have been proposed for target tracking by multiple sensors [13]. DCOPs are a simpler model than Dec-POMDPs, as a fixed communication structure is assumed or the noise in the sensing process is not modelled.

    Submodular maximization approaches assume the agents' reward can be stated as a submodular set function, and apply distributed greedy maximization to obtain an approximate solution [23, 8, 7]. Along with the structure of the reward function, inter-agent communication is typically assumed.

    Specialized variants of POMDPs may also be applied. If all-to-all communication without delay during task execution is available, centralized control is possible and the problem can be solved as a multi-agent POMDP [25]. Auctioning of POMDP policies can facilitate multi-agent cooperation when agents can communicate [6]. Best et al. [4] propose a decentralized Monte Carlo tree search planner where agents periodically communicate their open-loop plans to each other.
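
For readers unfamiliar with the submodular approach mentioned above, the toy Python example below (not from the paper) shows plain greedy maximization of a submodular coverage function, the building block that the distributed variants extend: each pick is the sensing choice with the largest marginal gain given what has already been selected.

    # Toy greedy maximization of a submodular set function (coverage): repeatedly
    # pick the candidate with the largest marginal gain. Distributed variants let
    # each agent run such greedy steps locally and exchange partial selections.
    def greedy_select(candidates, k, value):
        chosen = []
        for _ in range(k):
            best = max(candidates, key=lambda c: value(chosen + [c]) - value(chosen))
            chosen.append(best)
            candidates = [c for c in candidates if c != best]
        return chosen

    # Coverage of a set of sensing locations = number of distinct cells observed.
    cells = {"A": {1, 2}, "B": {2, 3}, "C": {4}, "D": {1, 4}}
    value = lambda S: len(set().union(*(cells[x] for x in S))) if S else 0
    print(greedy_select(list(cells), k=2, value=value))  # ['A', 'B'] covers {1, 2, 3}
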
Funding
  • This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 758824, INFLUENCE).
References
  • Mauricio Araya-López, Olivier Buffet, Vincent Thomas, and Francois Charpillet. A POMDP Extension with Belief-dependent Rewards. In Advances in Neural Information Processing Systems, pages 64–72, 2010.
  • Ruzena Bajcsy, Yiannis Aloimonos, and John K Tsotsos. Revisiting active perception. Autonomous Robots, 42(2):177–196, 2018.
  • Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819–840, 2002.
  • Graeme Best, Oliver M Cliff, Timothy Patten, Ramgopal R Mettu, and Robert Fitch. Dec-MCTS: Decentralized planning for multi-robot active perception. The International Journal of Robotics Research, 38(2-3):316–337, 2019.
  • Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • Jesus Capitan, Matthijs T.J. Spaan, Luis Merino, and Anibal Ollero. Decentralized multi-robot cooperation with auctioned POMDPs. The International Journal of Robotics Research, 32(6):650–671, 2013.
  • Micah Corah and Nathan Michael. Distributed matroid-constrained submodular maximization for multirobot exploration: Theory and practice. Autonomous Robots, 43(2):485–501, 2019.
  • Bahman Gharesifard and Stephen L Smith. Distributed submodular maximization with limited information. IEEE Transactions on Control of Network Systems, 5(4):1635–1645, 2017.
  • Manish Jain, Matthew Taylor, Milind Tambe, and Makoto Yokoo. DCOPs meet the real world: Exploring unknown reward matrices with applications to mobile sensor networks. In Intl. Joint Conference on Artificial Intelligence (IJCAI), pages 181–186, 2009.
  • Mikko Lauri, Eero Heinänen, and Simone Frintrop. Multi-robot active information gathering with periodic communication. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 851–856, 2017.
  • Mikko Lauri, Joni Pajarinen, and Jan Peters. Information Gathering in Decentralized POMDPs by Policy Graph Improvement. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 1143–1151, 2019.
  • Mikko Lauri, Joni Pajarinen, and Jan Peters. Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement. Autonomous Agents and Multi-Agent Systems, 34(2):1–44, 2020.
  • Duc Thien Nguyen, William Yeoh, Hoong Chuin Lau, Shlomo Zilberstein, and Chongjie Zhang. Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. In AAAI Conference on Artificial Intelligence, pages 1447–1455, 2014.
  • Frans A. Oliehoek. Sufficient plan-time statistics for decentralized POMDPs. In Intl. Joint Conference on Artificial Intelligence (IJCAI), pages 302–308, 2013.
  • Frans A. Oliehoek and Christopher Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.
  • Frans A. Oliehoek, Matthijs TJ Spaan, and Nikos Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
  • Frans A. Oliehoek, Stefan J. Witwicki, and Leslie P. Kaelbling. Influence-based abstraction for multiagent systems. In AAAI Conference on Artificial Intelligence, pages 1422–1428, 2012.
  • Frans A. Oliehoek, Matthijs TJ Spaan, Bas Terwijn, Philipp Robbel, and João V Messias. The MADP toolbox: an open source library for planning and learning in (multi-) agent systems. The Journal of Machine Learning Research, 18(1):3112–3116, 2017.
  • Joni K Pajarinen and Jaakko Peltonen. Periodic Finite State Controllers for Efficient POMDP and Dec-POMDP Planning. In Advances in Neural Information Processing Systems, pages 2636–2644, 2011.
  • Christos H Papadimitriou and John N Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, 1987.
  • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, and Matthijs T. J. Spaan. Exploiting submodular value functions for scaling up active perception. Autonomous Robots, 42(2):209–233, 2018.
  • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans A. Oliehoek, and Martha White. Maximizing information gain in partially observable environments via prediction rewards. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 1215–1223, 2020.
  • Amarjeet Singh, Andreas Krause, Carlos Guestrin, and William J Kaiser. Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research, 34:707–755, 2009.
  • Noah A Smith and Roy W Tromble. Sampling uniformly from the unit simplex. Technical report, Johns Hopkins University, 2004.
  • Matthijs TJ Spaan and Pedro U Lima. A decision-theoretic approach to dynamic sensor selection in camera networks. In Intl. Conf. on Automated Planning and Scheduling (ICAPS), pages 297–304, 2009.
  • Matthijs TJ Spaan, Tiago S Veiga, and Pedro U Lima. Decision-theoretic planning under uncertainty with information rewards for active cooperative perception. Autonomous Agents and Multi-Agent Systems, 29(6):1157–1185, 2015.
  • Matthew E Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. When should there be a “me” in a “team”?: Distributed multi-agent optimization under uncertainty. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 109–116, 2010.
Authors
Mikko Lauri
Frans Oliehoek