We showed that multi-agent active perception modelled as a Dec-ρPOMDP can be reduced to a standard decentralized POMDP (Dec-POMDP) by introducing individual prediction actions.
Multi-agent active perception with prediction rewards
NeurIPS 2020
Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate.
- Active perception, collecting observations to reduce uncertainty about a hidden variable, is one of the fundamental capabilities of an intelligent agent.
- In multi-agent active perception a team of autonomous agents cooperatively gathers observations to infer the value of a hidden variable.
- A multi-agent active perception task often has a finite duration: after observations have been gathered, they are collected into a central database for inference.
- The key problem in multi-agent active perception is to determine how each agent should act during the decentralized phase to maximize the informativeness of the collected observations, evaluated afterwards during the centralized inference phase.
- The agents should act so as to maximize the expected sum of shared rewards, accumulated at each time step over a finite horizon
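The objective in the last bullet is the standard finite-horizon Dec-POMDP planning criterion; written out (notation assumed here, following common Dec-POMDP conventions rather than taken verbatim from the paper), with horizon $h$, joint policy $\pi$, shared reward $R$, and initial belief $b^0$:

```latex
J(\pi) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{h-1} R(s_t, \vec{a}_t) \;\middle|\; b^0, \pi \right]
```

where $s_t$ is the hidden state and $\vec{a}_t$ is the joint action of all agents at time step $t$.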
- The problem can be formalized as a decentralized partially observable Markov decision process (Dec-POMDP) [3, 15], a general model of sequential multi-agent decision-making under uncertainty
- We show that the convex centralized prediction reward can be converted to a decentralized prediction reward that is a function of the hidden state and so-called individual prediction actions
- We prove the empirical usefulness of our results by applying standard Dec-POMDP solution algorithms to active perception problems, demonstrating improved scalability over the state-of-the-art
- We show that the expected decentralized prediction reward is at most equal to the centralized prediction reward, and that a similar relation holds between value functions in the respective Dec-POMDP and Dec-ρPOMDP
- We showed that multi-agent active perception modelled as a Dec-ρPOMDP can be reduced to a standard Dec-POMDP by introducing individual prediction actions
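The reduction in these bullets rests on a standard convexity fact: a convex belief-dependent reward ρ(b) is lower-bounded by any of its tangent hyperplanes, so committing to a "prediction" selects one linear piece and yields a reward that depends only on the hidden state. Below is a minimal, centralized single-step sketch of that principle; the candidate prediction set and the log-loss reward are illustrative choices of mine, whereas the paper's construction uses individual per-agent prediction actions.

```python
import numpy as np

def rho(b):
    # Convex centralized reward: negative entropy of the belief b.
    b = np.clip(b, 1e-12, 1.0)
    return float(np.sum(b * np.log(b)))

# Hypothetical finite set of prediction actions: each commits to a
# candidate belief q over the hidden state (e.g. a grid on the simplex).
predictions = [np.array([0.9, 0.05, 0.05]),
               np.array([1/3, 1/3, 1/3]),
               np.array([0.1, 0.1, 0.8])]

def prediction_reward(s, q):
    # State-dependent reward for choosing prediction q when the hidden
    # state is s: the log-loss alpha-vector, alpha_q[s] = log q[s].
    return float(np.log(max(q[s], 1e-12)))

def best_expected_prediction_reward(b):
    # max over predictions of E_{s~b}[R(s, q)] = sum_s b[s] * log q[s].
    return max(sum(b[s] * prediction_reward(s, q) for s in range(len(b)))
               for q in predictions)

b = np.array([0.8, 0.1, 0.1])
# Gibbs' inequality: the expected prediction reward never exceeds rho(b).
assert best_expected_prediction_reward(b) <= rho(b) + 1e-9
```

With a denser set of candidate predictions the bound tightens; equality holds whenever the true belief is itself among the candidates, mirroring the bounded-loss result the bullets describe.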
- The Dec-ρPOMDP the authors target is computationally more challenging (NEXP-complete) than the centralized POMDP, POMDP-IR, or ρPOMDP (PSPACE-complete).
- As the PLAN subroutine of APAS, the authors use the finite-horizon variant of the policy graph improvement method.
- The difference between the optimal solution of the standard Dec-POMDP and the Dec-ρPOMDP is bounded.
- The authors' reduction enables application of any standard Dec-POMDP solver to multi-agent active perception problems, as demonstrated by the proposed APAS algorithm.
- The authors' results allow transferring advances in scalability for standard Dec-POMDPs to multi-agent active perception tasks.
- The authors' reduction result enables further investigation into learning for multi-agent active perception.
- An investigation of the necessary conditions under which the loss due to decentralization is zero is another future direction.
- Table 1: Average policy values ± standard error in the MAV (left) and Rovers (right) domains
- Table 2: Average total runtime of APAS ± standard deviation in seconds in the MAV domain (K = 2) and the Rovers domain (K = 5)
- We briefly review possible formulations of multi-agent active perception problems, and then focus on the Dec-POMDP model that provides the most general formulation.
Multi-agent active perception has been formulated as a distributed constraint optimization problem (DCOP), submodular maximization, or as a specialized variant of a partially observable Markov decision process (POMDP). Probabilistic DCOPs with partial agent knowledge have been applied to signal source localization [9, 27]. DCOPs with Markovian dynamics have been proposed for target tracking by multiple sensors. DCOPs are a simpler model than Dec-POMDPs, as a fixed communication structure is assumed or the noise in the sensing process is not modelled. Submodular maximization approaches assume the agents' reward can be stated as a submodular set function, and apply distributed greedy maximization to obtain an approximate solution [23, 8, 7]. Along with the structure of the reward function, inter-agent communication is typically assumed. Specialized variants of POMDPs may also be applied. If all-to-all communication without delay during task execution is available, centralized control is possible and the problem can be solved as a multi-agent POMDP. Auctioning of POMDP policies can facilitate multi-agent cooperation when agents can communicate. Best et al. propose a decentralized Monte Carlo tree search planner where agents periodically communicate their open-loop plans to each other.
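For reference, the Dec-POMDP model singled out above as the most general formulation is commonly specified as a tuple (the notation here is a standard convention, e.g. as in Oliehoek and Amato's textbook, not copied from this paper):

```latex
\mathcal{M} \;=\; \big\langle I,\; S,\; \{A_i\}_{i \in I},\; T,\; R,\; \{Z_i\}_{i \in I},\; O,\; h,\; b^0 \big\rangle
```

with agent set $I$, hidden states $S$, per-agent actions $A_i$ and observations $Z_i$, transition model $T(s' \mid s, \vec{a})$, joint observation model $O(\vec{z} \mid \vec{a}, s')$, shared reward $R$, horizon $h$, and initial belief $b^0$. In a Dec-$\rho$POMDP the reward additionally depends on the joint belief, which is precisely the dependence the paper's reduction removes.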
- This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 758824—INFLUENCE).
- Mauricio Araya-López, Olivier Buffet, Vincent Thomas, and Francois Charpillet. A POMDP Extension with Belief-dependent Rewards. In Advances in Neural Information Processing Systems, pages 64–72, 2010.
- Ruzena Bajcsy, Yiannis Aloimonos, and John K Tsotsos. Revisiting active perception. Autonomous Robots, 42(2):177–196, 2018.
- Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819–840, 2002.
- Graeme Best, Oliver M Cliff, Timothy Patten, Ramgopal R Mettu, and Robert Fitch. Dec-MCTS: Decentralized planning for multi-robot active perception. The International Journal of Robotics Research, 38(2-3):316–337, 2019.
- Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- Jesus Capitan, Matthijs T.J. Spaan, Luis Merino, and Anibal Ollero. Decentralized multi-robot cooperation with auctioned POMDPs. The International Journal of Robotics Research, 32(6):650–671, 2013.
- Micah Corah and Nathan Michael. Distributed matroid-constrained submodular maximization for multirobot exploration: Theory and practice. Autonomous Robots, 43(2):485–501, 2019.
- Bahman Gharesifard and Stephen L Smith. Distributed submodular maximization with limited information. IEEE Transactions on Control of Network Systems, 5(4):1635–1645, 2017.
- Manish Jain, Matthew Taylor, Milind Tambe, and Makoto Yokoo. DCOPs meet the real world: Exploring unknown reward matrices with applications to mobile sensor networks. In Intl. Joint Conference on Artificial Intelligence (IJCAI), pages 181–186, 2009.
- Mikko Lauri, Eero Heinänen, and Simone Frintrop. Multi-robot active information gathering with periodic communication. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 851–856, 2017.
- Mikko Lauri, Joni Pajarinen, and Jan Peters. Information Gathering in Decentralized POMDPs by Policy Graph Improvement. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 1143–1151, 2019.
- Mikko Lauri, Joni Pajarinen, and Jan Peters. Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement. Autonomous Agents and Multi-Agent Systems, 34(2):1–44, 2020.
- Duc Thien Nguyen, William Yeoh, Hoong Chuin Lau, Shlomo Zilberstein, and Chongjie Zhang. Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. In AAAI Conference on Artificial Intelligence, pages 1447–1455, 2014.
- Frans A. Oliehoek. Sufficient plan-time statistics for decentralized POMDPs. In Intl. Joint Conference on Artificial Intelligence (IJCAI), pages 302–308, 2013.
- Frans A. Oliehoek and Christopher Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.
- Frans A. Oliehoek, Matthijs TJ Spaan, and Nikos Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
- Frans A. Oliehoek, Stefan J. Witwicki, and Leslie P. Kaelbling. Influence-based abstraction for multiagent systems. In AAAI Conference on Artificial Intelligence, pages 1422–1428, 2012.
- Frans A. Oliehoek, Matthijs TJ Spaan, Bas Terwijn, Philipp Robbel, and João V Messias. The MADP toolbox: an open source library for planning and learning in (multi-) agent systems. The Journal of Machine Learning Research, 18(1):3112–3116, 2017.
- Joni K Pajarinen and Jaakko Peltonen. Periodic Finite State Controllers for Efficient POMDP and Dec-POMDP Planning. In Advances in Neural Information Processing Systems, pages 2636–2644, 2011.
- Christos H Papadimitriou and John N Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, 1987.
- Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, and Matthijs T. J. Spaan. Exploiting submodular value functions for scaling up active perception. Autonomous Robots, 42(2):209–233, 2018.
- Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans A. Oliehoek, and Martha White. Maximizing information gain in partially observable environments via prediction rewards. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 1215–1223, 2020.
- Amarjeet Singh, Andreas Krause, Carlos Guestrin, and William J Kaiser. Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research, 34:707–755, 2009.
- Noah A Smith and Roy W Tromble. Sampling uniformly from the unit simplex. Johns Hopkins University, Technical Report, 2004.
- Matthijs TJ Spaan and Pedro U Lima. A decision-theoretic approach to dynamic sensor selection in camera networks. In Intl. Conf. on Automated Planning and Scheduling (ICAPS), pages 297–304, 2009.
- Matthijs TJ Spaan, Tiago S Veiga, and Pedro U Lima. Decision-theoretic planning under uncertainty with information rewards for active cooperative perception. Autonomous Agents and Multi-Agent Systems, 29 (6):1157–1185, 2015.
- Matthew E Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. When should there be a "me" in a "team"? Distributed multi-agent optimization under uncertainty. In Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pages 109–116, 2010.