A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

European Conference on Computer Vision (ECCV), pp. 471-490, 2020. (∗ denotes equal contribution by UJ and LW.)

Abstract:

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the furniture moving (FurnMove) task, set in the AI2-THOR environment, in which agents must collaborate closely and frequently to move a piece of furniture.
Introduction
  • Collaboration is the defining principle of our society. Humans have refined strategies to collaborate efficiently, developing verbal, deictic, and kinesthetic means.
  • Multi-agent, collaborative tasks have only been studied very recently [23,42].
  • While existing tasks are well designed to study some aspects of collaboration, they often don't require agents to collaborate closely throughout the task.
  • Instead, such tasks tend to require initial coordination followed by almost independent execution.
Highlights
  • Collaboration is the defining principle of our society
  • To study our algorithmic ability to address tasks which require close and frequent collaboration, we introduce the furniture moving (FurnMove) task, set in the AI2-THOR environment
  • Addressing challenge 1, we introduce SYNC (Synchronize Your actioNs Coherently) policies, which permit expressive joint policies for decentralized agents while using interpretable communication (see the sketch following this list)
  • To ameliorate challenge 2, we introduce the Coordination Loss (CORDIAL), which replaces the standard entropy loss in actor-critic algorithms and guides agents away from actions that are mutually incompatible
  • In contrast to the above single-agent embodied tasks and approaches, we focus on collaboration between multiple embodied agents
  • We show how SYNC can be integrated into TBONE to allow our agents to represent high-rank joint distributions over multi-actions
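The central idea above is to replace a rank-1 product of per-agent marginals with a rank-m mixture of such products, with decentralized agents agreeing on the mixture component through shared randomness conveyed via communication. Below is a minimal NumPy sketch of this mixture-of-marginals sampling, assuming a shared random generator as a stand-in for that agreement; the names alpha and marginals and the toy numbers are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # stands in for the agents' shared randomness

def sync_sample(alpha, marginals):
    # All agents agree on one mixture component j (giving a rank-m joint
    # policy); conditioned on j, each agent samples independently from
    # its own marginal for that component.
    j = rng.choice(len(alpha), p=alpha)
    return [rng.choice(len(pi[j]), p=pi[j]) for pi in marginals]

# Two agents, three actions each, m = 2 mixture components.
alpha = np.array([0.5, 0.5])
pi1 = np.array([[0.90, 0.05, 0.05],   # component 0: agent 1 favors action 0
                [0.05, 0.05, 0.90]])  # component 1: agent 1 favors action 2
pi2 = np.array([[0.90, 0.05, 0.05],   # agent 2 mirrors agent 1
                [0.05, 0.05, 0.90]])
print(sync_sample(alpha, [pi1, pi2]))  # almost always [0, 0] or [2, 2]
```

A single product of marginals putting 50/50 mass on actions 0 and 2 for each agent would force mismatched pairs such as (0, 2) a quarter of the time; the mixture concentrates nearly all mass on the coordinated pairs.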
Results
  • Using SYNC-policies and CORDIAL, the agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines.
  • Tab. 2 shows similar trends for FurnLift but, perhaps surprisingly, the Success of SYNC is somewhat lower than that of the marginal model (2.6% lower, within statistical error)
Conclusion
  • The authors include a video of policy roll-outs in the supplementary material.
  • This includes four clips, each corresponding to the rollout on a test scene of one of the models trained to complete the FurnMove task.
  • The authors notice lower pitches for MoveWithObject and MoveObject actions.
  • The authors introduce FurnMove, a collaborative, visual, multi-agent task requiring close coordination between agents, and develop novel methods that move beyond existing marginal action sampling procedures; these methods lead to large gains across a diverse suite of metrics
Tables
  • Table1: Quantitative results on three tasks. ↑ (or ↓) indicates that a higher (or lower) value of the metric is desirable, while other metrics are simply informational and no value is, a priori, better than another. † denotes that a centralized agent serves only as an upper bound on decentralized methods and cannot be fairly compared with them. Note that, among decentralized agents, our…
  • Table2: Quantitative results on the FurnLift task. For legend, see Tab. 1
  • Table3: Effect of the number of mixture components m on SYNC's performance
  • Table4: Ablation study of the coordination loss on marginal [42], SYNC, and central methods. Marginal performs better without CORDIAL, whereas SYNC and central show improvement with CORDIAL added to the overall loss. † denotes that a centralized agent serves only as an upper bound to decentralized methods
  • Table9: Estimates, and corresponding robust bootstrap standard errors, for the parameters of communication analysis (Sec. A.5)
Related work
  • We start by reviewing single-agent embodied AI tasks, followed by non-visual multi-agent RL (MARL), and end with visual MARL. Single-agent embodied systems: Single-agent embodied systems have been considered extensively in the literature. For instance, the literature on visual navigation, i.e., locating an object of interest given only visual input, spans geometric and learning-based methods. Geometric approaches have been proposed separately for the mapping and planning phases of navigation. Methods entailing structure-from-motion and SLAM [91,80,25,13,72,81] were used to build maps. Planning algorithms on existing maps [14,46,52] and combined mapping and planning [26,50,49,30,6] are other related research directions.
Funding
  • This material is based upon work supported in part by the National Science Foundation under Grant Nos. 1563727, 1718221, 1637479, 165205, and 1703166, and by Samsung, 3M, a Sloan Fellowship, the NVIDIA Artificial Intelligence Lab, the Allen Institute for AI, Amazon, and AWS Research Awards
Study subjects and analysis
workers: 45
For training, we augment the A3C algorithm [66] with CORDIAL (see the sketch below). For our studies in the visual domain, we use 45 workers and 8 GPUs. Models take around two days to train.
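As a rough illustration of how a coordination loss can slot into an A3C-style objective in place of the entropy bonus, the sketch below penalizes joint-policy mass on mutually incompatible multi-actions by maximizing the log-probability of the compatible ones. The exact functional form and weighting CORDIAL uses may differ; compatible_mask and the loss coefficients here are assumptions for illustration.

```python
import torch

def coordination_loss(joint_logprobs: torch.Tensor,
                      compatible_mask: torch.Tensor) -> torch.Tensor:
    """Cross entropy between a uniform distribution over mutually
    compatible multi-actions and the joint policy.

    joint_logprobs:  (A1, A2) tensor of log pi(a1, a2).
    compatible_mask: (A1, A2) boolean tensor, True where the action
                     pair (a1, a2) is mutually compatible (hypothetical,
                     task-specific).
    """
    # Mean negative log-likelihood of the compatible action pairs.
    return -joint_logprobs[compatible_mask].mean()

# Schematic A3C-style update with the entropy bonus swapped out
# (coefficients are illustrative, not the paper's values):
# loss = policy_loss + 0.5 * value_loss \
#        + 0.1 * coordination_loss(joint_logprobs, compatible_mask)
```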

studies: 4
6.3 Quantitative evaluation. We conduct four studies: (a) performance of different methods and relative difficulty of the three tasks, (b) effect of the number of mixture components on SYNC's performance, (c) effect of CORDIAL (ablation), and (d) effect of the number of agents.

Reference
  • 1. Abel, D., Agarwal, A., Diaz, F., Krishnamurthy, A., Schapire, R.E.: Exploratory gradient boosting for reinforcement learning in complex domains. arXiv preprint arXiv:1603.04119 (2016)
  • 2. Anderson, P., Chang, A., Chaplot, D.S., Dosovitskiy, A., Gupta, S., Koltun, V., Kosecka, J., Malik, J., Mottaghi, R., Savva, M., et al.: On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757 (2018)
  • 3. Anderson, P., Shrivastava, A., Parikh, D., Batra, D., Lee, S.: Chasing ghosts: Instruction following as Bayesian state tracking. In: NeurIPS (2019)
  • 4. Anderson, P., Wu, Q., Teney, D., Bruce, J., Johnson, M., Sunderhauf, N., Reid, I., Gould, S., van den Hengel, A.: Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In: CVPR (2018)
  • 5. Armeni, I., Sax, S., Zamir, A.R., Savarese, S.: Joint 2D-3D-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017)
  • 6. Aydemir, A., Pronobis, A., Göbelbecker, M., Jensfelt, P.: Active visual object search in unknown environments using uncertain semantics. IEEE Trans. on Robotics (2013)
  • 7. Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., Mordatch, I.: Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528 (2019)
  • 8. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. J. of Artificial Intelligence Research (2013)
  • 9. Boutilier, C.: Sequential optimality and coordination in multiagent systems. In: IJCAI (1999)
  • 10. Bratman, J., Shvartsman, M., Lewis, R.L., Singh, S.: A new approach to exploring language emergence as boundedly optimal control in the face of environmental and cognitive constraints. In: Proc. Int'l Conf. on Cognitive Modeling (2010)
  • 11. Brodeur, S., Perez, E., Anand, A., Golemo, F., Celotti, L., Strub, F., Rouat, J., Larochelle, H., Courville, A.: HoME: A household multimodal environment. arXiv preprint arXiv:1711.11017 (2017)
  • 12. Busoniu, L., Babuska, R., Schutter, B.D.: A comprehensive survey of multiagent reinforcement learning. IEEE Trans. on Systems, Man and Cybernetics (2008)
  • 13. Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Reid, I., Leonard, J.J.: Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. on Robotics (2016)
  • 14. Canny, J.: The complexity of robot motion planning. MIT Press (1988)
  • 15. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: 3DV (2017)
  • 16. Chaplot, D.S., Gupta, S., Gupta, A., Salakhutdinov, R.: Learning to explore using active neural mapping. In: ICLR (2020)
  • 17. Chen, B., Song, S., Lipson, H., Vondrick, C.: Visual hide and seek. arXiv preprint arXiv:1910.07882 (2019)
  • 18. Chen, H., Suhr, A., Misra, D., Snavely, N., Artzi, Y.: Touchdown: Natural language navigation and spatial reasoning in visual street environments. In: CVPR (2019)
  • 19. Chen∗, C., Jain∗, U., Schissler, C., Gari, S.V.A., Al-Halah, Z., Ithapu, V.K., Robinson, P., Grauman, K.: Audio-visual embodied navigation. arXiv preprint arXiv:1912.11474 (2019), ∗ equal contribution
  • 20. Daftry, S., Bagnell, J.A., Hebert, M.: Learning transferable policies for monocular reactive MAV control. In: Proc. ISER (2016)
  • 21. Das, A., Datta, S., Gkioxari, G., Lee, S., Parikh, D., Batra, D.: Embodied question answering. In: CVPR (2018)
  • 22. Das, A., Gkioxari, G., Lee, S., Parikh, D., Batra, D.: Neural modular control for embodied question answering. In: ECCV (2018)
  • 23. Das, A., Gervet, T., Romoff, J., Batra, D., Parikh, D., Rabbat, M., Pineau, J.: TarMAC: Targeted multi-agent communication. In: ICML (2019)
  • 24. Das∗, A., Carnevale∗, F., Merzic, H., Rimell, L., Schneider, R., Abramson, J., Hung, A., Ahuja, A., Clark, S., Wayne, G., et al.: Probing emergent semantics in predictive agents via question answering. In: ICML (2020), ∗ equal contribution
  • 25. Dellaert, F., Seitz, S., Thorpe, C., Thrun, S.: Structure from motion without correspondence. In: CVPR (2000)
  • 26. Elfes, A.: Using occupancy grids for mobile robot perception and navigation. Computer (1989)
  • 27. Foerster, J.N., Assael, Y.M., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: NeurIPS (2016)
  • 28. Foerster, J.N., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients. In: AAAI (2018)
  • 29. Foerster, J.N., Nardelli, N., Farquhar, G., Torr, P.H.S., Kohli, P., Whiteson, S.: Stabilising experience replay for deep multi-agent reinforcement learning. In: ICML (2017)
  • 30. Fraundorfer, F., Heng, L., Honegger, D., Lee, G.H., Meier, L., Tanskanen, P., Pollefeys, M.: Vision-based autonomous mapping and exploration using a quadrotor MAV. In: IROS (2012)
  • 31. Gao, R., Chen, C., Al-Halah, Z., Schissler, C., Grauman, K.: VisualEchoes: Spatial image representation learning through echolocation. arXiv preprint arXiv:2005.01616 (2020)
  • 32. Giles, C.L., Jim, K.C.: Learning communication for multi-agent systems. In: Proc. Innovative Concepts for Agent-Based Systems (2002)
  • 33. Giusti, A., Guzzi, J., Ciresan, D.C., He, F.L., Rodríguez, J.P., Fontana, F., Faessler, M., Forster, C., Schmidhuber, J., Di Caro, G., et al.: A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters (2015)
  • 34. Gordon, D., Kembhavi, A., Rastegari, M., Redmon, J., Fox, D., Farhadi, A.: IQA: Visual question answering in interactive environments. In: CVPR (2018)
  • 35. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: Socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)
  • 36. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: AAMAS (2017)
  • 37. Henriques, J.F., Vedaldi, A.: MapNet: An allocentric spatial memory for mapping environments. In: CVPR (2018)
  • 38. Hill, F., Hermann, K.M., Blunsom, P., Clark, S.: Understanding grounded language learning agents. arXiv preprint arXiv:1710.09867 (2017)
  • 39. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation (1997)
  • 40. Wolfram Research, Inc.: Mathematica, Version 12.1. Champaign, IL (2020), https://www.wolfram.com/mathematica
  • 41. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castaneda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., et al.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science (2019)
  • 42. Jain∗, U., Weihs∗, L., Kolve, E., Rastegari, M., Lazebnik, S., Farhadi, A., Schwing, A.G., Kembhavi, A.: Two body problem: Collaborative visual task completion. In: CVPR (2019), ∗ equal contribution
  • 43. Johnson, M., Hofmann, K., Hutton, T., Bignell, D.: The Malmo platform for artificial intelligence experimentation. In: IJCAI (2016)
  • 44. Kahn, G., Zhang, T., Levine, S., Abbeel, P.: PLATO: Policy learning using adaptive trajectory optimization. In: ICRA (2017)
  • 45. Kasai, T., Tenmoto, H., Kamiya, A.: Learning of communication codes in multiagent reinforcement learning problem. In: Proc. IEEE Soft Computing in Industrial Applications (2008)
  • 46. Kavraki, L.E., Svestka, P., Latombe, J.C., Overmars, M.H.: Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. on Robotics and Automation (1996)
  • 47. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: Proc. IEEE Conf. on Computational Intelligence and Games (2016)
  • 48. Kolve, E., Mottaghi, R., Han, W., VanderBilt, E., Weihs, L., Herrasti, A., Gordon, D., Zhu, Y., Gupta, A., Farhadi, A.: AI2-THOR: An interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474 (2019)
  • 49. Konolige, K., Bowman, J., Chen, J., Mihelich, P., Calonder, M., Lepetit, V., Fua, P.: View-based maps. Intl. J. of Robotics Research (2010)
  • 50. Kuipers, B., Byun, Y.T.: A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations. Robotics and Autonomous Systems (1991)
  • 51. Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: ICML (2000)
  • 52. LaValle, S.M., Kuffner, J.J.: Rapidly-exploring random trees: Progress and prospects. Algorithmic and Computational Robotics: New Directions (2000)
  • 53. Lazaridou, A., Peysakhovich, A., Baroni, M.: Multi-agent cooperation and the emergence of (natural) language. arXiv preprint arXiv:1612.07182 (2016)
  • 54. Lerer, A., Gross, S., Fergus, R.: Learning physical intuition of block towers by example. In: ICML (2016)
  • 55. Liu, Y.C., Tian, J., Glaser, N., Kira, Z.: When2com: Multi-agent perception via communication graph grouping. In: CVPR (2020)
  • 56. Liu, Y.C., Tian, J., Ma, C.Y., Glaser, N., Kuo, C.W., Kira, Z.: Who2com: Collaborative perception via learnable handshake communication. In: ICRA (2020)
  • 57. Liu∗, I.J., Yeh∗, R., Schwing, A.G.: PIC: Permutation invariant critic for multi-agent deep reinforcement learning. In: CoRL (2019), ∗ equal contribution
  • 58. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: NeurIPS (2017)
  • 59. Savva∗, M., Kadian∗, A., Maksymets∗, O., Zhao, Y., Wijmans, E., Jain, B., Straub, J., Liu, J., Koltun, V., Malik, J., Parikh, D., Batra, D.: Habitat: A platform for embodied AI research. In: ICCV (2019), ∗ equal contribution
  • 60. Matignon, L., Laurent, G.J., Fort-Piat, N.L.: Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IROS (2007)
  • 61. Melo, F.S., Spaan, M., Witwicki, S.J.: QueryPOMDP: POMDP-based communication in multiagent systems. In: European Workshop on Multi-Agent Systems (2011)
  • 62. Mirowski, P., Pascanu, R., Viola, F., Soyer, H., Ballard, A., Banino, A., Denil, M., Goroshin, R., Sifre, L., Kavukcuoglu, K., et al.: Learning to navigate in complex environments. In: ICLR (2017)
  • 63. Mirowski, P., Banki-Horvath, A., Anderson, K., Teplyashin, D., Hermann, K.M., Malinowski, M., Grimes, M.K., Simonyan, K., Kavukcuoglu, K., Zisserman, A., et al.: The StreetLearn environment and dataset. arXiv preprint arXiv:1903.01292 (2019)
  • 64. Mirowski, P., Grimes, M., Malinowski, M., Hermann, K.M., Anderson, K., Teplyashin, D., Simonyan, K., Zisserman, A., Hadsell, R., et al.: Learning to navigate in cities without a map. In: NeurIPS (2018)
  • 65. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature (2015)
  • 66. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: ICML (2016)
  • 67. Mordatch, I., Abbeel, P.: Emergence of grounded compositional language in multi-agent populations. In: AAAI (2018)
  • 68. Oh, J., Chockalingam, V., Singh, S., Lee, H.: Control of memory, active perception, and action in Minecraft. In: ICML (2016)
  • 69. Omidshafiei, S., Pazis, J., Amato, C., How, J.P., Vian, J.: Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: ICML (2017)
  • 70. Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems (2005)
  • 71. Peng, P., Wen, Y., Yang, Y., Yuan, Q., Tang, Z., Long, H., Wang, J.: Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv:1703.10069 (2017)
  • 72. Smith, R.C., Cheeseman, P.: On the representation and estimation of spatial uncertainty. Intl. J. of Robotics Research (1986)
  • 73. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2019), https://www.R-project.org/
  • 74. Ramakrishnan, S.K., Jayaraman, D., Grauman, K.: An exploration of embodied visual exploration. arXiv preprint arXiv:2001.02192 (2020)
  • 75. Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., Whiteson, S.: QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: ICML (2018)
  • 76. Recht, B., Re, C., Wright, S., Niu, F.: Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: NeurIPS (2011)
  • 77. Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS (2011)
  • 78. Savinov, N., Dosovitskiy, A., Koltun, V.: Semi-parametric topological memory for navigation. In: ICLR (2018)
  • 79. Savva, M., Chang, A.X., Dosovitskiy, A., Funkhouser, T., Koltun, V.: MINOS: Multimodal indoor simulator for navigation in complex environments. arXiv preprint arXiv:1712.03931 (2017)
  • 80. Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: CVPR (2016)
  • 81. Smith, R.C., Self, M., Cheeseman, P.: Estimating uncertain spatial relationships in robotics. In: UAI (1986)
  • 82. Suhr, A., Yan, C., Schluger, J., Yu, S., Khader, H., Mouallem, M., Zhang, I., Artzi, Y.: Executing instructions in situated collaborative interactions. In: EMNLP (2019)
  • 83. Sukhbaatar, S., Szlam, A., Fergus, R.: Learning multiagent communication with backpropagation. In: NeurIPS (2016)
  • 84. Sukhbaatar, S., Szlam, A., Synnaeve, G., Chintala, S., Fergus, R.: MazeBase: A sandbox for learning from games. arXiv preprint arXiv:1511.07401 (2015)
  • 85. Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction. MIT Press (1998)
  • 86. Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. In: NeurIPS (2016)
  • 87. Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., Vicente, R.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE (2017)
  • 88. Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents. In: ICML (1993)
  • 89. Tesauro, G.: Extending Q-learning to general adaptive multi-agent systems. In: NeurIPS (2004)
  • 90. Thomason, J., Gordon, D., Bisk, Y.: Shifting the baseline: Single modality performance on visual navigation & QA. In: NAACL (2019)
  • 91. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: A factorization method. IJCV (1992)
  • 92. Toussaint, M.: Learning a world model and planning with a self-organizing, dynamic neural system. In: NeurIPS (2003)
  • 93. Usunier, N., Synnaeve, G., Lin, Z., Chintala, S.: Episodic exploration for deep deterministic policies: An application to StarCraft micromanagement tasks. In: ICLR (2016)
  • 94. de Vries, H., Shuster, K., Batra, D., Parikh, D., Weston, J., Kiela, D.: Talk the walk: Navigating New York City through grounded dialogue. arXiv preprint arXiv:1807.03367 (2018)
  • 95. Wang, X., Huang, Q., Celikyilmaz, A., Gao, J., Shen, D., Wang, Y.F., Wang, W.Y., Zhang, L.: Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation. In: CVPR (2019)
  • 96. Weihs, L., Kembhavi, A., Han, W., Herrasti, A., Kolve, E., Schwenk, D., Mottaghi, R., Farhadi, A.: Artificial agents learn flexible visual representations by playing a hiding game. arXiv preprint arXiv:1912.08195 (2019)
  • 97. Wijmans, E., Datta, S., Maksymets, O., Das, A., Gkioxari, G., Lee, S., Essa, I., Parikh, D., Batra, D.: Embodied question answering in photorealistic environments with point cloud perception. In: CVPR (2019)
  • 98. Wortsman, M., Ehsani, K., Rastegari, M., Farhadi, A., Mottaghi, R.: Learning to learn how to learn: Self-adaptive visual navigation using meta-learning. In: CVPR (2019)
  • 99. Wu, Y., Wu, Y., Tamar, A., Russell, S., Gkioxari, G., Tian, Y.: Bayesian relational memory for semantic visual navigation. In: ICCV (2019)
  • 100. Wymann, B., Espie, E., Guionneau, C., Dimitrakakis, C., Coulom, R., Sumner, A.: TORCS, the open racing car simulator (2013), http://www.torcs.org
  • 101. Xia, F., Shen, W.B., Li, C., Kasimbeg, P., Tchapmi, M., Toshev, A., Martín-Martín, R., Savarese, S.: Interactive Gibson: A benchmark for interactive navigation in cluttered environments. arXiv preprint arXiv:1910.14442 (2019)
  • 102. Xia, F., Zamir, A.R., He, Z., Sax, A., Malik, J., Savarese, S.: Gibson Env: Real-world perception for embodied agents. In: CVPR (2018)
  • 103. Yang, J., Lu, J., Lee, S., Batra, D., Parikh, D.: Visual curiosity: Learning to ask questions to learn visual recognition. In: CoRL (2018)
  • 104. Yang, J., Ren, Z., Xu, M., Chen, X., Crandall, D., Parikh, D., Batra, D.: Embodied amodal recognition: Learning to move to perceive objects. In: ICCV (2019)
  • 105. Yang, W., Wang, X., Farhadi, A., Gupta, A., Mottaghi, R.: Visual semantic navigation using scene priors. In: ICLR (2018)
  • 106. Zhang, K., Yang, Z., Basar, T.: Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv preprint arXiv:1911.10635 (2019)
  • 107. Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., Farhadi, A.: Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: ICRA (2017)