Meta Attention Networks: Meta-Learning Attention to Modulate Information Between Recurrent Independent Mechanisms

International Conference on Learning Representations (ICLR), 2021

TL;DR: Learning independent mechanisms at different timescales can lead to better generalization.

Abstract

Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with the environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly...

Introduction
  • The classical statistical framework of machine learning focuses on the assumption of independent and identically distributed (i.i.d.) data, implying that the test data comes from the same distribution as the training data.
  • Humans seem able to learn a new task quickly by re-using relevant prior knowledge, raising two fundamental questions which the authors explore here: (1) how to separate knowledge into recomposable pieces, and (2) how to do this so as to achieve fast adaptation to new tasks or changes in distribution, when a module may need to be modified or when different modules may need to be combined in new ways.
  • For the former objective, instead of representing knowledge with a homogeneous architecture as in standard neural networks, the authors adopt recently proposed modular approaches (Goyal et al., 2019; Mittal et al., 2020; Goyal et al., 2020; Rahaman et al., 2020).
Highlights
  • The classical statistical framework of machine learning is focused on the assumption of independent and identically distributed (i.i.d.) data, implying that the test data comes from the same distribution as the training data.
  • Due to a monolithic structure, when the task or the distribution changes, a majority of the components of the network are likely to adapt in response to these changes, potentially leading to catastrophic interference between different tasks or pieces of knowledge (Andreas et al., 2016; Fernando et al., 2017; Shazeer et al., 2017; Jo et al., 2018; Rosenbaum et al., 2019; Alet et al., 2018; Kirsch et al., 2018; Goyal et al., 2019; 2020).
  • Humans seem able to learn a new task quickly by re-using relevant prior knowledge, raising two fundamental questions which we explore here: (1) how to separate knowledge into recomposable pieces, and (2) how to do this so as to achieve fast adaptation to new tasks or changes in distribution, when a module may need to be modified or when different modules may need to be combined in new ways.
  • We evaluate the proposed Meta-RIMs networks to answer the following questions: (a) Does the proposed method improve sample efficiency? We answer this positively in Section 4.1. (b) Does the proposed method lead to policies that generalize better to systematic changes to the training distribution? We find positive evidence for this in Section 4.2. (c) Does the proposed method lead to faster adaptation to new distributions and a better curriculum learning regime, training agents in an incremental fashion by reusing knowledge from previously learnt similar tasks? We evaluate this setting and find positive evidence in Section 4.3.
  • The experimental results on grounded language learning tasks in the reinforcement learning setting strongly indicate that combining meta-learning of the attention parameters with dynamically connected modular architectures and sparse communication leads to superior results in terms of improved sample efficiency and improved transfer across tasks in a curriculum, both as zero-shot transfer and with adaptation.
Methods
  • The authors evaluate the proposed Meta-RIMs networks to answer the following questions: (a) Does the proposed method improve sample efficiency? The authors answer this positively in Section 4.1. (b) Does the proposed method lead to policies that generalize better to systematic changes to the training distribution? The authors find positive evidence for this in Section 4.2. (c) Does the proposed method lead to faster adaptation to new distributions and a better curriculum learning regime, training agents in an incremental fashion by reusing knowledge from previously learnt similar tasks? The authors evaluate this setting and find positive evidence in Section 4.3.
  • Sparse rewards and a procedurally generated series of environments of systematically increasing difficulty make fast learning challenging for reinforcement learning agents, but make this setting well suited to addressing the questions raised above.
  • Please refer to Appendix A.1 for additional details on the environments and hyperparameters used.
Conclusion
  • This paper investigates using a meta-learning approach on modular architectures with sparse communication (as in RIMs (Goyal et al., 2019)) to capture short-term vs. long-term aspects of the underlying mechanisms in the data-generating process, by treating the parameters of the attention mechanisms as meta-parameters and the parameters of the recurrent modules as ordinary parameters, so that the two groups are updated over different timescales (a minimal sketch of this two-timescale scheme follows this list).
  • Ablation studies further confirm that using a meta-learning approach to update different parameters of the network over different timescales improves sample efficiency compared to training all of the parameters together.
  • Overall, these results point towards an interesting way to perform meta-learning and attention-based modularization for better sample efficiency, out-of-distribution generalization and transfer learning.
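The following is a minimal PyTorch sketch of the two-timescale idea, not the authors' implementation: the TinyModularPolicy class, its GRUCell "modules", the soft attention over modules, the placeholder loss and the learning rates are all illustrative assumptions, whereas the paper uses RIMs with sparse attention-based communication trained with PPO. The sketch only shows attention parameters being updated on a slower (meta) schedule than module parameters.

import torch
import torch.nn as nn

class TinyModularPolicy(nn.Module):
    """Hypothetical stand-in for a modular recurrent policy."""
    def __init__(self, obs_dim=16, hidden=32, n_modules=5, n_actions=4):
        super().__init__()
        # "Modules": independent recurrent cells (fast-adapting parameters).
        self.modules_ = nn.ModuleList(
            [nn.GRUCell(obs_dim, hidden) for _ in range(n_modules)]
        )
        # "Attention": decides how much each module contributes (meta-parameters).
        self.attn = nn.Linear(obs_dim, n_modules)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, states):
        weights = torch.softmax(self.attn(obs), dim=-1)           # (batch, n_modules)
        new_states = [cell(obs, h) for cell, h in zip(self.modules_, states)]
        mixed = sum(w.unsqueeze(-1) * h
                    for w, h in zip(weights.unbind(-1), new_states))
        return self.head(mixed), new_states

policy = TinyModularPolicy()
fast_params = [p for m in policy.modules_ for p in m.parameters()]
fast_params += list(policy.head.parameters())
slow_params = list(policy.attn.parameters())

# Module (and output-head) parameters are updated often; attention parameters
# are updated rarely and with a smaller learning rate, mimicking a slower timescale.
fast_opt = torch.optim.Adam(fast_params, lr=1e-3)
slow_opt = torch.optim.Adam(slow_params, lr=1e-4)

for outer_step in range(10):
    for inner_step in range(5):
        obs = torch.randn(8, 16)
        states = [torch.zeros(8, 32) for _ in policy.modules_]
        logits, _ = policy(obs, states)
        loss = logits.pow(2).mean()          # placeholder loss; an RL loss in practice
        fast_opt.zero_grad()
        loss.backward()
        fast_opt.step()
    # Slower, meta-level update of the attention parameters only.
    obs = torch.randn(8, 16)
    states = [torch.zeros(8, 32) for _ in policy.modules_]
    logits, _ = policy(obs, states)
    meta_loss = logits.pow(2).mean()
    slow_opt.zero_grad()
    meta_loss.backward()
    slow_opt.step()

In the paper the slow update is a genuine meta-update computed across episodes or tasks rather than simply a lower learning rate; the sketch only conveys the separation of the two parameter groups and their different update frequencies.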
Tables
  • Table 1: Zero-shot policy transfer: the model is trained on the easiest environment and transferred in a zero-shot manner to a more difficult and larger environment, outperforming the baselines in terms of both rewards (R) and success rates (S) as the difficulty of the environment increases.
Related work
  • Meta-Learning: Meta-learning (Bengio et al., 1990; Schmidhuber, 1987) methods give the flexibility to adapt to new environments rapidly with a few training examples, and have demonstrated success in both supervised learning, such as few-shot image classification (Ravi & Larochelle, 2016), and reinforcement learning (Wang et al., 2016; Santoro et al., 2016) settings. The most relevant modular meta-learning work is that of Alet et al. (2018), which proposes to learn a modular network architecture based on MAML; however, their approach relies on pre-trained composable transformations. The goal of the current work is to learn the transformations (i.e., the decomposition of knowledge into separate modules) as well as how to dynamically route information among such modules.

    Meta-Learning to Disentangle Causal Mechanisms: Recently, Bengio et al. (2019) and Ke et al. (2019) used meta-learning to learn causal mechanisms or causal dependencies between a set of high-level variables, which inspired the approach presented here. The ‘modules’ in their work are the conditional distributions for each variable in a directed causal graphical model (Schölkopf et al., 2016). The inner loop of meta-learning also allows the modules to be adapted within an episode (corresponding to an intervention distribution), while the outer loop of meta-learning discovers how the modules are connected (statically) to each other to form the graph structure of the graphical model.
Figures
  • Module Activations: A plot of module activations (y-axis) over a fixed-length input sequence (x-axis) for two of the environments, with settings (n = 5, k = 3) and (n = 5, k = 2), shows diverse and active participation from all modules (no dead modules) in dynamically responding to the inputs received.

References
  • Ferran Alet, Tomás Lozano-Pérez, and Leslie P Kaelbling. Modular meta-learning. arXiv preprint arXiv:1806.10166, 2018.
  • Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 39–48, 2016.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, and Aaron Courville. Systematic generalization: what is required and can it be learned? arXiv preprint arXiv:1811.12889, 2018.
  • Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier. Learning a synaptic learning rule. Citeseer, 1990.
  • Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv:1901.10912, 2019.
  • Léon Bottou and Patrick Gallinari. A framework for the cooperation of learning algorithms. In Advances in neural information processing systems, pp. 781–788, 1991.
  • Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, and Ruslan Salakhutdinov. Gated-attention architectures for task-oriented language grounding. arXiv preprint arXiv:1706.07230, 2017.
  • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, and Yoshua Bengio. Babyai: First steps towards grounded language learning with a human in the loop. arXiv preprint arXiv:1810.08272, 2018.
  • Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
  • Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. Pathnet: Evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734, 2017.
  • Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. Recurrent independent mechanisms. arXiv preprint arXiv:1909.10893, 2019.
  • Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, and Michael Mozer. Object files and schemata: Factorizing declarative and procedural knowledge in dynamical systems. arXiv preprint arXiv:2006.16225, 2020.
  • Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, et al. Grounded language learning in a simulated 3d world. arXiv preprint arXiv:1706.06551, 2017.
  • Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, OpenAI Jonathan Ho, and Pieter Abbeel. Evolved policy gradients. In Advances in Neural Information Processing Systems, pp. 5400–5409, 2018.
  • Robert A Jacobs, Michael I Jordan, Steven J Nowlan, Geoffrey E Hinton, et al. Adaptive mixtures of local experts. Neural computation, 3(1):79–87, 1991.
  • Jason Jo, Vikas Verma, and Yoshua Bengio. Modularity matters: Learning invariant relational reasoning tasks. arXiv preprint arXiv:1806.06765, 2018.
  • Justin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Judy Hoffman, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. Inferring and executing programs for visual reasoning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2989–2998, 2017.
  • Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C Mozer, Chris Pal, and Yoshua Bengio. Sparse attentive backtracking: Temporal credit assignment through reminding. In Advances in Neural Information Processing Systems, pp. 7640–7651, 2018.
  • Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Chris Pal, and Yoshua Bengio. Learning neural causal models from unknown interventions. arXiv preprint arXiv:1910.01075, 2019.
  • Louis Kirsch, Julius Kunze, and David Barber. Modular networks: Learning to decompose neural computation. In Advances in Neural Information Processing Systems, pp. 2408–2418, 2018.
  • Louis Kirsch, Sjoerd van Steenkiste, and Jürgen Schmidhuber. Improving generalization in meta reinforcement learning using learned objectives. arXiv preprint arXiv:1910.04098, 2019.
  • Brenden Lake and Marco Baroni. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In International Conference on Machine Learning, pp. 2873–2882. PMLR, 2018.
  • Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive metalearner. arXiv preprint arXiv:1707.03141, 2017.
  • Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, and Yoshua Bengio. Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules. arXiv preprint arXiv:2006.16981, 2020.
  • Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • Nasim Rahaman, Anirudh Goyal, Muhammad Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, and Bernhard Schölkopf. S2RMs: Spatially structured recurrent modules. arXiv preprint arXiv:2007.06533, 2020.
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. 2016.
  • Scott Reed and Nando De Freitas. Neural programmer-interpreters. arXiv preprint arXiv:1511.06279, 2015.
  • Eric Ronco, Henrik Gollee, and Peter J Gawthrop. Modular neural networks and self-decomposition. Technical Report CSC-96012, 1997.
  • Clemens Rosenbaum, Tim Klinger, and Matthew Riemer. Routing networks: Adaptive selection of non-linear functions for multi-task learning. arXiv preprint arXiv:1711.01239, 2017.
  • Clemens Rosenbaum, Ignacio Cases, Matthew Riemer, and Tim Klinger. Routing networks and the challenges of modular and compositional computation. arXiv preprint arXiv:1904.12774, 2019.
  • Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Metalearning with memory-augmented neural networks. In International conference on machine learning, pp. 1842–1850, 2016.
  • Adam Santoro, Ryan Faulkner, David Raposo, Jack W. Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy P. Lillicrap. Relational recurrent neural networks. CoRR, abs/1806.01822, 2018. URL http://arxiv.org/abs/1806.01822.
  • Jürgen Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Institut f. Informatik, Tech. Univ. Munich, 1987.
  • B. Schölkopf, D. Janzing, and D. Lopez-Paz. Causal and statistical learning. In Oberwolfach Reports, volume 13(3), pp. 1896–1899, 2016. doi: 10.14760/OWR-2016-33. URL https://publications.mfo.de/handle/mfo/3537.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
  • Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
  • Yi Wu, Yuxin Wu, Georgia Gkioxari, and Yuandong Tian. Building generalizable agents with a realistic and rich 3d environment. arXiv preprint arXiv:1801.02209, 2018.
  • Zhongwen Xu, Hado P van Hasselt, and David Silver. Meta-gradient reinforcement learning. In Advances in neural information processing systems, pp. 2396–2407, 2018.
  • Haonan Yu, Haichao Zhang, and Wei Xu. Interactive grounded language acquisition and generalization in a 2d world. arXiv preprint arXiv:1802.01433, 2018.
  • We used a variety of environments from MiniGrid and BabyAI (Chevalier-Boisvert et al., 2018) that provide a partial and egocentric view of the state of the environment to the agent. The reward is sparse: a positive reward is received only if the agent successfully reaches the goal. The reward is discounted according to the number of steps taken to reach the goal, computed as 1 − 0.9 · n/n_max, where n is the number of steps taken and n_max is the maximum number of steps allowed for a given environment; n_max depends on the difficulty of the environment, with more difficult environments having a larger value. If the agent is not able to complete the task within n_max steps, the episode ends and it gets a zero reward. The environments increase in difficulty in a systematically incremental manner. Partial observability, sparse rewards and a systematic increase in difficulty together make the task sufficiently hard for reinforcement learning algorithms. A small worked example of this reward follows.
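A small worked example of the reward computation above; the helper name episode_reward is an assumption for illustration, not code from the paper.

def episode_reward(steps_taken: int, n_max: int, reached_goal: bool) -> float:
    # Sparse BabyAI/MiniGrid-style reward: 1 - 0.9 * n / n_max if the goal is
    # reached within n_max steps, and 0 otherwise.
    if not reached_goal or steps_taken > n_max:
        return 0.0
    return 1.0 - 0.9 * steps_taken / n_max

print(episode_reward(steps_taken=20, n_max=100, reached_goal=True))    # 0.82
print(episode_reward(steps_taken=100, n_max=100, reached_goal=False))  # 0.0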
  • We used Proximal Policy Optimization (PPO; Schulman et al., 2017) with parallelized data collection of rollouts gathered by multiple parallel processes. For generalized advantage estimation we used λ = 0.99, and we discounted future rewards by a factor of γ = 0.99. Throughout the experiments, we report the mean reward (R) and success rate (S) of the agent, where the mean reward is the average reward across multiple runs and the success rate is the percentage of times the agent successfully reaches the goal within n_max timesteps. For all of our environments, we used n = 5 total modules, with only k = 3 of them active at any given time. Further details on the specifics of each environment are provided in Section A.4. A sketch of generalized advantage estimation with these values is given below.
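A minimal sketch of generalized advantage estimation with the γ = 0.99 and λ = 0.99 values quoted above; the function and the toy reward/value arrays are illustrative assumptions, not the authors' PPO implementation.

import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.99):
    # rewards: length-T array; values: length-(T+1) array, where the last
    # entry bootstraps the value after the final step of the rollout.
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy episode: a sparse reward of 0.82 is received only at the final (goal) step.
rewards = np.array([0.0, 0.0, 0.82])
values = np.array([0.10, 0.20, 0.50, 0.0])   # terminal bootstrap value of 0
print(gae_advantages(rewards, values))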