Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning

International Conference on Learning Representations (ICLR), 2019

Abstract

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations or unseen situations cause proficient but specialized policies to fail at test time. Given that it is impractical to train separate po...

Introduction
  • Both model-based and model-free reinforcement learning (RL) methods generally operate in one of two regimes: all training is performed in advance, producing a model or policy that can be used at test time to make decisions in settings that approximately match those seen during training; or training is performed online, in which case the agent can slowly modify its behavior as it interacts with the environment.
  • In both of these cases, dynamic changes such as failure of a robot’s components, encountering a new terrain, environmental factors such as lighting and wind, or other unexpected perturbations can cause the agent to fail.
  • This view induces a more general meta-RL problem setting by allowing the notion of a task to represent anything from existing in a different part of the state space, to experiencing disturbances, to attempting to achieve a new goal.
Highlights
  • Both model-based and model-free reinforcement learning (RL) methods generally operate in one of two regimes: all training is performed in advance, producing a model or policy that can be used at test time to make decisions in settings that approximately match those seen during training; or training is performed online, in which case the agent can slowly modify its behavior as it interacts with the environment.
  • Our evaluation aims to answer the following questions: (1) Is adaptation changing the model? (2) Does our approach enable fast adaptation to varying dynamics, tasks, and environments, both inside and outside of the training distribution? (3) How does our method’s performance compare to that of other methods? (4) How do the gradient-based adaptive learner (GrBAL) and the recurrence-based adaptive learner (ReBAL) compare? (5) How does meta model-based RL compare to meta model-free RL in terms of sample efficiency and performance for these experiments? (6) Can our method learn to adapt online on a real robot, and if so, how does it perform? We present our set-up and results, motivated by these questions.
  • We present an approach for model-based meta-RL that enables fast, online adaptation of large and expressive models in dynamic environments.
  • We show that meta-learning a model for online adaptation results in a method that is able to adapt to unseen situations or sudden and drastic changes in the environment, and is sample efficient to train.
  • We provide two instantiations of our approach (ReBAL and GrBAL), and we provide a comparison with other prior methods on a range of continuous control tasks; a minimal sketch of the gradient-based adaptation step follows this list.
  • We show that our approach is practical for real-world applications, and that this capability to adapt quickly is important under complex real-world dynamics.
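To make the adaptation mechanism concrete, the following is a minimal, hypothetical sketch of the GrBAL-style inner loop, written in Python with JAX: a meta-trained dynamics model is adapted with a single gradient step on the most recent transitions, and the adapted parameters would then be used to predict ahead for planning. The model architecture, function names (model_apply, adapt), and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of GrBAL-style online adaptation (illustrative, not the
# authors' code): adapt a meta-trained dynamics model with one gradient
# step on the M most recent transitions.
import jax
import jax.numpy as jnp


def model_apply(params, state, action):
    # Placeholder linear dynamics model predicting the next state;
    # the paper uses a larger neural network.
    x = jnp.concatenate([state, action])
    return state + params["W"] @ x + params["b"]


def prediction_loss(params, states, actions, next_states):
    # Mean squared one-step prediction error over a batch of transitions.
    preds = jax.vmap(lambda s, a: model_apply(params, s, a))(states, actions)
    return jnp.mean((preds - next_states) ** 2)


def adapt(meta_params, states, actions, next_states, alpha=0.01):
    # Inner-loop update: one gradient step on the M most recent transitions
    # yields situation-specific parameters theta'.
    grads = jax.grad(prediction_loss)(meta_params, states, actions, next_states)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, meta_params, grads)


# Example with random data: state dim 4, action dim 2, M = 8 recent steps.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
meta_params = {"W": jnp.zeros((4, 6)), "b": jnp.zeros(4)}
recent_states = jax.random.normal(k1, (8, 4))
recent_actions = jax.random.normal(k2, (8, 2))
recent_next_states = jax.random.normal(k3, (8, 4))
adapted_params = adapt(meta_params, recent_states, recent_actions, recent_next_states)
```

ReBAL replaces this explicit gradient-step update with a recurrent model whose hidden state is updated from the same recent transitions.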
Methods
  • The authors' evaluation aims to answer the following questions: (1) Is adaptation changing the model? (2) Does the approach enable fast adaptation to varying dynamics, tasks, and environments, both inside and outside of the training distribution? (3) How does the method’s performance compare to that of other methods? (4) How do GrBAL and ReBAL compare? (5) How does meta model-based RL compare to meta model-free RL in terms of sample efficiency and performance for these experiments? (6) Can the method learn to adapt online on a real robot, and if so, how does it perform? The authors present the set-up and results, motivated by these questions.
  • The authors evaluate performance in two different situations: disabling a joint that was unseen during training, and switching between disabled joints during a rollout.
  • The former examines extrapolation to out-of-distribution environments, and the latter tests fast adaptation to changing dynamics; a toy illustration of the joint-disabling perturbation follows this list.
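For context on the perturbations used in these evaluations, here is a toy, hypothetical illustration of how a disabled-joint variant of a task can be simulated by zeroing the command sent to one actuator; the function name and setup are assumptions for illustration, not the authors' environment code.

```python
# Toy illustration (not the authors' environment code) of a "disabled joint"
# perturbation: the command sent to one actuator is zeroed, which changes the
# dynamics the agent experiences at test time.
import jax.numpy as jnp


def disable_joint(action, joint_index):
    # Zero out the commanded torque for the chosen joint.
    return action.at[joint_index].set(0.0)


action = jnp.array([0.3, -0.7, 0.5])
print(disable_joint(action, joint_index=1))  # [0.3  0.   0.5]
```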
Conclusion
  • The authors present an approach for model-based meta-RL that enables fast, online adaptation of large and expressive models in dynamic environments.
  • The authors provide two instantiations of the approach (ReBAL and GrBAL), and the authors provide a comparison with other prior methods on a range of continuous control tasks.
  • The authors show that the approach is practical for real-world applications, and that this capability to adapt quickly is important under complex real-world dynamics.
Tables
  • Table 1: Trajectory-following costs for real-world GrBAL and model-based (MB) results when tested on three terrains that were seen during training. Tested here for left-turn (Left), straight-line (Str), zig-zag (Z-z), and figure-8 (F-8) shapes. The methods perform comparably, indicating that online adaptation is not needed on the training terrains, but that including it is not detrimental.
  • Table 2: Reward functions.
  • Table 3: Hyperparameters for the half-cheetah tasks.
  • Table 4: Hyperparameters for the ant tasks.
  • Table 5: Hyperparameters for the 7-DoF arm tasks.
Related work
  • Advances in learning control policies have shown success on numerous complex and high-dimensional tasks (Schulman et al., 2015; Lillicrap et al., 2015; Mnih et al., 2015; Levine et al., 2016; Silver et al., 2017). While reinforcement learning algorithms provide a framework for learning new tasks, they primarily focus on mastery of individual skills rather than on generalizing and quickly adapting to new scenarios. Furthermore, model-free approaches (Peters and Schaal, 2008) require large amounts of system interaction to learn successful control policies, which often makes them impractical for real-world systems. In contrast, model-based methods attain superior sample efficiency by first learning a model of the system dynamics, and then using that model to optimize a policy (Deisenroth et al., 2013; Lenz et al., 2015; Levine et al., 2016; Nagabandi et al., 2017b; Williams et al., 2017); a minimal sketch of this recipe follows this paragraph. Our approach alleviates the need to learn a single global model by allowing the model to be adapted automatically to different scenarios online, based on recent observations. A key challenge with model-based RL approaches is the difficulty of learning a global model that is accurate for the entire state space. Prior model-based approaches tackled this problem by incorporating model uncertainty using Gaussian processes (GPs) (Ko and Fox, 2009; Deisenroth and Rasmussen, 2011; Doerr et al., 2017). However, these methods make additional assumptions on the system (such as smoothness) and do not scale to high-dimensional environments. Chua et al. (2018) recently showed that neural network models can also benefit from incorporating uncertainty, which can lead to model-based methods that attain model-free performance with a significant reduction in sample complexity. Our approach is orthogonal to theirs and can benefit from incorporating such uncertainty.
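As a concrete illustration of the model-based recipe mentioned above, learning a dynamics model and then using it to choose actions, the sketch below shows a simple random-shooting model predictive control (MPC) loop in Python with JAX. It is a hedged illustration only: the dynamics model, reward function, horizon, and number of candidate action sequences are placeholder assumptions rather than the paper's implementation.

```python
# Minimal sketch of random-shooting MPC with a learned dynamics model
# (illustrative, not the paper's code): sample candidate action sequences,
# roll them out through the model, and execute the first action of the
# highest-reward sequence.
import jax
import jax.numpy as jnp


def plan_action(key, dynamics_fn, reward_fn, state, action_dim,
                horizon=10, n_candidates=512):
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = jax.random.uniform(key, (n_candidates, horizon, action_dim),
                                 minval=-1.0, maxval=1.0)

    def rollout(action_seq):
        # Roll one candidate sequence through the learned model,
        # accumulating predicted reward.
        def step(s, a):
            s_next = dynamics_fn(s, a)
            return s_next, reward_fn(s, a)
        _, rewards = jax.lax.scan(step, state, action_seq)
        return jnp.sum(rewards)

    returns = jax.vmap(rollout)(actions)
    best = jnp.argmax(returns)
    return actions[best, 0]  # execute only the first action, then replan


# Example usage with a toy stand-in model and a simple reward.
dynamics_fn = lambda s, a: s + 0.1 * jnp.tanh(a)   # stand-in learned model
reward_fn = lambda s, a: -jnp.sum(s ** 2)          # drive the state toward zero
state = jnp.array([1.0, -0.5])
action = plan_action(jax.random.PRNGKey(0), dynamics_fn, reward_fn,
                     state, action_dim=2)
```

In the setting of this paper, the adapted model from the earlier sketch would play the role of dynamics_fn, so online model adaptation translates directly into changed behavior at the next planning step.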
References
  • M. Al-Shedivat, T. Bansal, Y. Burda, I. Sutskever, I. Mordatch, and P. Abbeel. Continuous adaptation via meta-learning in nonstationary and competitive environments. CoRR, abs/1710.03641, 2017.
  • M. Andrychowicz, M. Denil, S. G. Colmenarejo, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. Learning to learn by gradient descent by gradient descent. CoRR, abs/1606.04474, 2016.
  • K. J. Åström and B. Wittenmark. Adaptive control. Courier Corporation, 2013.
  • A. Aswani, P. Bouffard, and C. Tomlin. Extensions of learning-based model predictive control for real-time application to a quadrotor helicopter. In American Control Conference (ACC). IEEE, 2012.
  • B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.
  • Y. Bengio, S. Bengio, and J. Cloutier. Learning a synaptic learning rule. Université de Montréal, Département d’informatique et de recherche opérationnelle, 1990.
  • D. A. Braun, A. Aertsen, D. M. Wolpert, and C. Mehring. Learning optimal adaptation strategies in unpredictable motor tasks. Journal of Neuroscience, 2009.
  • K. Chua, R. Calandra, R. McAllister, and S. Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint arXiv:1805.12114, 2018.
  • M. Deisenroth and C. E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML), pages 465–472, 2011.
  • M. P. Deisenroth, G. Neumann, J. Peters, et al. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1–2):1–142, 2013.
  • A. Doerr, D. Nguyen-Tuong, A. Marco, S. Schaal, and S. Trimpe. Model-based policy search for automatic tuning of multivariate PID controllers. CoRR, abs/1703.02899, 2017. URL http://arxiv.org/abs/1703.02899.
  • Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL²: Fast reinforcement learning via slow reinforcement learning. CoRR, abs/1611.02779, 2016.
  • C. Finn and S. Levine. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. CoRR, abs/1710.11622, 2017.
  • C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. CoRR, abs/1703.03400, 2017.
  • M. Fortunato, C. Blundell, and O. Vinyals. Bayesian recurrent neural networks. arXiv preprint arXiv:1704.02798, 2017.
  • J. Fu, S. Levine, and P. Abbeel. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors. CoRR, abs/1509.06841, 2015.
  • S. Gu, T. Lillicrap, I. Sutskever, and S. Levine. Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning, pages 2829–2838, 2016.
  • S. Kelouwani, K. Adegnon, K. Agbossou, and Y. Dube. Online system identification and adaptive control for PEM fuel cell maximum efficiency tracking. IEEE Transactions on Energy Conversion, 27(3):580–592, 2012.
  • J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90, 2009.
  • B. Krause, L. Lu, I. Murray, and S. Renals. Multiplicative LSTM for sequence modelling. arXiv preprint arXiv:1609.07959, 2016.
  • B. Krause, E. Kahembwe, I. Murray, and S. Renals. Dynamic evaluation of neural sequence models. CoRR, abs/1709.07432, 2017.
  • T. Kurutach, I. Clavera, Y. Duan, A. Tamar, and P. Abbeel. Model-ensemble trust-region policy optimization. arXiv preprint arXiv:1802.10592, 2018.
  • B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 2015.
  • I. Lenz, R. A. Knepper, and A. Saxena. DeepMPC: Learning deep latent features for model predictive control. In Robotics: Science and Systems, 2015.
  • S. Levine and V. Koltun. Guided policy search. In International Conference on Machine Learning, pages 1–9, 2013.
  • S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research (JMLR), 2016.
  • K. Li and J. Malik. Learning to optimize. arXiv preprint arXiv:1606.01885, 2016.
  • T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. CoRR, abs/1509.02971, 2015.
  • P. Manganiello, M. Ricco, G. Petrone, E. Monmasson, and G. Spagnuolo. Optimization of perturbative PV MPPT methods through online system identification. IEEE Transactions on Industrial Electronics, 61(12):6812–6821, 2014.
  • F. Meier and S. Schaal. Drifting Gaussian processes with varying neighborhood sizes for online model learning. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016.
  • F. Meier, D. Kappler, N. Ratliff, and S. Schaal. Towards robust online inverse dynamics learning. In IEEE/RSJ Conference on Intelligent Robots and Systems. IEEE, 2016.
  • N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. A simple neural attentive meta-learner. In NIPS 2017 Workshop on Meta-Learning, 2017.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 2015.
  • T. Munkhdalai and H. Yu. Meta networks. arXiv preprint arXiv:1703.00837, 2017.
  • T. Munkhdalai, X. Yuan, S. Mehri, T. Wang, and A. Trischler. Learning rapid-temporal adaptations. arXiv preprint arXiv:1712.09926, 2017.
  • A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. CoRR, abs/1708.02596, 2017a.
  • A. Nagabandi, G. Yang, T. Asmar, R. Pandya, G. Kahn, S. Levine, and R. S. Fearing. Learning image-conditioned dynamics models for control of under-actuated legged millirobots. arXiv preprint arXiv:1711.05253, 2017b.
  • D. K. Naik and R. Mammone. Meta-neural networks that learn by learning. In International Joint Conference on Neural Networks (IJCNN), volume 1, pages 437–442. IEEE, 1992.
  • P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal. Online movement adaptation based on previous sensor experiences. In IEEE International Conference on Intelligent Robots and Systems (IROS), pages 365–371, 2011.
  • J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 2008.
  • A. Rai, G. Sutanto, S. Schaal, and F. Meier. Learning feedback terms for reactive planning and control. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.
  • S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. International Conference on Learning Representations (ICLR), 2018.
  • M. Rei. Online representation learning in recurrent neural language models. CoRR, abs/1508.03854, 2015.
  • S. Sæmundsson, K. Hofmann, and M. P. Deisenroth. Meta reinforcement learning with latent variable Gaussian processes. arXiv preprint arXiv:1803.07551, 2018.
  • A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.
  • S. S. Sastry and A. Isidori. Adaptive control of linearizable systems. IEEE Transactions on Automatic Control, 1989.
  • J. Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 1992.
  • J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 1991.
  • J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. Trust region policy optimization. CoRR, abs/1502.05477, 2015.
  • D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al. Mastering the game of Go without human knowledge. Nature, 2017.
  • F. Sung, L. Zhang, T. Xiang, T. Hospedales, and Y. Yang. Learning to learn: Meta-critic networks for sample efficient learning. arXiv preprint arXiv:1706.09529, 2017.
  • M. Tanaskovic, L. Fagiano, R. Smith, P. Goulart, and M. Morari. Adaptive model predictive control for constrained linear systems. In European Control Conference (ECC). IEEE, 2013.
  • S. Thrun and L. Pratt. Learning to learn: Introduction and overview. In Learning to Learn. Springer, 1998.
  • S. J. Underwood and I. Husain. Online parameter estimation and adaptive control of permanent-magnet synchronous machines. IEEE Transactions on Industrial Electronics, 57(7):2435–2443, 2010.
  • M. Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
  • A. Weinstein and M. Botvinick. Structure learning in motor control: A deep reinforcement learning model. CoRR, abs/1706.06827, 2017.
  • G. Williams, A. Aldrich, and E. Theodorou. Model predictive path integral control using covariance variable importance sampling. CoRR, abs/1509.01149, 2015.
  • G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. In International Conference on Robotics and Automation (ICRA), 2017.
  • A. S. Younger, S. Hochreiter, and P. R. Conwell. Meta-learning with backpropagation. In International Joint Conference on Neural Networks. IEEE, 2001.