Gaussian process dynamic programming

Neurocomputing, no. 7-9 (2009): 1508-1524

Cited by: 244

Abstract

Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value function-based RL algorithm. We consider both a classic optimal control problem, where pro…

Introduction
  • Reinforcement learning (RL) is based on the principle of experience-based, goal-directed learning.
  • If the RL algorithm is called the "controller" and actions are identified with the control signal, reinforcement learning maps one-to-one onto optimal control when the surrounding world is fully known.
  • For an initial state x_0 ∈ ℝ^{n_x} and a policy π, the expected cumulative cost over a finite N-step optimization horizon is V^π(x_0) = E[ g_term(x_N) + Σ_{k=0}^{N−1} g(x_k, π(x_k)) ], where g is the immediate cost.
  • The function g_term is a control-independent terminal cost incurred at the final time step N.
  • The associated state-value function V∗ satisfies Bellman's equation, sketched below.
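For reference, the finite-horizon Bellman recursion implied by the definitions above can be written in a generic textbook form (the transition model f is notation introduced here; the summary itself does not name the dynamics):

$$V_N^{*}(x) = g_{\mathrm{term}}(x), \qquad V_k^{*}(x) = \min_{u}\Big\{ g(x,u) + \mathbb{E}\big[V_{k+1}^{*}\big(f(x,u)\big)\big] \Big\}, \quad k = N-1,\dots,0.$$

GPDP approximates this recursion by modeling the value functions with Gaussian processes evaluated on finite sets of support states and candidate actions.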
Highlights
  • Reinforcement learning (RL) is based on the principle of experience-based, goal-directed learning
  • In contrast to supervised learning, where labels are provided by an external supervisor, a reinforcement learning algorithm must be able to learn from experience collected through interaction with the surrounding world
  • We introduce and analyze the Gaussian process dynamic programming (GPDP) algorithm
  • We assume that the dynamics are deterministic and known, but the value function is unknown and modeled by GP_v. This case corresponds to the standard Gaussian process dynamic programming setting (Algorithm 2) proposed in our previous work (Deisenroth et al., 2008a); a minimal sketch of this backup follows the list below
  • We introduced Gaussian process dynamic programming (GPDP), a value-function-based reinforcement learning algorithm for continuous-valued state and action spaces
  • In the context of a classic optimal control problem, the underactuated pendulum swing-up, we have shown that Gaussian process dynamic programming yields a near-optimal solution
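A minimal sketch of one GPDP-style value-function backup under the setting described above (deterministic, known dynamics; unknown value function modeled by a GP) follows. It is an illustrative approximation, not the paper's Algorithm 2: it uses scikit-learn's GP regressor, works only with the GP posterior mean, and minimizes over a fixed grid of candidate actions instead of fitting GP_q; the names f, g, support_states, and candidate_actions are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def gpdp_backup(support_states, candidate_actions, v_next, f, g):
    """One backward step: V_{k+1} values at the support states -> V_k values.

    support_states    : (N, d_x) array of state-space support points
    candidate_actions : (M, d_u) array of candidate controls
    v_next            : (N,) array of V_{k+1}(x_i) at the support states
    f(x, u)           : deterministic transition model (assumed known)
    g(x, u)           : immediate cost
    """
    # Model the value function V_{k+1} with a GP, as in the GPDP setting.
    gp_v = GaussianProcessRegressor(
        kernel=ConstantKernel(1.0) * RBF(length_scale=1.0),
        normalize_y=True,
    )
    gp_v.fit(support_states, v_next)

    v_k = np.empty(len(support_states))
    for i, x in enumerate(support_states):
        # Q_k(x, u) = g(x, u) + V_{k+1}(f(x, u)), with V_{k+1} read off the GP mean.
        successors = np.array([f(x, u) for u in candidate_actions])
        q_values = np.array([g(x, u) for u in candidate_actions]) + gp_v.predict(successors)
        v_k[i] = q_values.min()  # greedy minimization over the candidate actions
    return v_k
```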
Results
  • For the particular trajectory considered, the total cost of the GPDP controller is approximately 9% higher than the total cost incurred when applying the DP controller.
  • Remember that the reward function (15) is independent of the angular velocity and the control signal.
  • For this particular trajectory, the cumulative reward of ALGPDP is approximately 7% lower than the cumulative reward of the optimal dynamic programming solution
Conclusion
  • Training GP_q scales cubically in the number of actions used for training. If the action space cannot be covered with training points, sub-sampling actions can speed up training: assume that the most relevant part of the Q∗_k-function (line 7 of Algorithm 2) is the part close to the optimum and choose the M actions that yield the lowest expected cost in state x_i (see the sketch after this list).
  • These M actions define U and are the training inputs of GP_q in Algorithm 2.
  • Sparse approximations will be unavoidable if the data sets become substantially larger; this is due to the scaling properties of Gaussian process training.
  • Probabilistic models in artificial learning algorithms can speed up learning noticeably, as they quantify uncertainty in experience-based knowledge and alleviate model bias.
  • In this setting, the authors still required problem-specific knowledge
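The action sub-sampling heuristic mentioned above reduces to a simple selection step. The sketch below is illustrative only (placeholder names), with q_full standing for the expected costs of a dense set of candidate actions evaluated in a single state x_i.

```python
import numpy as np


def subsample_actions(candidate_actions, q_full, M):
    """Keep the M candidate actions with the lowest expected cost in a given state.

    candidate_actions : (K, d_u) array of candidate controls
    q_full            : (K,) array of expected costs Q*_k(x_i, u_j) for these controls
    """
    idx = np.argsort(q_full)[:M]  # indices of the M cheapest actions
    return candidate_actions[idx], q_full[idx]
```

The selected actions would then form the training set U of GP_q, shrinking the cubic GP training cost from roughly O(K^3) to O(M^3).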
Tables
  • Table 1: Solutions to integral (11)
  • Table 2: Training sets of GP models involved in Algorithm 4
  • Table 3: Real computation times of ALGPDP on a standard computer
Funding
  • MPD acknowledges support by the German Research Foundation (DFG) through grant RA 1030/1-3 to CER.
Appendix A: Gaussian Process Prediction with Uncertain Inputs
  • In the following, we re-state results from Rasmussen and Ghahramani (2003), O'Hagan (1991), Girard et al. (2003), and Kuss (2006) on how to predict with Gaussian processes when the test input is uncertain. Consider the problem of predicting a function value h(x∗) for an uncertain test input x∗, where h ∼ GP with a squared exponential kernel k_h and x∗ ∼ N(μ, Σ). A simple Monte-Carlo stand-in for this prediction is sketched below.
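As a rough numerical stand-in for the analytic results re-stated in the appendix, the sketch below averages a GP's predictive moments over samples of the uncertain input. It is an assumption-based illustration (a fitted scikit-learn GaussianProcessRegressor modeling h), not the paper's derivation.

```python
import numpy as np


def predict_with_uncertain_input(gp, mu, Sigma, n_samples=10_000, seed=0):
    """Approximate mean and variance of h(x*) when x* ~ N(mu, Sigma).

    gp : a fitted GaussianProcessRegressor modeling h (squared exponential kernel)
    """
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, Sigma, size=n_samples)  # sample test inputs
    means, stds = gp.predict(xs, return_std=True)            # per-sample GP moments

    # Law of total expectation / total variance over the input distribution.
    mean = means.mean()
    var = (stds ** 2).mean() + means.var()
    return mean, var
```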
References
  • Atkeson, C. G., 1994. Using Local Trajectory Optimizers to Speed up Global Optimization in Dynamic Programming. In: Hanson, J. E., Moody, S. J., Lippmann, R. P. (Eds.), Advances in Neural Information Processing Systems 6. Morgan Kaufmann, pp. 503–521.
  • Atkeson, C. G., Santamaría, J. C., 1997. A Comparison of Direct and Model-Based Reinforcement Learning. In: Proceedings of the International Conference on Robotics and Automation.
  • Atkeson, C. G., Schaal, S., July 1997. Robot Learning from Demonstration. In: Fisher Jr., D. H. (Ed.), Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, Nashville, TN, USA, pp. 12–20.
  • Bellman, R. E., 1957. Dynamic Programming. Princeton University Press, Princeton, NJ, USA.
  • Bertsekas, D. P., 2005. Dynamic Programming and Optimal Control, 3rd Edition. Vol. 1 of Optimization and Computation Series. Athena Scientific, Belmont, MA, USA.
  • Bertsekas, D. P., 2007. Dynamic Programming and Optimal Control, 3rd Edition. Vol. 2 of Optimization and Computation Series. Athena Scientific, Belmont, MA, USA.
  • Bertsekas, D. P., Tsitsiklis, J. N., 1996. Neuro-Dynamic Programming. Optimization and Computation. Athena Scientific, Belmont, MA, USA.
  • Bryson, A. E., Ho, Y.-C., 1975. Applied Optimal Control: Optimization, Estimation, and Control. Hemisphere, New York City, NY, USA.
  • Chaloner, K., Verdinelli, I., 1995. Bayesian Experimental Design: A Review. Statistical Science 10, 273–304.
  • Deisenroth, M. P., Peters, J., Rasmussen, C. E., June 2008a. Approximate Dynamic Programming with Gaussian Processes. In: Proceedings of the 2008 American Control Conference. Seattle, WA, USA, pp. 4480–4485.
  • Deisenroth, M. P., Rasmussen, C. E., Peters, J., April 2008b. Model-Based Reinforcement Learning with Continuous States and Actions. In: Proceedings of the 16th European Symposium on Artificial Neural Networks. Bruges, Belgium, pp. 19–24.
  • Doya, K., January 2000. Reinforcement Learning in Continuous Time and Space. Neural Computation 12 (1), 219–245.
  • Engel, Y., Mannor, S., Meir, R., August 2003. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In: Proceedings of the 20th International Conference on Machine Learning. Vol. 20.
  • Engel, Y., Mannor, S., Meir, R., August 2005. Reinforcement Learning with Gaussian Processes. In: Proceedings of the 22nd International Conference on Machine Learning. Vol. 22.
  • Ernst, D., Geurts, P., Wehenkel, L., 2005. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research 6, 503–556.
  • Ghavamzadeh, M., Engel, Y., 2007. Bayesian Policy Gradient Algorithms. In: Schölkopf, B., Platt, J. C., Hoffman, T. (Eds.), Advances in Neural Information Processing Systems 19. The MIT Press, Cambridge, MA, USA, pp. 457–464.
  • Girard, A., Rasmussen, C. E., Quiñonero-Candela, J., Murray-Smith, R., 2003. Gaussian Process Priors with Uncertain Inputs—Application to Multiple-Step Ahead Time Series Forecasting. In: Becker, S., Thrun, S., Obermayer, K. (Eds.), Advances in Neural Information Processing Systems 15. The MIT Press, Cambridge, MA, USA, pp. 529–536.
  • Gordon, G. J., 1995. Stable Function Approximation in Dynamic Programming. In: Prieditis, A., Russell, S. (Eds.), Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, USA, pp. 261–268.
  • Howard, R. A., 1960. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, MA, USA.
  • Jacobs, R. A., Jordan, M. I., Nowlan, S. J., Hinton, G. E., 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 79–87.
  • Julier, S. J., Uhlmann, J. K., March 2004. Unscented Filtering and Nonlinear Estimation. Proceedings of the IEEE 92 (3), 401–422.
  • Kalman, R. E., 1960. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME—Journal of Basic Engineering 82 (Series D), 35–45.
  • Ko, J., Fox, D., September 2008. GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models. In: Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Nice, France, pp. 3471–3476.
  • Ko, J., Klein, D. J., Fox, D., Haehnel, D., April 2007a. Gaussian Processes and Reinforcement Learning for Identification and Control of an Autonomous Blimp. In: Proceedings of the International Conference on Robotics and Automation. Rome, Italy, pp. 742–747.
  • Ko, J., Klein, D. J., Fox, D., Haehnel, D., October 2007b. GP-UKF: Unscented Kalman Filters with Gaussian Process Prediction and Observation Models. In: Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. San Diego, CA, USA, pp. 1901–1907.
  • Kocijan, J., Murray-Smith, R., Rasmussen, C. E., Likar, B., September 2003. Predictive Control with Gaussian Process Models. In: Zajc, B., Tkalcic, M. (Eds.), Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool. Piscataway, NJ, USA, pp. 352–356.
  • Krause, A., Singh, A., Guestrin, C., February 2008. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research 9, 235–284.
  • Kuss, M., February 2006. Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. Ph.D. thesis, Technische Universität Darmstadt, Germany.
  • MacKay, D. J. C., 1992. Information-Based Objective Functions for Active Data Selection. Neural Computation 4, 590–604.
  • MacKay, D. J. C., 1999. Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation 11 (5), 1035–1068.
  • MacKay, D. J. C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK.
  • Martinez-Cantin, R., de Freitas, N., Doucet, A., Castellanos, J., June 2007. Active Policy Learning for Robot Planning and Exploration under Uncertainty. In: Proceedings of Robotics: Science and Systems III. Atlanta, GA, USA.
  • Matheron, G., 1973. The Intrinsic Random Functions and Their Applications. Advances in Applied Probability 5, 439–468.
  • Minka, T. P., January 2001. A Family of Algorithms for Approximate Bayesian Inference. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Murray-Smith, R., Sbarbaro, D., July 2002. Nonlinear Adaptive Control Using Non-Parametric Gaussian Process Prior Models. In: Proceedings of the 15th IFAC World Congress. Vol. 15. Academic Press, Barcelona, Spain.
  • Murray-Smith, R., Sbarbaro, D., Rasmussen, C. E., Girard, A., August 2003.
  • O'Hagan, A., 1991. Bayes-Hermite Quadrature. Journal of Statistical Planning and Inference 29, 245–260.
  • Ormoneit, D., Sen, S., November 2002. Kernel-Based Reinforcement Learning. Machine Learning 49 (2–3), 161–178.
  • Peters, J., Schaal, S., 2008a. Learning to Control in Operational Space. The International Journal of Robotics Research 27 (2), 197–212.
  • Peters, J., Schaal, S., 2008b. Natural Actor-Critic. Neurocomputing 71 (7–9), 1180–1190.
  • Peters, J., Schaal, S., 2008c. Reinforcement Learning of Motor Skills with Policy Gradients. Neural Networks 21, 682–697.
  • Pfingsten, T., September 2006. Bayesian Active Learning for Sensitivity Analysis. In: Proceedings of the 17th European Conference on Machine Learning. pp. 353–364.
  • Quiñonero-Candela, J., Rasmussen, C. E., 2005. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research 6 (2), 1939–1960.
  • Rasmussen, C. E., 1996. Evaluation of Gaussian Processes and other Methods for Non-linear Regression. Ph.D. thesis, Department of Computer Science, University of Toronto.
  • Rasmussen, C. E., Deisenroth, M. P., November 2008. Probabilistic Inference for Fast Learning in Control. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (Eds.), Recent Advances in Reinforcement Learning. Vol. 5323 of Lecture Notes in Computer Science. Springer-Verlag, pp. 229–242.
  • Rasmussen, C. E., Ghahramani, Z., 2003. Bayesian Monte Carlo. In: Becker, S., Thrun, S., Obermayer, K. (Eds.), Advances in Neural Information Processing Systems 15. The MIT Press, Cambridge, MA, USA, pp. 489–496.
  • Rasmussen, C. E., Kuss, M., June 2004. Gaussian Processes in Reinforcement Learning. In: Thrun, S., Saul, L. K., Schölkopf, B. (Eds.), Advances in Neural Information Processing Systems 16. The MIT Press, Cambridge, MA, USA, pp. 751–759.
  • Rasmussen, C. E., Williams, C. K. I., 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA. URL http://www.gaussianprocess.org/gpml/
  • Riedmiller, M., 2000. Concepts and Facilities of a Neural Reinforcement Learning Control Architecture for Technical Process Control. Neural Computation and Application 8, 323–338.
  • Riedmiller, M., 2005. Neural Fitted Q Iteration—First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal.
  • Riedmiller, M., Braun, H., 1993. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. In: Proceedings of the IEEE International Conference on Neural Networks. pp. 586–591.
  • Snelson, E., Ghahramani, Z., 2006. Sparse Gaussian Processes using Pseudo-inputs. In: Weiss, Y., Schölkopf, B., Platt, J. C. (Eds.), Advances in Neural Information Processing Systems 18. The MIT Press, Cambridge, MA, USA, pp. 1257–1264.
  • Sutton, R. S., Barto, A. G., 1998. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA.
  • Verdinelli, I., Kadane, J. B., June 1992. Bayesian Designs for Maximizing Information and Outcome. Journal of the American Statistical Association 87 (418), 510–515.
  • Wasserman, L., 2006. All of Nonparametric Statistics. Springer Texts in Statistics. Springer Science+Business Media, Inc., New York, NY, USA.
  • Williams, C. K. I., Rasmussen, C. E., 1996. Gaussian Processes for Regression. In: Touretzky, D. S., Mozer, M. C., Hasselmo, M. E. (Eds.), Advances in Neural Information Processing Systems 8. The MIT Press, Cambridge, MA, USA, pp. 598–604.
About the Authors
  • Marc Peter Deisenroth is a Ph.D. candidate at Universität Karlsruhe (TH), Germany, and a visiting graduate student at the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. He graduated from Universität Karlsruhe (TH) in August 2006 with a German Masters degree in Informatics. From October 2006 to September 2007, he was a graduate research assistant at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. He was a visiting researcher at Osaka University, Japan, in 2006 and at Kanazawa University, Japan, in 2004. His research interests include Bayesian inference, reinforcement learning, and optimal and nonlinear control.
  • Carl Edward Rasmussen is a lecturer in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, and an adjunct research scientist at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany. His main research interests are Bayesian inference and machine learning. He received his Masters in Engineering from the Technical University of Denmark and his Ph.D. in Computer Science from the University of Toronto in 1996. Since then he has been a postdoc at the Technical University of Denmark, a senior research fellow at the Gatsby Computational Neuroscience Unit at University College London from 2000 to 2002, and a junior research group leader at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, from 2002 to 2007.