HyperDynamics: Generating Expert Dynamics Models by Observation

International Conference on Learning Representations (ICLR), 2021


Abstract

We propose HyperDynamics, a framework that conditions on an agent’s interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system. Physical and visual properties of the environment that are not part of the low-dimensiona...

Introduction
  • Humans learn dynamics models that predict results of their interactions with the environment, and use such predictions for selecting actions to achieve intended goals (Miall & Wolpert, 1996; Haruno et al, 1999).
  • In a hypernetwork setup, the weights of the target network are not model parameters; during training, their gradients are backpropagated to the weights of the hypernetwork (Chang et al, 2019).
  • This idea was initially proposed in (Ha et al, 2016), where the authors demonstrate leveraging the structural features of the original network to generate network layers, achieving model compression while maintaining competitive performance on both image recognition and language modelling tasks.
  • To the best of the authors' knowledge, this is the first work that applies the idea of generating model parameters to the domain of model-based RL and proposes to generate expert models conditioned on system properties.
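The hypernetwork idea in the bullets above can be illustrated with a minimal sketch. This is not the authors' architecture: the class `HyperNetwork`, the linear target model, and all dimensions are assumptions chosen for brevity. It only shows that the trainable parameters live in the hypernetwork, while the weights of the target dynamics model are outputs computed from a system feature vector `z`.

```python
import random

def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

class HyperNetwork:
    """Hypothetical hypernetwork: maps system features z to the parameters
    (W, b) of a linear target dynamics model."""

    def __init__(self, feat_dim, state_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.in_dim = state_dim + action_dim
        n_params = state_dim * self.in_dim + state_dim  # entries of W plus b
        # H and c are the only trainable parameters in this sketch.
        self.H = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(n_params)]
        self.c = [0.0] * n_params

    def generate(self, z):
        """Return the target model's weights for the system described by z."""
        flat = linear(self.H, self.c, z)
        n_w = self.state_dim * self.in_dim
        W = [flat[i * self.in_dim:(i + 1) * self.in_dim] for i in range(self.state_dim)]
        b = flat[n_w:]
        return W, b

def predict_next_state(W, b, state, action):
    """Target dynamics model: predicts a state change from (state, action)."""
    delta = linear(W, b, state + action)
    return [s + d for s, d in zip(state, delta)]

# Usage: two different systems yield two different generated models.
hyper = HyperNetwork(feat_dim=3, state_dim=2, action_dim=1)
W_a, b_a = hyper.generate([1.0, 0.5, -0.2])
W_b, b_b = hyper.generate([-1.0, 0.2, 0.9])
next_state = predict_next_state(W_a, b_a, [0.0, 0.0], [1.0])
```

In a trained system the prediction loss would be differentiated through `generate`, so that gradients with respect to `W` and `b` flow back into `H` and `c`; an automatic differentiation library would normally handle that step.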
Highlights
  • Our experiments aim to answer three questions: (1) Can HyperDynamics generate dynamics models across environment variations that perform as well as expert dynamics models trained separately on each environment variation? (2) Does HyperDynamics generalize to systems with novel properties? (3) How does HyperDynamics compare with methods that either use fixed global dynamics models or adapt their parameters during the course of interaction through meta-optimization? We test our proposed framework on both object pushing and locomotion tasks, and describe each of them in detail below
  • We presented HyperDynamics, a framework that conditions on system features inferred from observations and interactions with the environment to generate parameters for dynamics models dedicated to the observed system on the fly
  • We evaluate our framework in the context of object pushing and locomotion
  • Our experimental evaluations show that dynamics models generated by HyperDynamics perform on par with an ensemble of directly trained experts in the training environments, while other baselines fail to do so
  • Our framework is able to transfer knowledge acquired from seen systems to novel systems with unseen properties, even when only 24 distinct objects are used for training in the object pushing task
Methods
  • Many prior works that learn object dynamics consider only quasi-static pushing or poking, where an object always starts to move or stops together with the robot’s end-effector (Finn & Levine, 2016; Agrawal et al, 2016; Li et al, 2018; Pinto & Gupta, 2017).
  • The authors go beyond simple quasi-static pushing by varying the physics parameters of the object and the scene, and allow an object to slide by itself if pushed at high speed.
  • The authors test HyperDynamics on its motion prediction accuracy for single- and multi-step object pushing, as well as its performance when used for pushing objects to desired locations with model predictive control (MPC).
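To make the MPC evaluation above concrete, here is a minimal random-shooting sketch. The page does not say which MPC variant the authors use, so the planner, the cost function, and `toy_dynamics` (a stand-in for a generated dynamics model) are all assumptions for illustration.

```python
import random

def rollout_cost(dynamics, state, actions, goal):
    """Roll the model forward and sum squared distances to the goal over the plan."""
    cost = 0.0
    for action in actions:
        state = dynamics(state, action)
        cost += sum((s - g) ** 2 for s, g in zip(state, goal))
    return cost

def mpc_random_shooting(dynamics, state, goal, horizon=5, n_candidates=200,
                        max_push=0.1, seed=0):
    """Sample random action sequences and return the first action of the best one."""
    rng = random.Random(seed)
    best_cost, best_first = float("inf"), None
    for _ in range(n_candidates):
        seq = [[rng.uniform(-max_push, max_push) for _ in state] for _ in range(horizon)]
        cost = rollout_cost(dynamics, state, seq, goal)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

def toy_dynamics(state, action):
    # Hypothetical model: the object moves exactly as far as it is pushed.
    return [s + a for s, a in zip(state, action)]

def push_to_goal(state, goal, steps=40):
    """Replan at every step, executing only the first planned action."""
    for t in range(steps):
        action = mpc_random_shooting(toy_dynamics, state, goal, seed=t)
        state = toy_dynamics(state, action)
    return state

final_state = push_to_goal([0.0, 0.0], [0.5, -0.3])
```

Replanning at every step means prediction errors in the model are corrected by fresh observations, which is why prediction accuracy and pushing success can be evaluated separately.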
Conclusion
  • The authors presented HyperDynamics, a framework that conditions on system features inferred from observations and interactions with the environment to generate parameters for dynamics models dedicated to the observed system on the fly.
  • The authors' experimental evaluations show that dynamics models generated by HyperDynamics perform on par with an ensemble of directly trained experts in the training environments, while other baselines fail to do so.
  • Handling more varied visual tasks, predicting both the architecture and the parameters of the target dynamics model, and applying the method in real-world scenarios are interesting avenues for future work.
  • The authors' model should be trainable with data collected in the real world, following the pipeline described in (Nagabandi et al, 2019).
Tables
  • Table 1: Motion prediction error (in centimeters)
  • Table 2: Pushing success rate
  • Table 3: Comparison of average total return for locomotion tasks
  • Table 4: Prediction error of HyperDynamics for pushing with different model ablations
  • Table 5: Average total return of HyperDynamics for locomotion with different model ablations
  • Table 6: Prediction error of Direct with varying model capacity
  • Table 7: Comparison of prediction error (×10⁻²) for locomotion tasks
Related work
  • 2.1 MODEL LEARNING AND MODEL ADAPTATION

    To place HyperDynamics in the context of existing literature on model learning and adaptation, we distinguish four groups of methods, depending on whether they explicitly distinguish between dynamic properties of the system (such as joint configurations of an agent or poses of objects being manipulated) and static properties (such as mass, inertia, and 3D object shape), and whether they update the dynamics model parameters or the static property values of the system.

    (i) No adaptation. These methods concern dynamics models over either low-dimensional states or high-dimensional visual inputs of a specific system, without considering potential changes in the underlying dynamics. Such expert dynamics models are popular formulations in the literature (Watter et al, 2015; Banijamali et al, 2018; Fragkiadaki et al, 2015; Zhang et al, 2019), but they tend to fail when the system behavior changes. They can be adapted to any test system through gradient descent, yet that would require a large number of interactions.

    (ii) Visual dynamics and recurrent state representations. These methods operate over high-dimensional visual observations of systems with a variety of properties (Finn & Levine, 2016; Ebert et al, 2018; Li et al, 2018; Xu et al, 2019b; Mathieu et al, 2015; Oh et al, 2015; Pathak et al, 2017), hoping that the visual inputs capture both the state and the properties of the system. Some also attempt to encode a history of interactions with a recurrent hidden state (Xu et al, 2019a; Li et al, 2018; Ebert et al, 2018; Sanchez-Gonzalez et al, 2018b; Hafner et al, 2019), in order to implicitly capture information regarding the physical properties of the system. These methods use a single, fixed global dynamics model that takes system properties as input directly, together with its state and action.
Study subjects and analysis
data samples: 5
The performance gain of our model over Direct suggests that our hierarchical way of conditioning on system features and generating experts outperforms the common framework in the literature, which uses a global model with fixed parameters for varying system dynamics. Our model also shows a clear performance gain over MB-MAML when only 5 data samples of interaction are available for online adaptation. This indicates that our framework, which directly generates a dynamics model given the system properties, is more sample-efficient than the meta-trained dynamics model prior, which needs to adapt to the context via extra tuning.
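The "extra tuning" that a meta-trained prior needs can be pictured with a toy example. The sketch below is not MB-MAML itself: it assumes a one-parameter model `next = state + theta * action`, a hand-picked prior, and five made-up interaction samples, and simply runs a few gradient steps on them.

```python
def mse(theta, samples):
    """Mean squared one-step prediction error of the model next = state + theta * action."""
    return sum(((s + theta * a) - n) ** 2 for s, a, n in samples) / len(samples)

def adapt(theta, samples, lr=0.5, steps=1):
    """Run a few gradient-descent steps on the MSE over the given samples."""
    for _ in range(steps):
        grad = 0.0
        for state, action, next_state in samples:
            err = (state + theta * action) - next_state
            grad += 2.0 * err * action
        theta -= lr * grad / len(samples)
    return theta

# Five hypothetical interaction samples from a system whose true gain is 0.3.
true_gain = 0.3
actions = [0.2, -0.4, 0.6, -0.8, 1.0]
samples = [(0.0, a, true_gain * a) for a in actions]

prior = 1.0  # stand-in for a meta-learned initialization (assumption)
adapted = adapt(prior, samples, steps=3)
```

With so few samples the adapted parameter moves toward the true gain but does not reach it in three steps, which is the kind of cost that generating the model directly from inferred system features is meant to avoid.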

Reference
  • Pulkit Agrawal, Ashvin V Nair, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Learning to poke by poking: Experiential learning of intuitive physics. In Advances in neural information processing systems, pp. 5074–5082, 2016.
  • Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua Tenenbaum, Alberto Rodriguez, and Leslie Kaelbling. Combining physical simulators and object-based networks for control, 04 2019.
  • OpenAI. Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.
  • Ershad Banijamali, Rui Shu, Hung Bui, Ali Ghodsi, et al. Robust locally-linear controllable embedding. In International Conference on Artificial Intelligence and Statistics, pp. 1751–1759, 2018.
  • Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in neural information processing systems, pp. 4502–4510, 2016.
  • Kiran S. Bhat, Steven M. Seitz, and Jovan Popovic. Computing the physical parameters of rigid-body motion from video. In Anders Heyden, Gunnar Sparr, Mads Nielsen, and Peter Johansen (eds.), ECCV, volume 2350 of Lecture Notes in Computer Science, pp. 551–565. Springer, 2002. ISBN 3-540-43745-2. URL http://dblp.uni-trier.de/db/conf/eccv/eccv2002-1.html#BhatSP02.
  • Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. Smash: one-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344, 2017.
  • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
  • Oscar Chang, Lampros Flokas, and Hod Lipson. Principled weight initialization for hypernetworks. In International Conference on Learning Representations, 2019.
  • Tao Chen, Adithyavairavan Murali, and Abhinav Gupta. Hardware conditioned policies for multi-robot transfer learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9355–9366. Curran Associates Inc., 2018.
  • Kyunghyun Cho, Bart Van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
  • Ignasi Clavera, Anusha Nagabandi, Ronald Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Learning to adapt: Meta-learning for model-based control. 03 2018.
  • Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning. http://pybullet.org, 2016–2019.
  • Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex Lee, and Sergey Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. arXiv preprint arXiv:1812.00568, 2018.
  • Chelsea Finn and Sergey Levine. Deep visual foresight for planning robot motion. CoRR, abs/1610.00696, 2016. URL http://arxiv.org/abs/1610.00696.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. CoRR, abs/1703.03400, 2017. URL http://arxiv.org/abs/1703.03400.
  • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, and Jitendra Malik. Learning visual predictive models of physics for playing billiards. CoRR, abs/1511.07404, 2015.
  • David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
  • Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In International Conference on Machine Learning, pp. 2555–2565, 2019.
  • Jessica Hamrick, Peter Battaglia, and Joshua B Tenenbaum. Internal physics models guide probabilistic judgments about object dynamics. In Proceedings of the 33rd annual conference of the cognitive science society, pp. 1545–1550. Cognitive Science Society Austin, TX, 2011.
  • Masahiko Haruno, Daniel M Wolpert, and Mitsuo Kawato. Multiple paired forward-inverse models for human motor learning and control. In M. J. Kearns, S. A. Solla, and D. A. Cohn (eds.), Advances in Neural Information Processing Systems 11, pp. 31–37. MIT Press, 1999.
  • Kaiming He, Georgia Gkioxari, Piotr Dollar, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017. URL http://arxiv.org/abs/1703.06870.
  • Jue Kun Li, Wee Sun Lee, and David Hsu. Push-net: Deep planar pushing for objects with unknown physical properties. In Robotics: Science and Systems, 2018.
  • Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJgbSn09Ym.
  • Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, and Jian Sun. Metapruning: Meta learning for automatic neural network channel pruning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3296–3305, 2019.
  • Michael Mathieu, Camille Couprie, and Yann LeCun. Deep multi-scale video prediction beyond mean square error. CoRR, abs/1511.05440, 2015. URL http://arxiv.org/abs/1511.05440.
  • Elliot Meyerson and Risto Miikkulainen. Modular universal reparameterization: Deep multi-task learning across diverse domains. In Advances in Neural Information Processing Systems, pp. 7903–7914, 2019.
  • R. C. Miall and D. M. Wolpert. Forward models for physiological motor control. Neural Netw., 9 (8):1265–1279, November 1996. ISSN 0893-6080. doi: 10.1016/S0893-6080(96)00035-4. URL http://dx.doi.org/10.1016/S0893-6080(96)00035-4.
  • Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Joshua B Tenenbaum, and Daniel LK Yamins. Flexible neural representation for physics prediction. In Advances in Neural Information Processing Systems, 2018.
  • Anusha Nagabandi, Chelsea Finn, and Sergey Levine. Deep online learning via meta-learning: Continual adaptation for model-based rl, 12 2018a.
  • Anusha Nagabandi, Gregory Kahn, Ronald S Fearing, and Sergey Levine. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 7559–7566. IEEE, 2018b.
  • Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In International Conference on Learning Representations, 2019.
  • Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh. Action-conditional video prediction using deep networks in atari games. arXiv preprint arXiv:1507.08750, 2015.
  • Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. CoRR, abs/1705.05363, 2017. URL http://arxiv.org/abs/1705.05363.
  • Lerrel Pinto and Abhinav Gupta. Learning to push by grasping: Using multiple tasks for effective learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2161–2168. IEEE, 2017.
  • Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. Contextual parameter generation for universal neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 425–435, 2018.
  • Neale Ratzlaff and Li Fuxin. Hypergan: A generative model for diverse, performant neural networks. In International Conference on Machine Learning, pp. 5361–5369, 2019.
  • Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. Graph networks as learnable physics engines for inference and control. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 4470–4479, Stockholmsmassan, Stockholm Sweden, 10–15 Jul 2018a. PMLR.
  • Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. Graph networks as learnable physics engines for inference and control. In International Conference on Machine Learning, pp. 4470–4479, 2018b.
  • Joan Serra, Santiago Pascual, and Carlos Segura Perales. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion. Advances in Neural Information Processing Systems, 32:6793–6803, 2019.
  • Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.
  • Hsiao-Yu Fish Tung, Ricson Cheng, and Katerina Fragkiadaki. Learning spatial common sense with geometry-aware recurrent networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2595–2603, 2019.
  • Hsiao-Yu Fish Tung, Zhou Xian, Mihir Prabhudesai, Shamit Lal, and Katerina Fragkiadaki. 3d-oes: Viewpoint-invariant object-factorized environment simulators. arXiv preprint arXiv:2011.06464, 2020.
  • Johannes von Oswald, Christian Henning, Joao Sacramento, and Benjamin F Grewe. Continual learning with hypernetworks. arXiv preprint arXiv:1906.00695, 2019.
  • Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in neural information processing systems, pp. 2746–2754, 2015.
  • Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, and Josh Tenenbaum. Learning to see physics via visual de-animation. In Advances in Neural Information Processing Systems, pp. 153–164, 2017.
  • Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, and Shuran Song. Densephysnet: Learning dense physical object representations via multi-step dynamic interactions. CoRR, abs/1906.03853, 2019a. URL http://arxiv.org/abs/1906.03853.
  • Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B Tenenbaum, and Shuran Song. Densephysnet: Learning dense physical object representations via multi-step dynamic interactions. arXiv preprint arXiv:1906.03853, 2019b.
  • Kuan-Ting Yu, Maria Bauza, Nima Fazeli, and Alberto Rodriguez. More than a million ways to be pushed: A high-fidelity experimental data set of planar pushing. CoRR, abs/1604.04038, 2016. URL http://arxiv.org/abs/1604.04038.
  • Chris Zhang, Mengye Ren, and Raquel Urtasun. Graph hypernetworks for neural architecture search. arXiv preprint arXiv:1810.05749, 2018.
  • Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, and Sergey Levine. Solar: Deep structured representations for model-based reinforcement learning. In International Conference on Machine Learning, pp. 7444–7453, 2019.
Author
Zhou Xian
Shamit Lal
Emmanouil Antonios Platanios