Action-Conditional Video Prediction using Deep Networks in Atari Games

Annual Conference on Neural Information Processing Systems (2015): 2863–2871

Cited by 728

Abstract

Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future image-frames depend on control variables or actions as well as previous frames. While not composed of natural scenes, frames in...

Introduction
  • Deep learning approaches have shown great success in many visual perception problems (e.g., [16, 7, 32, 9]).
  • The authors focus on Atari games from the Arcade Learning Environment (ALE) [1] as a source of challenging action-conditional video modeling problems.
  • While not composed of natural scenes, frames in Atari games are high-dimensional, can involve tens of objects with one or more objects being controlled by the actions directly and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability.
  • To the best of the authors' knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional images conditioned on control inputs
Highlights
  • Over the years, deep learning approaches have shown great success in many visual perception problems (e.g., [16, 7, 32, 9])
  • In vision-based reinforcement learning (RL) problems, learning to predict future images conditioned on actions amounts to learning a model of the dynamics of the agent-environment interaction, an essential component of model-based approaches to reinforcement learning
  • We focus on Atari games from the Arcade Learning Environment (ALE) [1] as a source of challenging action-conditional video modeling problems
  • While not composed of natural scenes, frames in Atari games are high-dimensional, can involve tens of objects with one or more objects being controlled by the actions directly and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability
  • An example of long-term predictions is illustrated in Figure 2
  • This paper introduced two novel deep architectures that predict future frames conditioned on actions and showed, qualitatively and quantitatively, that they can predict visually realistic and useful-for-control frames over 100-step futures on several Atari game domains (a sketch of the shared action-conditioning step follows this list)
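Though this summary does not detail the two architectures, the paper describes them as sharing an encoding-transformation-decoding structure in which the action enters through a multiplicative interaction with the encoded frame features. Below is a minimal NumPy sketch of that transformation step; the weight names and dimensions are illustrative, not the authors' code:

```python
import numpy as np

def action_conditional_transform(h_enc, action, W_h, W_a, W_dec, b):
    """Multiplicative action-conditioning: project the encoded frame
    features and the one-hot action into a common factor space,
    combine them element-wise, then project back for the decoder:
        h_dec = W_dec ((W_h h_enc) * (W_a action)) + b
    """
    return W_dec @ ((W_h @ h_enc) * (W_a @ action)) + b

# Toy usage with illustrative dimensions.
rng = np.random.default_rng(0)
h_enc = rng.normal(size=2048)             # encoded frame features
action = np.eye(18)[3]                    # one-hot Atari action
W_h = rng.normal(size=(2048, 2048)) * 0.01
W_a = rng.normal(size=(2048, 18)) * 0.01
W_dec = rng.normal(size=(2048, 2048)) * 0.01
b = np.zeros(2048)
h_dec = action_conditional_transform(h_enc, action, W_h, W_a, W_dec, b)
```

The element-wise product is what lets different actions reshape the predicted dynamics rather than merely shifting the features additively.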
Methods
  • In the experiments that follow, the authors have three goals for the two architectures: 1) to evaluate the predicted frames both qualitatively, by inspecting the generated video, and quantitatively, by measuring pixel-based squared error (a sketch of this computation appears after the Results list); 2) to evaluate the usefulness of predicted frames for control, both by replacing the emulator's frames with predicted frames for use by DQN and by using the predictions to improve exploration in DQN; and 3) to analyze the representations learned by the architectures.
  • Data and Preprocessing.
  • The authors used their replication of DQN to generate game-play video datasets using an ε-greedy policy with ε = 0.3, i.e., DQN is forced to choose a random action with 30% probability.
  • The dataset consists of about 500,000 training frames and 50,000 test frames, with actions chosen by DQN.
  • Following DQN, actions are chosen once every 4 frames, which reduces the video from 60 fps to 15 fps.
  • The authors used full-resolution RGB images (210 × 160) and preprocessed them by subtracting the mean pixel values and dividing each pixel value by 255 (a minimal sketch of this pipeline follows the list).
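A minimal sketch of the data pipeline described above, assuming frames arrive as uint8 NumPy arrays at 60 fps; the function names and array layout are ours, not from the paper:

```python
import numpy as np

def skip_frames(frames, actions, skip=4):
    """Keep every `skip`-th frame (and its action), following DQN's
    frame skip: 60 fps video becomes 15 fps."""
    return frames[::skip], actions[::skip]

def preprocess(frame, mean_pixel):
    """Subtract the training-set mean pixel values from a raw
    210x160 RGB frame, then divide by 255."""
    return (frame.astype(np.float32) - mean_pixel) / 255.0
```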
Results
  • Evaluation of Predicted Frames

    Qualitative Evaluation: Prediction video. The prediction videos of the models and baselines are available in the supplementary material and at the following website: https://sites.google.com/a/umich.edu/junhyuk-oh/action-conditional-video-prediction.
  • As seen in the videos, the proposed models make qualitatively reasonable predictions over 30–500 steps depending on the game.
  • An example of long-term predictions is illustrated in Figure 2.
  • The authors observed that both models predict complex local translations well, such as the movement of vehicles and of the controlled object.
  • In Figure 2, the model predicts the sudden change of location of the controlled object at the 257th step.
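The quantitative side of goal 1) in the Methods section is pixel-based squared error over multi-step predictions. One way such a per-step error curve could be computed; the (K, H, W, C) array layout is our assumption:

```python
import numpy as np

def squared_error_per_step(pred_frames, true_frames):
    """Mean squared error over pixels at each prediction step k.
    Inputs: preprocessed frame arrays of shape (K, H, W, C)."""
    return ((pred_frames - true_frames) ** 2).mean(axis=(1, 2, 3))
```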
Conclusion
  • This paper introduced two novel deep architectures that predict future frames conditioned on actions, and showed qualitatively and quantitatively that they are able to predict visually realistic and useful-for-control frames over 100-step futures on several Atari game domains.
  • Since the architectures are domain-independent, the authors expect them to generalize to many vision-based RL problems.
Tables
  • Table 1: Average game score of DQN over 100 plays, with standard error. The first and second rows show the performance of the authors' DQN replication under different exploration strategies (a sketch of the prediction-based strategy follows)
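The second exploration strategy uses the prediction model: when DQN would otherwise take a random action, it instead picks the action whose predicted frame looks least like recently visited frames. A hedged sketch of that idea; `predict_frame`, the Gaussian-kernel visit estimate, and `sigma` are our assumptions about the details:

```python
import numpy as np

def informed_exploration_action(predict_frame, recent_frames, n_actions,
                                sigma=100.0):
    """Pick the exploratory action whose predicted next frame has the
    lowest summed Gaussian-kernel similarity to recently seen frames,
    steering the agent toward less-visited states."""
    def visit_estimate(frame):
        return sum(np.exp(-np.sum((frame - past) ** 2) / sigma)
                   for past in recent_frames)
    scores = [visit_estimate(predict_frame(a)) for a in range(n_actions)]
    return int(np.argmin(scores))
```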
Related Work
  • Video Prediction using Deep Networks. The problem of video prediction has led to a variety of architectures in deep learning. A recurrent temporal restricted Boltzmann machine (RTRBM) [29] was proposed to learn temporal correlations from sequential data by introducing recurrent connections into the RBM. The structured RTRBM (sRTRBM) [20] scaled up the RTRBM by learning dependency structures between observations and hidden variables from data. More recently, Michalski et al. [19] proposed a higher-order gated autoencoder that defines multiplicative interactions between consecutive frames and mapping units, and showed that the temporal prediction problem can be viewed as learning and inferring higher-order interactions between consecutive images. Srivastava et al. [28] applied a sequence-to-sequence learning framework [31] to the video domain, and showed that long short-term memory (LSTM) [12] networks are capable of generating videos of bouncing handwritten digits. In contrast to these previous studies, this paper tackles problems where control variables affect the temporal dynamics and, in addition, scales up spatio-temporal prediction to larger images.
Funding
  • This work was supported by NSF grant IIS-1526059, Bosch Research, and ONR grant N00014-13-1-0762
References
  • [1] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
  • [2] M. G. Bellemare, J. Veness, and M. Bowling. Investigating contingency awareness using Atari 2600 games. In AAAI, 2012.
  • [3] M. G. Bellemare, J. Veness, and M. Bowling. Bayesian learning of recursively factored environments. In ICML, 2013.
  • [4] M. G. Bellemare, J. Veness, and E. Talvitie. Skip context tree switching. In ICML, 2014.
  • [5] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
  • [6] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
  • [7] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In CVPR, 2012.
  • [8] A. Dosovitskiy, J. T. Springenberg, and T. Brox. Learning to generate chairs with convolutional neural networks. In CVPR, 2015.
  • [9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  • [10] A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
  • [11] X. Guo, S. Singh, H. Lee, R. L. Lewis, and X. Wang. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In NIPS, 2014.
  • [12] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • [13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, 2014.
  • [14] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
  • [15] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In ECML, 2006.
  • [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  • [17] I. Lenz, R. Knepper, and A. Saxena. DeepMPC: Learning deep latent features for model predictive control. In RSS, 2015.
  • [18] R. Memisevic. Learning to relate images. IEEE TPAMI, 35(8):1829–1846, 2013.
  • [19] V. Michalski, R. Memisevic, and K. Konda. Modeling deep temporal dependencies with recurrent grammar cells. In NIPS, 2014.
  • [20] R. Mittelman, B. Kuipers, S. Savarese, and H. Lee. Structured recurrent temporal restricted Boltzmann machines. In ICML, 2014.
  • [21] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  • [22] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • [23] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
  • [24] S. Reed, K. Sohn, Y. Zhang, and H. Lee. Learning to disentangle factors of variation with manifold interaction. In ICML, 2014.
  • [25] S. Rifai, Y. Bengio, A. Courville, P. Vincent, and M. Mirza. Disentangling factors of variation for facial expression recognition. In ECCV, 2012.
  • [26] J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
  • [27] J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2:125–134, 1991.
  • [28] N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. In ICML, 2015.
  • [29] I. Sutskever, G. E. Hinton, and G. W. Taylor. The recurrent temporal restricted Boltzmann machine. In NIPS, 2009.
  • [30] I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. In ICML, 2011.
  • [31] I. Sutskever, O. Vinyals, and Q. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
  • [32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
  • [33] G. W. Taylor and G. E. Hinton. Factored conditional restricted Boltzmann machines for modeling motion style. In ICML, 2009.
  • [34] T. Tieleman and G. Hinton. Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. Coursera, 2012.
  • [35] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
  • [36] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
  • [37] J. Yang, S. Reed, M.-H. Yang, and H. Lee. Weakly-supervised disentangling with recurrent transformations for 3D view synthesis. In NIPS, 2015.