DRAW: A Recurrent Neural Network For Image Generation.

International Conference on Machine Learning (2015): 1462-1471

Cited by 1745 | Viewed 453

Abstract

This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.

Introduction
  • A person asked to draw, paint or otherwise recreate a visual scene will naturally do so in a sequential, iterative fashion, reassessing their handiwork after each modification.
  • As well as precluding the possibility of iterative self-correction, the “one shot” approach is fundamentally difficult to scale to large images.
  • The Deep Recurrent Attentive Writer (DRAW) architecture represents a shift towards a more natural form of image construction, in which parts of a scene are created independently from others, and approximate sketches are successively refined
Highlights
  • A person asked to draw, paint or otherwise recreate a visual scene will naturally do so in a sequential, iterative fashion, reassessing their handiwork after each modification
  • The images generated by the network are always novel, and are virtually indistinguishable from real data for MNIST and Street View House Numbers; the generated CIFAR images are somewhat blurry, but still contain recognisable structure from natural scenes
  • We evaluate the 2D attention module of the Deep Recurrent Attentive Writer network on cluttered MNIST classification
  • For the Street View House Numbers and CIFAR-10 experiments, the red, green and blue pixel intensities were represented as numbers between 0 and 1, which were interpreted as independent colour emission probabilities
  • This paper introduced the Deep Recurrent Attentive Writer (DRAW) neural network architecture, and demonstrated its ability to generate highly realistic natural images such as photographs of house numbers, as well as improving on the best known results for binarized MNIST generation
  • We established that the two-dimensional differentiable attention mechanism embedded in Deep Recurrent Attentive Writer is beneficial not only to image generation, but also to image classification (a sketch of this attention read operation follows below)
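As an illustration of the two-dimensional differentiable attention mentioned in the highlights above, the following is a minimal PyTorch sketch of a grid-of-Gaussian-filters "read" operation of the kind the paper describes: an N × N grid of Gaussian filters is placed over the image, and the glimpse is the product of the two filterbank matrices with the image. The function and parameter names (read_attention, g_x, g_y, sigma2, delta, gamma) and the normalisation details are illustrative assumptions, not the authors' exact implementation.

    import torch

    def read_attention(x, g_x, g_y, sigma2, delta, gamma, N):
        """x: (B, H, W) image batch; g_x, g_y: (B,) grid centres; sigma2: (B,) filter
        variance; delta: (B,) stride between filter centres; gamma: (B,) intensity."""
        B, H, W = x.shape
        idx = torch.arange(N, dtype=x.dtype) - N / 2 + 0.5        # offsets of the N filter centres
        mu_x = g_x.unsqueeze(1) + idx * delta.unsqueeze(1)        # (B, N) centres along width
        mu_y = g_y.unsqueeze(1) + idx * delta.unsqueeze(1)        # (B, N) centres along height
        a = torch.arange(W, dtype=x.dtype)                        # pixel coordinates along width
        b = torch.arange(H, dtype=x.dtype)                        # pixel coordinates along height
        # Gaussian filterbank matrices, one row per filter, each row normalised to sum to 1
        F_x = torch.exp(-(a - mu_x.unsqueeze(2)) ** 2 / (2 * sigma2.view(-1, 1, 1)))
        F_y = torch.exp(-(b - mu_y.unsqueeze(2)) ** 2 / (2 * sigma2.view(-1, 1, 1)))
        F_x = F_x / (F_x.sum(dim=2, keepdim=True) + 1e-8)
        F_y = F_y / (F_y.sum(dim=2, keepdim=True) + 1e-8)
        # N x N glimpse: gamma * F_y x F_x^T, applied batch-wise
        return gamma.view(-1, 1, 1) * (F_y @ x @ F_x.transpose(1, 2))

Because every operation here is differentiable with respect to the attention parameters, the network can learn where to look by gradient descent, which is what allows the same module to be used both for generation and for the cluttered MNIST classification experiment.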
Results
  • The authors assess the ability of DRAW to generate realistic-looking images by training on three datasets of progressively increasing visual complexity: MNIST (LeCun et al., 1998), Street View House Numbers (SVHN) (Netzer et al., 2011) and CIFAR-10 (Krizhevsky, 2009).
  • For the MNIST experiments, the reconstruction loss from Eq 9 was the usual binary cross-entropy term.
  • The reconstruction loss was the cross-entropy between the pixel intensities and the model probabilities.
  • Although this approach worked well in practice, it means that the training loss did not correspond to the true compression cost of RGB images (a minimal sketch of this loss follows below)
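To make the bullets above concrete, here is a minimal sketch of the reconstruction term as described: the final canvas is passed through a sigmoid and scored against the target with cross-entropy. For binarised MNIST the targets are 0/1 and this is the usual binary cross-entropy; for SVHN and CIFAR-10 the targets are real-valued intensities in [0, 1] treated as independent colour emission probabilities, which is why the value no longer corresponds to a true compression cost. Names and shapes are illustrative assumptions.

    import torch.nn.functional as F

    def reconstruction_loss(canvas_logits, x):
        """canvas_logits: final canvas before the sigmoid, shape (batch, C*H*W);
        x: target pixel intensities in [0, 1], same shape."""
        # Cross-entropy between target intensities and model emission probabilities,
        # summed over pixels to give a per-example loss in nats.
        return F.binary_cross_entropy_with_logits(canvas_logits, x, reduction="none").sum(dim=1)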
Conclusion
  • This paper introduced the Deep Recurrent Attentive Writer (DRAW) neural network architecture, and demonstrated its ability to generate highly realistic natural images such as photographs of house numbers, as well as improving on the best known results for binarized MNIST generation.
  • The authors established that the two-dimensional differentiable attention mechanism embedded in DRAW is beneficial not only to image generation, but also to image classification
Tables
  • Table 1: Classification test error on 100 × 100 Cluttered Translated MNIST
  • Table 2: Negative log-likelihood (in nats) per test-set example on the binarised MNIST data set. The right-hand column, where present, gives an upper bound (Eq. 12) on the negative log-likelihood. The previous results are from [1] (Salakhutdinov & Hinton, 2009), [2] (Murray & Salakhutdinov, 2009), [3] (Uria et al., 2014), [4] (Raiko et al., 2014), [5] (Rezende et al., 2014), [6] (Salimans et al., 2014), [7] (Gregor et al., 2014)
  • Table 3: Experimental Hyper-Parameters
Funding
  • DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images
  • The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye
  • The core of the DRAW architecture is a pair of recurrent neural networks: an encoder network that compresses the real images presented during training, and a decoder that reconstitutes images after receiving codes
  • The combined system is trained end-to-end with stochastic gradient descent, where the loss function is a variational upper bound on the log-likelihood of the data
  • Where DRAW differs from its siblings is that, rather than generating images in a single pass, it iteratively constructs scenes through an accumulation of modifications emitted by the decoder, each of which is observed by the encoder (a minimal sketch of this iterative loop follows below)
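The sentences above describe the core of DRAW: at each of T time steps the encoder reads the image and the current reconstruction error, a latent code is sampled, the decoder updates its state, and a modification is added to a canvas; the training loss is the variational bound, i.e. a reconstruction term plus the KL divergence between the approximate posterior and the prior, accumulated over time steps. The following is a minimal PyTorch sketch of that loop under simplifying assumptions: the read and write operations are plain linear maps rather than the attentive versions, and all module names and sizes (enc_rnn, dec_rnn, write_layer, T, h_dim, z_dim, ...) are illustrative rather than the authors' settings.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    T, x_dim, h_dim, z_dim = 10, 784, 256, 100            # glimpses, image, hidden and latent sizes (assumed)

    enc_rnn = nn.LSTMCell(2 * x_dim + h_dim, h_dim)        # encoder sees [x, error image, previous decoder state]
    dec_rnn = nn.LSTMCell(z_dim, h_dim)
    q_mu = nn.Linear(h_dim, z_dim)
    q_logsigma = nn.Linear(h_dim, z_dim)
    write_layer = nn.Linear(h_dim, x_dim)                  # plain "write" (no attention) for brevity

    def draw_loss(x):                                      # x: (batch, x_dim), intensities in [0, 1]
        B = x.size(0)
        h_enc = c_enc = h_dec = c_dec = torch.zeros(B, h_dim)
        canvas = torch.zeros(B, x_dim)
        kl = torch.zeros(B)
        for _ in range(T):
            x_hat = x - torch.sigmoid(canvas)              # error image: what is still missing from the canvas
            h_enc, c_enc = enc_rnn(torch.cat([x, x_hat, h_dec], dim=1), (h_enc, c_enc))
            mu, logsigma = q_mu(h_enc), q_logsigma(h_enc)
            z = mu + torch.exp(logsigma) * torch.randn_like(mu)   # reparameterised latent sample
            kl = kl + 0.5 * (mu ** 2 + torch.exp(2 * logsigma) - 2 * logsigma - 1).sum(dim=1)
            h_dec, c_dec = dec_rnn(z, (h_dec, c_dec))
            canvas = canvas + write_layer(h_dec)            # accumulate the decoder's modifications
        # total loss = reconstruction term Lx (cross-entropy) + latent term Lz (KL), per example
        lx = F.binary_cross_entropy_with_logits(canvas, x, reduction="none").sum(dim=1)
        return (lx + kl).mean()

Backpropagating through draw_loss(x) trains all of the modules end-to-end with stochastic gradient descent, mirroring the training setup described above (in practice the modules would be registered inside a single nn.Module so that an optimiser can collect their parameters).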
References
  • Ba, Jimmy, Mnih, Volodymyr, and Kavukcuoglu, Koray. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014.
  • Dayan, Peter, Hinton, Geoffrey E, Neal, Radford M, and Zemel, Richard S. The Helmholtz machine. Neural Computation, 7(5):889–904, 1995.
  • Denil, Misha, Bazzani, Loris, Larochelle, Hugo, and de Freitas, Nando. Learning where to attend with deep architectures for image tracking. Neural Computation, 24(8):2151–2184, 2012.
  • Gers, Felix A, Schmidhuber, Jurgen, and Cummins, Fred. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471, 2000.
  • Goodfellow, Ian J, Bulatov, Yaroslav, Ibarz, Julian, Arnoud, Sacha, and Shet, Vinay. Multi-digit number recognition from Street View imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082, 2013.
  • Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
  • Graves, Alex, Wayne, Greg, and Danihelka, Ivo. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
  • Gregor, Karol, Danihelka, Ivo, Mnih, Andriy, Blundell, Charles, and Wierstra, Daan. Deep autoregressive networks. In Proceedings of the 31st International Conference on Machine Learning, 2014.
  • Hinton, Geoffrey E and Salakhutdinov, Ruslan R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
  • Hochreiter, Sepp and Schmidhuber, Jurgen. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kingma, Diederik P and Welling, Max. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Krizhevsky, Alex. Learning multiple layers of features from tiny images. 2009.
  • Larochelle, Hugo and Hinton, Geoffrey E. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Advances in Neural Information Processing Systems, pp. 1243–1251, 2010.
  • Larochelle, Hugo and Murray, Iain. The neural autoregressive distribution estimator. Journal of Machine Learning Research, 15:29–37, 2011.
  • LeCun, Yann, Bottou, Leon, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Mnih, Andriy and Gregor, Karol. Neural variational inference and learning in belief networks. In Proceedings of the 31st International Conference on Machine Learning, 2014.
  • Mnih, Volodymyr, Heess, Nicolas, Graves, Alex, et al. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pp. 2204–2212, 2014.
  • Murray, Iain and Salakhutdinov, Ruslan. Evaluating probabilities under high-dimensional latent variable models. In Advances in Neural Information Processing Systems, pp. 1137–1144, 2009.
  • Netzer, Yuval, Wang, Tao, Coates, Adam, Bissacco, Alessandro, Wu, Bo, and Ng, Andrew Y. Reading digits in natural images with unsupervised feature learning. 2011.
  • Raiko, Tapani, Li, Yao, Cho, Kyunghyun, and Bengio, Yoshua. Iterative neural autoregressive distribution estimator NADE-k. In Advances in Neural Information Processing Systems, pp. 325–333, 2014.
  • Ranzato, Marc'Aurelio. On learning where to look. arXiv preprint arXiv:1405.5488, 2014.
  • Rezende, Danilo J, Mohamed, Shakir, and Wierstra, Daan. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pp. 1278–1286, 2014.
  • Salakhutdinov, Ruslan and Hinton, Geoffrey E. Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics, pp. 448–455, 2009.
  • Salakhutdinov, Ruslan and Murray, Iain. On the quantitative analysis of Deep Belief Networks. In Proceedings of the 25th Annual International Conference on Machine Learning, pp. 872–879. Omnipress, 2008.
  • Salimans, Tim, Kingma, Diederik P, and Welling, Max. Markov chain Monte Carlo and variational inference: Bridging the gap. arXiv preprint arXiv:1410.6460, 2014.
  • Sermanet, Pierre, Frome, Andrea, and Real, Esteban. Attention for fine-grained categorization. arXiv preprint arXiv:1412.7054, 2014.
  • Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112, 2014.
  • Tang, Yichuan, Srivastava, Nitish, and Salakhutdinov, Ruslan. Learning generative models with visual attention. arXiv preprint arXiv:1312.6110, 2013.
  • Tieleman, Tijmen. Optimizing Neural Networks that Generate Images. PhD thesis, University of Toronto, 2014.
  • Uria, Benigno, Murray, Iain, and Larochelle, Hugo. A deep and tractable density estimator. In Proceedings of the 31st International Conference on Machine Learning, pp. 467–475, 2014.
  • Zheng, Yin, Zemel, Richard S, Zhang, Yu-Jin, and Larochelle, Hugo. A neural autoregressive approach to attention-based recognition. International Journal of Computer Vision, pp. 1–13, 2014.