
Compositional Visual Generation with Energy Based Models

NeurIPS 2020 (2020): 6637-6647


Abstract

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond…

Introduction
  • Humans are able to rapidly learn new concepts and continuously integrate them among prior knowledge.
  • Approaches that learn a fixed set of disentangled factors of variation make it difficult to introduce new factors, which may be necessary to explain new data or to taxonomize past data in new ways
  • Another approach to incorporating compositionality is to spatially decompose an image into a collection of objects, each object slot occupying some pixels of the image defined by a segmentation mask [28, 6].
  • These two forms of compositionality are typically considered distinct, with very different underlying implementations
Highlights
  • Humans are able to rapidly learn new concepts and continuously integrate them among prior knowledge
  • In this work, we propose to implement compositionality via energy-based models (EBMs)
  • We further investigate logical implication using a composition of conjunctions and negations in EBMs
  • We evaluate to what extent compositionality in EBMs enables continual learning of new concepts and their combination with previously learned concepts
  • We evaluate inference on an EBM trained on object position, which takes an image and an object position (x, y in 2D) as input and outputs an energy
  • We show that EBMs support composition at both the factor and object level, unifying different perspectives of compositionality, and that compositions can recursively combine with each other
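The logical operators over energies can be sketched numerically. This is a minimal illustration, not the authors' implementation: it assumes the standard product-of-experts view of composition, in which conjunction adds energies, disjunction takes a soft minimum of energies via logsumexp, and negation (meaningful only relative to another concept) subtracts a scaled energy. The function names and the weighting term `alpha` are illustrative assumptions.

```python
import numpy as np

def e_and(energies):
    # Conjunction: product of concept distributions -> energies add
    return float(np.sum(energies))

def e_or(energies):
    # Disjunction: mixture of concept distributions -> soft minimum of energies
    return float(-np.logaddexp.reduce([-e for e in energies]))

def e_not(e_negated, e_other, alpha=1.0):
    # Negation relative to another concept: invert one factor's energy
    # (alpha is an illustrative weighting term)
    return e_other - alpha * e_negated
```

A lower composed energy means the composed concept is better satisfied, so e.g. `e_or([1.0, 2.0])` lies below `min(1.0, 2.0)`, acting as a soft minimum, while `e_and` penalizes any violated factor.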
Methods
  • The authors first give an overview of the energy-based model formulation they use and introduce three logical operators over these models.
  • EBMs represent data by learning an unnormalized probability distribution over the data.
  • For each data point x, an energy function Eθ(x), parameterized by a neural network, outputs a real-valued scalar energy such that the model distribution is pθ(x) ∝ e−Eθ(x).
  • To train an EBM on a data distribution pD, the authors use contrastive divergence [10].
  • To sample x− from pθ for both training and generation, the authors use MCMC based on Langevin dynamics [30].
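The Langevin sampling step in the last bullet can be sketched as follows. This is a toy sketch, not the paper's training code: in a real EBM the gradient would come from backpropagation through the neural energy network, and the step size, noise scale, and the quadratic example energy here are illustrative assumptions.

```python
import numpy as np

def langevin_sample(grad_e, x0, n_steps=200, step_size=0.1, noise_scale=0.01):
    """Draw an approximate sample x ~ p(x) proportional to exp(-E(x)) by noisy
    gradient descent on the energy (Langevin dynamics with a reduced noise term)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = noise_scale * np.random.randn(*x.shape)
        x = x - step_size * grad_e(x) + noise
    return x

# Toy energy E(x) = 0.5 * (x - 3)^2, so grad E = x - 3; samples settle near 3
grad_e = lambda x: x - 3.0
sample = langevin_sample(grad_e, x0=[0.0])
```

Generation from a composed concept would run the same dynamics, just on a combined energy function rather than a single one.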
Results
  • The authors' classifier obtains 99.3% accuracy for position and 99.9% for color on the test set.
Conclusion
  • The authors demonstrate the potential of EBMs for both compositional generation and inference.
  • The authors show that EBMs support composition at both the factor and object level, unifying different perspectives of compositionality, and that compositions can recursively combine with each other.
  • The authors further showcase how this composition can be applied to both continually learn and compositionally infer underlying concepts.
  • The authors hope the results inspire future work in this direction.
Tables
  • Table 1: Quantitative evaluation of conjunction (&), disjunction (|), and negation (¬) generations on the MuJoCo Scenes dataset using an EBM or the approach in [29]. Position = Pos. Each individual attribute (Color or Position) is generated by a separate EBM. (Acc: accuracy) Standard error is close to 0.01 for all models.
  • Table 2: Quantitative evaluation of continual learning. A position EBM is first trained on "purple" "cubes" at different positions. A shape EBM is then trained on different "purple" shapes. Finally, a color EBM is trained on shapes of many colors. Earlier EBMs are fixed and combined with new EBMs. We compare with a GAN model [21] which is also trained on the same position, shape, and color dataset. EBMs are better at continually learning new concepts while remembering old ones. (Acc: accuracy)
Related work
  • Our work draws on results in energy-based models - see [17] for a comprehensive review. A number of methods have been used for inference and sampling in EBMs, including Gibbs sampling [12], Langevin dynamics [31, 3], path integral methods [2], and learned samplers [13, 26]. In this work, we apply EBMs to the task of compositional generation.

    Compositionality has been incorporated in representation learning (see [1] for a summary) and generative modeling. One approach to compositionality has focused on learning disentangled factors of variation [8, 15, 29]. Such an approach allows for the combination of existing factors, but does not allow the addition of new factors. A different approach to compositionality includes learning various different pixel/segmentation masks for each concept [6, 7]. However, such a factorization may have difficulty capturing the global structure of an image, and in many cases different concepts cannot be explicitly factored using attention masks.
References
  • J. Andreas. Measuring compositionality in representation learning. arXiv preprint arXiv:1902.07181, 2019.
  • Y. Du, T. Lin, and I. Mordatch. Model based planning with energy based models. CoRL, 2019.
  • Y. Du and I. Mordatch. Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689, 2019.
  • S. A. Eslami, D. J. Rezende, F. Besse, F. Viola, A. S. Morcos, M. Garnelo, A. Ruderman, A. A. Rusu, I. Danihelka, K. Gregor, et al. Neural scene representation and rendering. Science, 360(6394):1204–1210, 2018.
  • J. A. Fodor and E. Lepore. The compositionality papers. Oxford University Press, 2002.
  • K. Greff, R. L. Kaufmann, R. Kabra, N. Watters, C. Burgess, D. Zoran, L. Matthey, M. Botvinick, and A. Lerchner. Multi-object representation learning with iterative variational inference. arXiv preprint arXiv:1903.00450, 2019.
  • K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.
  • I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. Beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
  • I. Higgins, N. Sonnerat, L. Matthey, A. Pal, C. P. Burgess, M. Bosnjak, M. Shanahan, M. Botvinick, D. Hassabis, and A. Lerchner. SCAN: Learning hierarchical compositional visual concepts. ICLR, 2018.
  • G. E. Hinton. Products of experts. International Conference on Artificial Neural Networks, 1999.
  • G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
  • G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
  • T. Kim and Y. Bengio. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016.
  • J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  • T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum. Deep convolutional inverse graphics network. In NIPS, 2015.
  • B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.
  • Y. LeCun, S. Chopra, and R. Hadsell. A tutorial on energy-based learning. 2006.
  • Z. Li and D. Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
  • A. Mnih and G. Hinton. Learning nonlinear constraints with contrastive backpropagation. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pages 1302–1307. IEEE, 2005.
  • G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter. Continual lifelong learning with neural networks: A review. CoRR, abs/1802.07569, 2018.
  • A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.
  • S. Reed, Y. Chen, T. Paine, A. v. d. Oord, S. Eslami, D. Rezende, O. Vinyals, and N. de Freitas. Few-shot autoregressive density estimation: Towards learning to learn distributions. arXiv preprint arXiv:1710.10304, 2017.
  • A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
  • Y. Song and Z. Ou. Learning neural random fields with inclusive auxiliary generators. arXiv preprint arXiv:1806.00271, 2018.
  • E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
  • S. van Steenkiste, K. Kurach, and S. Gelly. A case for object compositionality in deep generative models of images. arXiv preprint arXiv:1810.10340, 2018.
  • R. Vedantam, I. Fischer, J. Huang, and K. Murphy. Generative models of visually grounded imagination. In ICLR, 2018.
  • M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
  • J. Xie, Y. Lu, S.-C. Zhu, and Y. Wu. A theory of generative ConvNet. In International Conference on Machine Learning, pages 2635–2644, 2016.
Author
Shuang Li