# Compositional Visual Generation with Energy Based Models

NeurIPS 2020 (2020): 6637–6647

Abstract

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to […]

Introduction

- Humans are able to rapidly learn new concepts and continuously integrate them with prior knowledge.
- One line of work learns a fixed set of disentangled factors of variation; this makes it difficult to introduce new factors, which may be necessary to explain new data or to taxonomize past data in new ways.
- Another approach to compositionality is to spatially decompose an image into a collection of objects, each object slot occupying some pixels of the image defined by a segmentation mask [28, 6].
- These two views of compositionality are typically treated as distinct, with very different underlying implementations.

Highlights

- Humans are able to rapidly learn new concepts and continuously integrate them with prior knowledge
- In this work, we propose to implement compositionality via energy-based models (EBMs)
- We further investigate logical implication, expressed as a composition of conjunctions and negations in EBMs
- We evaluate to what extent compositionality in EBMs enables continual learning of new concepts and their combination with previously learned concepts
- We evaluate inference on an EBM trained on object position, which takes an image and an object position (x,y in 2D) as input and outputs an energy
- We show that EBMs support composition at both the factor and object level, unifying different perspectives on compositionality; composed models can themselves be recursively combined
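The three logical operators highlighted above can be sketched in terms of concept energies. The following is a reconstruction from the product-of-experts view, not a quote of the paper's equations; $\alpha$ is a temperature-like weight for negation and its value is an assumption here:

```latex
% Conjunction: product of experts -> energies add
p(x \mid c_1 \wedge c_2) \propto p(x \mid c_1)\, p(x \mid c_2)
  \propto e^{-\left(E_1(x) + E_2(x)\right)}

% Disjunction: mixture of distributions -> softmin of energies
p(x \mid c_1 \vee c_2) \propto e^{-E_1(x)} + e^{-E_2(x)}
  = e^{\,\operatorname{logsumexp}\left(-E_1(x),\, -E_2(x)\right)}

% Negation: invert a concept, typically combined with another concept
p(x \mid c_2 \wedge \neg c_1) \propto \frac{p(x \mid c_2)}{p(x \mid c_1)^{\alpha}}
  \propto e^{-\left(E_2(x) - \alpha E_1(x)\right)}
```

In each case the composed distribution is again of the form $e^{-E(x)}$, which is why compositions can be recursively combined.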

Methods

- The authors first give an overview of their Energy-Based Model formulation and introduce three logical operators over these models.
- EBMs represent data by learning an unnormalized probability distribution over the data.
- For each data point x, an energy function Eθ(x), parameterized by a neural network, outputs a scalar energy such that the model distribution is pθ(x) ∝ e^−Eθ(x).
- To train an EBM on a data distribution pD, the authors use contrastive divergence [10].
- To sample x− from pθ for both training and generation, the authors use MCMC based on Langevin dynamics [30].
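As a concrete sketch of the sampling step above (not the authors' code), the toy example below composes two hand-made 1-D concept energies by conjunction (summing energies) and draws samples from the combined distribution with Langevin dynamics. All function names, energies, and constants here are illustrative assumptions:

```python
import math
import random

# Two toy concept energies over a 1-D "image" x:
# concept A prefers x near 1, concept B prefers x near -1.
def energy_a(x):
    return (x - 1.0) ** 2

def energy_b(x):
    return (x + 1.0) ** 2

def grad(energy, x, eps=1e-4):
    """Finite-difference gradient, standing in for autograd."""
    return (energy(x + eps) - energy(x - eps)) / (2 * eps)

def conjunction(x):
    """Conjunction of concepts = sum of energies (product of experts)."""
    return energy_a(x) + energy_b(x)

def langevin_sample(energy, x0=3.0, steps=2000, step_size=0.01, seed=0):
    """Sample from p(x) ∝ exp(-E(x)) via Langevin dynamics:
    x <- x - (λ/2) ∇E(x) + N(0, λ)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(steps):
        noise = rng.gauss(0.0, math.sqrt(step_size))
        x = x - 0.5 * step_size * grad(energy, x) + noise
        samples.append(x)
    return samples

samples = langevin_sample(conjunction)
# Discard burn-in; the chain settles near x = 0, the compromise
# between the two conflicting concepts.
mean = sum(samples[1000:]) / len(samples[1000:])
print(round(mean, 2))
```

The same loop works unchanged for the disjunction and negation compositions, since each is again a scalar energy function of x.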

Results

- The authors' classifier obtains 99.3% accuracy for position and 99.9% for color on the test set.

Conclusion

- The authors demonstrate the potential of EBMs for both compositional generation and inference.
- The authors show that EBMs support composition at both the factor and object level, unifying different perspectives on compositionality; composed models can themselves be recursively combined.
- The authors further showcase how this composition can be applied to both continually learn and compositionally infer underlying concepts.
- The authors hope these results inspire future work in this direction.


- Table 1: Quantitative evaluation of conjunction (&), disjunction (|) and negation (¬) generations on the MuJoCo Scenes dataset using an EBM or the approach in [29]. Position is abbreviated Pos. Each individual attribute (Color or Position) is generated by an individual EBM. (Acc: accuracy.) Standard error is close to 0.01 for all models.
- Table 2: Quantitative evaluation of continual learning. A position EBM is first trained on "purple" "cubes" at different positions. A shape EBM is then trained on different "purple" shapes. Finally, a color EBM is trained on shapes of many colors. Earlier EBMs are fixed and combined with the new EBMs. We compare with a GAN model [21] trained on the same position, shape and color data. EBMs are better at continually learning new concepts while remembering old ones. (Acc: accuracy.)

Related work

- Our work draws on results in energy-based models - see [17] for a comprehensive review. A number of methods have been used for inference and sampling in EBMs, ranging from Gibbs sampling [12] and Langevin dynamics [31, 3] to path-integral methods [2] and learned samplers [13, 26]. In this work, we apply EBMs to the task of compositional generation.

Compositionality has been incorporated into both representation learning (see [1] for a summary) and generative modeling. One approach focuses on learning disentangled factors of variation [8, 15, 29]. Such an approach allows existing factors to be combined, but does not allow new factors to be added. A different approach learns separate pixel/segmentation masks for each concept [6, 7]. However, such a factorization may have difficulty capturing the global structure of an image, and in many cases different concepts cannot be explicitly factored using attention masks.

References

- J. Andreas. Measuring compositionality in representation learning. arXiv preprint arXiv:1902.07181, 2019.
- Y. Du, T. Lin, and I. Mordatch. Model based planning with energy based models. CoRL, 2019.
- Y. Du and I. Mordatch. Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689, 2019.
- S. A. Eslami, D. J. Rezende, F. Besse, F. Viola, A. S. Morcos, M. Garnelo, A. Ruderman, A. A. Rusu, I. Danihelka, K. Gregor, et al. Neural scene representation and rendering. Science, 360(6394):1204–1210, 2018.
- J. A. Fodor and E. Lepore. The compositionality papers. Oxford University Press, 2002.
- K. Greff, R. L. Kaufmann, R. Kabra, N. Watters, C. Burgess, D. Zoran, L. Matthey, M. Botvinick, and A. Lerchner. Multi-object representation learning with iterative variational inference. arXiv preprint arXiv:1903.00450, 2019.
- K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra. Draw: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.
- I. Higgins, L. Matthey, A. Pal, C. P. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. Beta-vae: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
- I. Higgins, N. Sonnerat, L. Matthey, A. Pal, C. P. Burgess, M. Bosnjak, M. Shanahan, M. Botvinick, D. Hassabis, and A. Lerchner. Scan: Learning hierarchical compositional visual concepts. ICLR, 2018.
- G. E. Hinton. Products of experts. International Conference on Artificial Neural Networks, 1999.
- G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8):1771–1800, 2002.
- G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527–1554, 2006.
- T. Kim and Y. Bengio. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016.
- J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
- T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum. Deep convolutional inverse graphics network. In NIPS, 2015.
- B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. Building machines that learn and think like people. Behavioral and brain sciences, 40, 2017.
- Y. LeCun, S. Chopra, and R. Hadsell. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press, 2006.
- Z. Li and D. Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence, 40(12):2935–2947, 2017.
- A. Mnih and G. Hinton. Learning nonlinear constraints with contrastive backpropagation. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, pages 1302–1307. IEEE, 2005.
- G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter. Continual lifelong learning with neural networks: A review. CoRR, abs/1802.07569, 2018.
- A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.
- S. Reed, Y. Chen, T. Paine, A. v. d. Oord, S. Eslami, D. Rezende, O. Vinyals, and N. de Freitas. Few-shot autoregressive density estimation: Towards learning to learn distributions. arXiv preprint arXiv:1710.10304, 2017.
- A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
- N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
- Y. Song and Z. Ou. Learning neural random fields with inclusive auxiliary generators. arXiv preprint arXiv:1806.00271, 2018.
- E. Todorov, T. Erez, and Y. Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
- S. van Steenkiste, K. Kurach, and S. Gelly. A case for object compositionality in deep generative models of images. arXiv preprint arXiv:1810.10340, 2018.
- R. Vedantam, I. Fischer, J. Huang, and K. Murphy. Generative models of visually grounded imagination. In ICLR, 2018.
- M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
- J. Xie, Y. Lu, S.-C. Zhu, and Y. Wu. A theory of generative convnet. In International Conference on Machine Learning, pages 2635–2644, 2016.
