# On the "steerability" of generative adversarial networks

ICLR (2020)


Abstract

An open secret in contemporary machine learning is that many models work beautifully on standard benchmarks but fail to generalize outside the lab. This has been attributed to biased training data, which provide poor coverage over real world events. Generative models are no exception, but recent advances in generative adversarial networks…

Introduction

- The quality of deep generative models has increased dramatically over the past few years.
- When introduced in 2014, Generative Adversarial Networks (GANs) could only synthesize MNIST digits and low-resolution grayscale faces (Goodfellow et al, 2014).
- Traditional computer graphics can render photorealistic 3D scenes, but cannot automatically generate detailed content.
- Generative models like GANs, in contrast, can create content from scratch, but we do not currently have tools for navigating the generated scenes the way one can walk through and interact with a 3D game engine

Highlights

- The quality of deep generative models has increased dramatically over the past few years
- Science fiction has long dreamed of virtual realities filled with synthetic content as rich as, or richer than, the real world (e.g., The Matrix, Ready Player One)
- Denton et al (2019) learn linear walks corresponding to various facial characteristics – they use these to measure biases in facial attribute detectors, whereas we study biases in the generative model that originate from training data
- We draw images x from the training dataset and perform data augmentation by applying the edit operation on them. This optimization approach encourages the generator to organize its latent space so that the transformations lie along linear paths, and when combined with data augmentation, results in larger transformation ranges, which we demonstrate in Sec. 4.4
- We demonstrate our approach using BigGAN (Brock et al, 2018), a class-conditional GAN trained on 1000 ImageNet categories
- GANs are powerful generative models, but are they replicating the existing training datapoints, or can they generalize beyond the training distribution? We investigate this question by exploring walks in the latent space of GANs

Methods

- Generative models such as GANs (Goodfellow et al, 2014) learn a mapping function G such that G : z → x.
- We experiment with a number of different transformations learned in the latent space, each corresponding to a different walk vector.
- Each of these transformations can be learned without any direct supervision, by applying our desired edit to the source image.
- A mask applied during training allows the generator to inpaint the background scene
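The walk-learning objective described above can be sketched as a toy self-supervised optimization: learn a vector w so that a step in latent space, G(z + αw), matches the target edit applied to the generated image, edit(G(z), α). The sketch below stands in a tiny linear map for the generator and a fixed shift direction for the edit; all names (`G`, `edit`, `w`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_z, dim_x = 8, 16

# Toy stand-in for a generator: a fixed random linear map G(z) = A @ z.
A = rng.standard_normal((dim_x, dim_z))
G = lambda z: A @ z

# Target edit: move the "image" along a fixed direction d, scaled by alpha
# (a stand-in for zoom / shift / color edits on real images).
d = np.ones(dim_x)
edit = lambda x, alpha: x + alpha * d

# Self-supervised objective:
#   min_w  E_{z, alpha} || G(z + alpha*w) - edit(G(z), alpha) ||^2
# The supervision target is produced by editing the generator's own output.
w = np.zeros(dim_z)
lr = 0.01
for step in range(3000):
    z = rng.standard_normal(dim_z)
    alpha = rng.uniform(-1, 1)
    residual = G(z + alpha * w) - edit(G(z), alpha)  # = alpha*(A @ w - d) for this linear toy
    grad = alpha * A.T @ residual                    # gradient of 0.5*||residual||^2 w.r.t. w
    w -= lr * grad

# Moving alpha steps along w now approximates the edit, up to a least-squares
# residual: a linear walk generally cannot realize the edit exactly.
err = np.linalg.norm(G(w) - edit(G(np.zeros(dim_z)), 1.0))
print(f"residual norm at alpha=1: {err:.3f} (started at {np.linalg.norm(d):.3f})")
```

In the paper the generator is BigGAN and the edits are zoom, shift, color, and rotation operations on images; the linear toy above only illustrates the shape of the objective.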

Results

- There is a positive correlation between the spread of the dataset and the magnitude of ∆μ observed in the transformed model distributions, and the slope of all observed trends differs significantly from zero (p < 0.001 for all transformations)

Conclusion

- We investigate whether GANs can generalize beyond their training distribution by exploring walks in their latent space. We optimize trajectories in latent space to reflect simple image transformations in the generated output, learned in a self-supervised manner.
- Our ability to naively move the distribution is finite: we can transform images to some degree but cannot extrapolate entirely outside the support of the training data.
- We add data augmentation during training and jointly optimize the model and walk trajectory.
- Our experiments illustrate the connection between training data bias and the resulting distribution of generated images, and suggest methods for extending the range of images that the models are able to create


- Table 1: Pearson’s correlation coefficient between dataset σ and model ∆μ for measured attributes. p-value for slope < 0.001 for all transformations
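Pearson's r and the regression slope behind Table 1 are standard statistics. The sketch below computes both on synthetic data shaped like the experiment (per-class dataset spread σ against model ∆μ); the numbers are illustrative only, not the paper's measurements.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Illustrative data (NOT the paper's measurements): for each of 100 classes,
# the dataset spread sigma and the achieved model shift delta_mu.
rng = np.random.default_rng(1)
sigma = rng.uniform(0.1, 2.0, size=100)            # per-class dataset spread
delta_mu = 0.8 * sigma + rng.normal(0, 0.2, 100)   # noisy positive trend

r = pearson_r(sigma, delta_mu)
# Least-squares slope of delta_mu on sigma; ddof=1 matches np.cov's default.
slope = np.cov(sigma, delta_mu)[0, 1] / np.var(sigma, ddof=1)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```

A positive slope that differs significantly from zero, as reported in the paper, is what such a fit would show when larger dataset spread allows larger achievable transformations.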

Related work

- Latent space manipulations can be seen from several perspectives – how we achieve it, what limits it, and what it enables us to do. Our work addresses these three aspects together, and we briefly refer to each one in related work.

**Interpolations in latent space.** Traditional approaches to image editing with GAN latent spaces find linear directions that correspond to changes in labeled attributes, such as smile-vectors and gender-vectors for faces (Radford et al, 2015; Karras et al, 2018). However, these manipulations are not exclusive to GANs; in flow-based generative models, linearly interpolating between two encoded images allows one to edit a source image toward attributes of the target (Kingma & Dhariwal, 2018). Mollenhoff & Cremers (2019) propose a modified GAN formulation that treats data as directional k-currents, where moving along tangent planes naturally corresponds to interpretable manipulations. Upchurch et al (2017) remove the generative model entirely and instead interpolate in the intermediate feature space of a pretrained classifier, again using feature mappings of source and target sets to determine an edit direction. Unlike these approaches, we learn our latent-space trajectories in a self-supervised manner, without labeled attributes or distinct source and target images. Instead, we learn to approximate editing operations on individual source images. We find that linear trajectories in latent space can capture simple image manipulations, e.g., zoom-vectors and shift-vectors, although we also obtain similar results using nonlinear trajectories.

Funding

- This work was supported by a Google Faculty Research Award to P.I., and a U.S. National Science Foundation Graduate Research Fellowship to L.C.

Study subjects and analysis

unique samples: 20000

**A METHOD DETAILS**

A.1. OPTIMIZATION FOR THE LINEAR WALK

We learn the walk vector using mini-batch stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2014) in TensorFlow, trained on 20,000 unique samples from the latent space z. We share the vector w across all ImageNet categories for the BigGAN model.
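For reference, the Adam update rule (Kingma & Ba, 2014) used for this optimization can be written in a few lines. This is a generic textbook implementation, not the authors' TensorFlow code, and the quadratic objective at the end is only a stand-in for the walk loss.

```python
import numpy as np

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2014) on a parameter vector w."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)            # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, (m, v, t)

# Usage: minimize a stand-in quadratic ||w - target||^2 (hypothetical target).
target = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
state = (np.zeros(3), np.zeros(3), 0)
for _ in range(5000):
    grad = 2 * (w - target)
    w, state = adam_step(w, grad, state, lr=0.01)
print(w)  # approaches target
```

In the paper's setting, `grad` would be the gradient of the walk objective with respect to the shared vector w, estimated on a mini-batch of latent samples.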

A.2

potential failure cases: 2

However, when we increase the step size of α, we observe that the degree to which we can achieve each transformation is limited. In Fig. 3 we observe two potential failure cases: one in which the image becomes unrealistic, and the other in which the image fails to transform any further. When we try to zoom in on a Persian cat, we observe that the cat no longer increases in size beyond some point, and in fact consistently undershoots the target zoom.

samples: 200

In Fig. 6 we plot the standard deviation σ of the dataset on the x-axis, and the model ∆μ under a +α∗ and −α∗ transformation on the y-axis, as defined in Eq. 6. We sample randomly from 100 classes for the color, zoom, and shift transformations, and generate 200 samples of each class under the positive and negative transformations. We use the same setup of drawing samples from the model and dataset and computing the statistics for each transformation as described in Sec. 4.1.
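Schematically, the ∆μ statistic is the gap between the mean attribute value under the +α∗ walk and under the −α∗ walk. The sketch below uses hypothetical stand-ins for the generator and attribute detector (`G`, `detect`, `w`, `alpha_star` are all illustrative, not the paper's components):

```python
import numpy as np

rng = np.random.default_rng(2)
dim_z = 8

# Hypothetical stand-ins for the real pipeline: a linear "generator" and an
# "attribute detector" that reads off one scalar (e.g. object size or color).
A = rng.standard_normal((16, dim_z))
G = lambda z: A @ z
detect = lambda x: x[0]

w = rng.standard_normal(dim_z)   # a learned walk vector (illustrative)
alpha_star = 1.5                 # chosen step size

# Mean attribute shift Delta_mu between the +alpha* and -alpha* distributions,
# estimated from 200 latent samples as in the experiment.
zs = rng.standard_normal((200, dim_z))
plus = np.mean([detect(G(z + alpha_star * w)) for z in zs])
minus = np.mean([detect(G(z - alpha_star * w)) for z in zs])
delta_mu = plus - minus
print(f"delta_mu = {delta_mu:.3f}")
```

In the paper, `detect` is replaced by per-transformation attribute measurements on BigGAN samples, computed per class and compared against the dataset spread σ.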

Reference

- Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177, 2018.
- David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B Tenenbaum, William T Freeman, and Antonio Torralba. Gan dissection: Visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597, 2018.
- Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
- Taco S Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral cnn. arXiv preprint arXiv:1902.04615, 2019.
- Emily Denton, Ben Hutchinson, Margaret Mitchell, and Timnit Gebru. Detecting bias with generative counterfactual face attribute augmentation. arXiv preprint arXiv:1906.06439, 2019.
- Bella DiGrazia. Swampscott fd debuts new blue fire truck, 2019. https://www.itemlive.com/2019/05/29/swampscott-fd-debuts-new-blue-fire-truck/, accessed 2019-09-18.
- William T. Freeman and Edward H Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis & Machine Intelligence, (9):891–906, 1991.
- Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231, 2018.
- Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola. Ganalyze: Toward visual definitions of cognitive image properties. arXiv preprint arXiv:1906.10112, 2019.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
- Ali Jahanian, SVN Vishwanathan, and Jan P Allebach. Learning visual balance from large-scale datasets of aesthetically highly rated images. In Human Vision and Electronic Imaging XX, volume 9394, pp. 93940Y. International Society for Optics and Photonics, 2015.
- Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
- Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948, 2018.
- Davis E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10: 1755–1758, 2009.
- Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10236–10245, 2018.
- Yann LeCun. The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
- Karel Lenc and Andrea Vedaldi. Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 991–999, 2015.
- Elad Mezuman and Yair Weiss. Learning about canonical views from internet image collections. In Advances in neural information processing systems, pp. 719–727, 2012.
- Thomas Mollenhoff and Daniel Cremers. Flat metric minimization with applications in generative modeling. arXiv preprint arXiv:1905.04730, 2019.
- Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of gans for semantic face editing. arXiv preprint arXiv:1907.10786, 2019.
- Joel Simon. Ganbreeder. https://ganbreeder.app/, accessed 2019-03-22.
- Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In CVPR, 2011.
- Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, and Kilian Weinberger. Deep feature interpolation for image content changes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7064–7073, 2017.
- Tom White. Sampling generative networks. arXiv preprint arXiv:1609.04468, 2016.
- Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
