# Adversarially Regularized Autoencoders

ICML, pp. 5897-5906, 2018.

Abstract:

Deep latent variable models, trained using variational autoencoders or generative adversarial networks, are now a key technique for representation learning of continuous structures. However, applying similar methods to discrete structures, such as text sequences or discretized images, has proven to be more challenging. In this work, we ...


Introduction

- Recent work on deep latent variable models, such as variational autoencoders (Kingma & Welling, 2014) and generative adversarial networks (Goodfellow et al., 2014), has shown significant progress in learning smooth representations of complex, high-dimensional continuous data such as images.
- Recent work on the Wasserstein GAN (WGAN) (Arjovsky et al., 2017) replaces the standard GAN objective with the Earth-Mover (Wasserstein-1) distance
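The Earth-Mover (Wasserstein-1) distance mentioned above has a simple closed form for two one-dimensional empirical distributions with equal sample counts: sort both samples and average the absolute differences. The sketch below is purely illustrative and is not from the paper:

```python
import numpy as np

def wasserstein1_1d(xs, ys):
    """Earth-Mover (Wasserstein-1) distance between two equal-size 1-D
    empirical distributions: the optimal transport plan matches order
    statistics, so the distance is the mean absolute difference of the
    sorted samples."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    assert xs.shape == ys.shape, "equal sample counts assumed"
    return float(np.mean(np.abs(xs - ys)))

# Shifting a distribution by a constant c moves it exactly c in W1.
print(wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))  # 3.0
```

For higher-dimensional samples no such closed form exists, which is why WGANs estimate the distance via its dual form with a trained critic.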

Highlights

- Recent work on deep latent variable models, such as variational autoencoders (Kingma & Welling, 2014) and generative adversarial networks (Goodfellow et al., 2014), has shown significant progress in learning smooth representations of complex, high-dimensional continuous data such as images
- This adversarially regularized autoencoder (ARAE) can further be formalized under the recently introduced Wasserstein autoencoder (WAE) framework (Tolstikhin et al., 2018), which generalizes the adversarial autoencoder. This framework connects regularized autoencoders to an optimal transport objective for an implicit generative model. We extend this class of latent variable models to the case of discrete output, showing that the autoencoder cross-entropy loss upper-bounds the total variation distance between the model/data distributions
- We experiment with the adversarially regularized autoencoder on three setups: (1) a small model using discretized images, trained on the binarized version of MNIST; (2) a model for text sequences, trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015); and (3) a model for text transfer, trained on the Yelp/Yahoo datasets for unaligned sentiment/topic transfer
- We present adversarially regularized autoencoders (ARAE) as a simple approach for training a discrete-structure autoencoder jointly with a code-space generative adversarial network
- Utilizing the Wasserstein autoencoder framework (Tolstikhin et al., 2018), we interpret the adversarially regularized autoencoder as learning a latent variable model that minimizes an upper bound on the total variation distance between the data/model distributions
- We find that the model learns an improved autoencoder and exhibits a smooth latent space, as demonstrated by semi-supervised experiments, improvements on text style transfer, and manipulations in the latent space

Methods

- The authors experiment with ARAE on three setups: (1) a small model using discretized images, trained on the binarized version of MNIST; (2) a model for text sequences, trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015); and (3) a model for text transfer, trained on the Yelp/Yahoo datasets for unaligned sentiment/topic transfer.
- The image model encodes/decodes binarized images.
- The encoder used is an MLP mapping from {0, 1}^n → R^m, producing a code c = enc_φ(x).
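The encoder described above, an MLP from {0, 1}^n to R^m, can be sketched in a few lines. The layer sizes, ReLU hidden layer, and random weights below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def mlp_encoder(x, W1, b1, W2, b2):
    """Illustrative MLP encoder mapping a binarized input x in {0,1}^n
    to a continuous code in R^m (one ReLU hidden layer, linear output).
    The architecture is an assumption, not the paper's exact model."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU
    return h @ W2 + b2                 # code c = enc_phi(x) in R^m

rng = np.random.default_rng(0)
n, hidden, m = 784, 256, 32            # e.g. a 28x28 binarized image -> 32-dim code
params = (rng.normal(0.0, 0.1, (n, hidden)), np.zeros(hidden),
          rng.normal(0.0, 0.1, (hidden, m)), np.zeros(m))
x = (rng.random(n) > 0.5).astype(float)   # a random stand-in "binarized image"
code = mlp_encoder(x, *params)
print(code.shape)  # (32,)
```

In the full model this continuous code space is where the adversarial regularization (the code-space GAN) is applied.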

Conclusion

- Impact of regularization on discrete encoding: the authors further examine the impact of adversarial regularization on the encoded representation produced by the model as it is trained.
- The graph in Figure 3 shows that the cosine similarity of nearby sentences is quite high for ARAE compared to a standard AE and increases in early rounds of training.
- To further test this property, the authors feed noised discrete input to the encoder and (i) calculate the score given to the original input, and (ii) compare the resulting reconstructions.
- Training deep latent variable models that can robustly model complex discrete structures remains an important open issue in the field
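The noising probe described in the bullets above (encode an input and a perturbed copy, then compare their codes by cosine similarity) can be sketched as follows. The random-projection "encoder" is a stand-in assumption for a trained one:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(100, 16))              # stand-in "encoder": a random projection

def encode(x):
    return x @ W

def cosine(a, b):
    """Cosine similarity between two code vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = (rng.random(100) > 0.5).astype(float)   # discrete (binary) input
noised = x.copy()
flip = rng.choice(100, size=5, replace=False)
noised[flip] = 1.0 - noised[flip]           # flip 5 of 100 symbols

# A high similarity indicates the code space varies smoothly under
# small discrete perturbations of the input.
sim = cosine(encode(x), encode(noised))
print(round(sim, 3))
```

The paper's finding is that adversarial regularization pushes this similarity higher for ARAE than for a standard AE.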

Summary


- Table 1: Reverse PPL: perplexity of language models trained on the synthetic samples from an ARAE/AE/LM and evaluated on real data. Forward PPL: perplexity of a language model trained on real data and evaluated on synthetic samples
- Table 2: Sentiment transfer results, transferring from positive to negative sentiment (Top) and negative to positive sentiment (Bottom). Original sentences and transferred outputs (from ARAE and the Cross-Aligned AE of Shen et al. (2017)) for 6 randomly drawn examples
- Table 3: Sentiment transfer. (Top) Automatic metrics (Transfer/BLEU/Forward PPL/Reverse PPL); (Bottom) human evaluation metrics (Transfer/Similarity/Naturalness). Cross-Aligned AE is from Shen et al. (2017). (Top) shows the quantitative evaluation. We use four automatic metrics: (i) Transfer: how successful the model is at altering sentiment according to an automatic classifier (we use the fastText library (Joulin et al., 2017)); (ii) BLEU: the consistency between the transferred text and the original; (iii) Forward PPL: the fluency of the generated text; (iv) Reverse PPL: the extent to which the generations are representative of the underlying data distribution. Both perplexity numbers are obtained by training an RNN language model. (Bottom) shows human evaluations of the cross-aligned AE and our best ARAE model. We randomly select 1000 sentences (500/500 positive/negative), obtain the corresponding transfers from both models, and ask crowdworkers to evaluate the sentiment (Positive/Neutral/Negative) and naturalness (1-5, 5 being most natural) of the transferred sentences. In a separate task, we show the original and transferred sentences and ask workers to rate their similarity based on sentence structure (1-5, 5 being most similar), explicitly requesting that they disregard sentiment
- Table 4: Topic transfer. Random samples from the Yahoo dataset. Note the first row is from an ARAE trained on titles, while the following rows are from replies
- Table 5: Semi-supervised accuracy on the natural language inference (SNLI) test set, using 22.2% (medium), 10.8% (small), and 5.25% (tiny) of the supervised labels of the full SNLI training set (the rest is used for unlabeled AE training)
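Both perplexity metrics in the tables above reduce to exponentiating the average per-token negative log-likelihood under the evaluating language model; only the train/eval pairing (real vs. synthetic) differs. A minimal sketch, assuming per-token probabilities have already been produced by some LM:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over tokens.
    `token_probs` are the evaluating LM's probabilities for each
    observed token (assumed already computed by a language model)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A uniform model over a 10-word vocabulary has perplexity ~10.
print(perplexity([0.1] * 20))
```

For Reverse PPL the LM is trained on model samples and `token_probs` come from scoring real data; for Forward PPL the roles are reversed.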

Related work

- While ideally autoencoders would learn latent spaces which compactly capture useful features that explain the observed data, in practice they often learn a degenerate identity mapping where the latent code space is free of any structure, necessitating some regularization on the latent space. A popular approach is to regularize through an explicit prior on the code space and use a variational approximation to the posterior, leading to a family of models called variational autoencoders (VAE) (Kingma & Welling, 2014; Rezende et al., 2014). Unfortunately, VAEs for discrete text sequences can be challenging to train: if the training procedure is not carefully tuned with techniques like word dropout and KL annealing (Bowman et al., 2016), the decoder simply becomes a language model and ignores the latent code. However, there have been some recent successes through employing convolutional decoders (Yang et al., 2017; Semeniuta et al., 2017), training the latent representation as a topic model (Dieng et al., 2017; Wang et al., 2018), using the von Mises–Fisher distribution (Guu et al., 2017), and combining VAE with iterative inference (Kim et al., 2018). There has also been some work on making the prior more flexible through explicit parameterization (Chen et al., 2017; Tomczak & Welling, 2018). A notable technique is adversarial autoencoders (AAE) (Makhzani et al., 2015), which attempt to imbue the model with a more flexible prior implicitly through adversarial training. Recent work on Wasserstein autoencoders (Tolstikhin et al., 2018) provides a theoretical foundation for the AAE and shows that AAE minimizes the Wasserstein distance between the data/model distributions.
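The KL-annealing trick mentioned above multiplies the KL term of the VAE objective by a weight that ramps from 0 to 1 over training, so the decoder cannot cheaply ignore the latent code early on. A common linear schedule is sketched below; the ramp length is an arbitrary assumption, not a value from any of the cited papers:

```python
def kl_weight(step, ramp_steps=10_000):
    """Linear KL-annealing schedule: the weight grows from 0 to 1 over
    `ramp_steps` optimizer steps, then stays at 1. The ramp length is a
    hyperparameter choice, not a value from any particular paper."""
    return min(1.0, step / ramp_steps)

# Per-step VAE objective would then be:
#   loss = reconstruction_loss + kl_weight(step) * kl_divergence
print(kl_weight(0), kl_weight(5_000), kl_weight(20_000))  # 0.0 0.5 1.0
```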

Funding

- Yoon Kim was supported by a gift from Amazon AWS Machine Learning Research

References

- Arjovsky, M. and Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. In Proceedings of ICLR, 2017.
- Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. In Proceedings of ICML, 2017.
- Bowman, S. R., Angeli, G., Potts, C., and Manning, C. D. A large annotated corpus for learning natural language inference. In Proceedings of EMNLP, 2015.
- Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating Sentences from a Continuous Space. In Proceedings of CoNLL, 2016.
- Che, T., Li, Y., Zhang, R., Hjelm, R. D., Li, W., Song, Y., and Bengio, Y. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983, 2017.
- Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. Variational Lossy Autoencoder. In Proceedings of ICLR, 2017.
- Cífka, O., Severyn, A., Alfonseca, E., and Filippova, K. Eval all, trust a few, do wrong to none: Comparing sentence generation models. arXiv:1804.07972, 2018.
- Dai, A. M. and Le, Q. V. Semi-supervised sequence learning. In Proceedings of NIPS, 2015.
- Denton, E. and Birodkar, V. Unsupervised learning of disentangled representations from video. In Proceedings of NIPS, 2017.
- Dieng, A. B., Wang, C., Gao, J., and Paisley, J. TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency. In Proceedings of ICLR, 2017.
- Donahue, J., Krähenbühl, P., and Darrell, T. Adversarial Feature Learning. In Proceedings of ICLR, 2017.
- Glynn, P. Likelihood Ratio Gradient Estimation: An Overview. In Proceedings of Winter Simulation Conference, 1987.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Proceedings of NIPS, 2014.
- Gozlan, N. and Léonard, C. Transport Inequalities. A Survey. arXiv:1003.3852, 2010.
- Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. Improved Training of Wasserstein GANs. In Proceedings of NIPS, 2017.
- Guu, K., Hashimoto, T. B., Oren, Y., and Liang, P. Generating Sentences by Editing Prototypes. arXiv:1709.08878, 2017.
- Hill, F., Cho, K., and Korhonen, A. Learning distributed representations of sentences from unlabelled data. In Proceedings of NAACL, 2016.
- Hjelm, R. D., Jacob, A. P., Che, T., Cho, K., and Bengio, Y. Boundary-Seeking Generative Adversarial Networks. In Proceedings of ICLR, 2018.
- Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., and Xing, E. P. Controllable Text Generation. In Proceedings of ICML, 2017.
- Hu, Z., Yang, Z., Salakhutdinov, R., and Xing, E. P. On Unifying Deep Generative Models. In Proceedings of ICLR, 2018.
- Jang, E., Gu, S., and Poole, B. Categorical Reparameterization with Gumbel-Softmax. In Proceedings of ICLR, 2017.
- Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. Bag of Tricks for Efficient Text Classification. In Proceedings of ACL, 2017.
- Kim, Y., Wiseman, S., Miller, A. C., Sontag, D., and Rush, A. M. Semi-Amortized Variational Autoencoders. In Proceedings of ICML, 2018.
- Kingma, D. P. and Welling, M. Auto-Encoding Variational Bayes. In Proceedings of ICLR, 2014.
- Kusner, M. and Hernandez-Lobato, J. M. GANs for Sequences of Discrete Elements with the Gumbel-Softmax Distribution. arXiv:1611.04051, 2016.
- Lample, G., Zeghidour, N., Usunier, N., Bordes, A., Denoyer, L., and Ranzato, M. Fader networks: Manipulating images by sliding attributes. In Proceedings of NIPS, 2017.
- Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., and Jurafsky, D. Adversarial Learning for Neural Dialogue Generation. In Proceedings of EMNLP, 2017.
- Li, J., Jia, R., He, H., and Liang, P. Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer. In Proceedings of NAACL, 2018.
- Maddison, C. J., Mnih, A., and Teh, Y. W. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In Proceedings of ICLR, 2017.
- Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial Autoencoders. arXiv:1511.05644, 2015.
- Mikolov, T., Yih, W.-t., and Zweig, G. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL, 2013.
- Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral Normalization For Generative Adversarial Networks. In Proceedings of ICLR, 2018.
- Mueller, J., Gifford, D., and Jaakkola, T. Sequence to Better Sequence: Continuous Revision of Combinatorial Structures. In Proceedings of ICML, 2017.
- Prabhumoye, S., Tsvetkov, Y., Salakhutdinov, R., and Black, A. W. Style Transfer Through Back-Translation. In Proceedings of ACL, 2018.
- Press, O., Bar, A., Bogin, B., Berant, J., and Wolf, L. Language Generation with Recurrent Generative Adversarial Networks without Pre-training. arXiv:1706.01399, 2017.
- Radford, A., Metz, L., and Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In Proceedings of ICLR, 2016.
- Rajeswar, S., Subramanian, S., Dutil, F., Pal, C., and Courville, A. Adversarial Generation of Natural Language. arXiv:1705.10929, 2017.
- Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of ICML, 2014.
- Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. In Proceedings of ICML, 2011.
- Semeniuta, S., Severyn, A., and Barth, E. A Hybrid Convolutional Variational Autoencoder for Text Generation. In Proceedings of EMNLP, 2017.
- Shen, T., Lei, T., Barzilay, R., and Jaakkola, T. Style Transfer from Non-Parallel Text by Cross-Alignment. In Proceedings of NIPS, 2017.
- Theis, L., van den Oord, A., and Bethge, M. A note on the evaluation of generative models. In Proceedings of ICLR, 2016.
- Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein Auto-Encoders. In Proceedings of ICLR, 2018.
- Tomczak, J. M. and Welling, M. VAE with a VampPrior. In Proceedings of AISTATS, 2018.
- Villani, C. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
- Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of ICML, 2008.
- Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., and Courville, A. Adversarially Learned Inference. In Proceedings of ICLR, 2017.
- Wang, W., Gan, Z., Wang, W., Shen, D., Huang, J., Ping, W., Satheesh, S., and Carin, L. Topic Compositional Neural Language Model. In Proceedings of AISTATS, 2018.
- Williams, R. J. Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8, 1992.
- Yang, Z., Hu, Z., Salakhutdinov, R., and Berg-Kirkpatrick, T. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions. In Proceedings of ICML, 2017.
- Yang, Z., Hu, Z., Dyer, C., Xing, E. P., and Berg-Kirkpatrick, T. Unsupervised Text Style Transfer using Language Models as Discriminators. arXiv:1805.11749, 2018.
- Yu, L., Zhang, W., Wang, J., and Yu, Y. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Proceedings of AAAI, 2017.
- Zhang, X., Zhao, J., and LeCun, Y. Character-level Convolutional Networks for Text Classification. In Proceedings of NIPS, 2015.
