Adversarially Regularized Autoencoders

ICML, pp. 5897-5906, 2018.


Abstract:

Deep latent variable models, trained using variational autoencoders or generative adversarial networks, are now a key technique for representation learning of continuous structures. However, applying similar methods to discrete structures, such as text sequences or discretized images, has proven to be more challenging. In this work, we ...

Introduction
  • Recent work on deep latent variable models, such as variational autoencoders (Kingma & Welling, 2014) and generative adversarial networks (Goodfellow et al., 2014), has shown significant progress in learning smooth representations of complex, high-dimensional continuous data such as images.
  • Recent work on the Wasserstein GAN (WGAN) (Arjovsky et al., 2017) replaces the Jensen-Shannon divergence implicitly minimized by the standard GAN objective with the Earth Mover (Wasserstein-1) distance, whose dual form is recalled below
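For reference, the Earth Mover distance optimized by the WGAN can be written in its Kantorovich-Rubinstein dual form over 1-Lipschitz critics (a standard statement from Arjovsky et al. (2017); the notation below is ours, not copied from the paper):

    % Wasserstein-1 (Earth Mover) distance in dual form, as estimated by the WGAN critic.
    % P_r: real/data distribution, P_theta: model distribution, f: 1-Lipschitz critic.
    W(\mathbb{P}_r, \mathbb{P}_\theta)
      = \sup_{\|f\|_L \le 1} \;
        \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_\theta}[f(x)]

In WGAN the supremum is approximated by a parameterized critic whose weights are clipped to enforce the Lipschitz constraint; ARAE applies the same construction in the latent code space rather than the data space.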
Highlights
  • Recent work on deep latent variable models, such as variational autoencoders (Kingma & Welling, 2014) and generative adversarial networks (Goodfellow et al., 2014), has shown significant progress in learning smooth representations of complex, high-dimensional continuous data such as images
  • This adversarially regularized autoencoder (ARAE) can further be formalized under the recently introduced Wasserstein autoencoder (WAE) framework (Tolstikhin et al., 2018), which generalizes the adversarial autoencoder. This framework connects regularized autoencoders to an optimal transport objective for an implicit generative model. We extend this class of latent variable models to the case of discrete output, showing that the autoencoder cross-entropy loss upper-bounds the total variation distance between the model/data distributions
  • We experiment with the adversarially regularized autoencoder (ARAE) on three setups: (1) a small model using discretized images trained on the binarized version of MNIST, (2) a model for text sequences trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015), and (3) a model for text transfer trained on the Yelp/Yahoo datasets for unaligned sentiment/topic transfer
  • We present adversarially regularized autoencoders (ARAE) as a simple approach for training a discrete structure autoencoder jointly with a code-space generative adversarial network
  • Utilizing the Wasserstein autoencoder framework (Tolstikhin et al., 2018), we interpret the adversarially regularized autoencoder as learning a latent variable model that minimizes an upper bound on the total variation distance between the data/model distributions
  • We find that the model learns an improved autoencoder and exhibits a smooth latent space, as demonstrated by semi-supervised experiments, improvements on text style transfer, and manipulations in the latent space
Methods
  • The authors experiment with ARAE on three setups: (1) a small model using discretized images trained on the binarized version of MNIST, (2) a model for text sequences trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015), and (3) a model for text transfer trained on the Yelp/Yahoo datasets for unaligned sentiment/topic transfer.
  • The image model encodes/decodes binarized images.
  • The encoder used is an MLP mapping from {0, 1}^n → R^m, enc_φ(x) = MLP(x; φ); a schematic ARAE training loop for this setup is sketched below.
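To make the joint training concrete, here is a minimal PyTorch sketch of the ARAE setup for binarized inputs: an MLP encoder/decoder trained for reconstruction, a WGAN critic on latent codes, and a generator mapping noise to codes. It is an illustrative reconstruction of the procedure summarized above, not the authors' released code; layer sizes, learning rates, the clipping value, and the single critic step per batch are assumptions.

    # Minimal PyTorch sketch of the ARAE setup for binarized inputs (e.g., 28x28 MNIST
    # flattened to n = 784 bits). Illustrative only: layer sizes, learning rates, the
    # clipping value, and the single critic step per batch are assumptions.
    import torch
    import torch.nn as nn

    n, m, z_dim = 784, 128, 32  # input bits, code dimension, noise dimension (assumed)

    enc = nn.Sequential(nn.Linear(n, 512), nn.ReLU(), nn.Linear(512, m))      # enc_phi: {0,1}^n -> R^m
    dec = nn.Sequential(nn.Linear(m, 512), nn.ReLU(), nn.Linear(512, n))      # dec_psi: code -> pixel logits
    gen = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, m))  # g_theta: noise -> generated code
    critic = nn.Sequential(nn.Linear(m, 128), nn.ReLU(), nn.Linear(128, 1))   # f_w: code -> scalar score

    recon_loss = nn.BCEWithLogitsLoss()
    opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    opt_enc = torch.optim.Adam(enc.parameters(), lr=1e-4)     # adversarial updates to the encoder
    opt_gen = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-4)

    def train_step(x):
        """One ARAE update on a batch x of shape (batch, n) with 0/1 entries."""
        # (1) Autoencoder step: reconstruct x through its code enc(x).
        loss_rec = recon_loss(dec(enc(x)), x)
        opt_ae.zero_grad(); loss_rec.backward(); opt_ae.step()

        # (2) Critic step: learn to separate real codes enc(x) from generated codes g(z).
        z = torch.randn(x.size(0), z_dim)
        loss_critic = -(critic(enc(x).detach()).mean() - critic(gen(z).detach()).mean())
        opt_critic.zero_grad(); loss_critic.backward(); opt_critic.step()
        for p in critic.parameters():  # WGAN weight clipping (clip value assumed)
            p.data.clamp_(-0.01, 0.01)

        # (3) Adversarial step: encoder and generator are updated so the critic cannot
        #     tell encoded codes and generated codes apart, regularizing the code space.
        z = torch.randn(x.size(0), z_dim)
        loss_adv = critic(enc(x)).mean() - critic(gen(z)).mean()
        opt_enc.zero_grad(); opt_gen.zero_grad()
        loss_adv.backward()
        opt_enc.step(); opt_gen.step()
        return loss_rec.item(), loss_critic.item(), loss_adv.item()

For text, the MLP encoder/decoder would be replaced by RNNs over word indices, with the same code-space critic and generator; that variant is not shown here.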
Conclusion
  • Impact of Regularization on Discrete Encoding: The authors further examine the impact of adversarial regularization on the encoded representation produced by the model as it is trained.
  • The graph in Figure 3 shows that the cosine similarity of nearby sentences is quite high for ARAE compared to a standard AE and increases in early rounds of training.
  • To further test this property, the authors feed noised discrete input to the encoder and (i) calculate the score given to the original input, and (ii) compare the resulting reconstructions; a sketch of this probe appears after this list.
  • Training deep latent variable models that can robustly model complex discrete structures remains an important open issue in the field
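As a concrete version of the probe described above, the sketch below swaps k random adjacent word pairs to noise a sentence and compares the codes of the original and noised versions by cosine similarity; the encode argument stands in for a trained encoder and is an assumption of this sketch.

    # Hedged sketch of the noising probe: swap k random adjacent word pairs, encode
    # both versions, and compare the resulting codes by cosine similarity.
    import random
    import torch

    def swap_noise(tokens, k=1):
        """Return a copy of `tokens` (len >= 2) with k random adjacent pairs swapped."""
        noised = list(tokens)
        for _ in range(k):
            i = random.randrange(len(noised) - 1)
            noised[i], noised[i + 1] = noised[i + 1], noised[i]
        return noised

    def code_similarity(encode, tokens, k=1):
        """Cosine similarity between the codes of a sentence and its noised version.

        `encode` is assumed to map a token list to a 1-D torch tensor code."""
        c_orig = encode(tokens)
        c_noised = encode(swap_noise(tokens, k))
        return torch.cosine_similarity(c_orig, c_noised, dim=0).item()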
Tables
  • Table1: Reverse PPL: Perplexity of language models trained on the synthetic samples from an ARAE/AE/LM and evaluated on real data. Forward PPL: Perplexity of a language model trained on real data and evaluated on synthetic samples
  • Table2: Sentiment transfer results, where we transfer from positive to negative sentiment (Top) and negative to positive sentiment (Bottom). Original sentence and transferred output (from ARAE and the Cross-Aligned AE of Shen et al. (2017)) for 6 randomly drawn examples
  • Table3: Sentiment transfer. (Top) Automatic metrics (Transfer/BLEU/Forward PPL/Reverse PPL); (Bottom) Human evaluation metrics (Transfer/Similarity/Naturalness). Cross-Aligned AE is from Shen et al. (2017). The top half shows the quantitative evaluation. We use four automatic metrics: (i) Transfer: how successful the model is at altering sentiment according to an automatic classifier (we use the fastText library (Joulin et al., 2017)); (ii) BLEU: the consistency between the transferred text and the original; (iii) Forward PPL: the fluency of the generated text; (iv) Reverse PPL: the extent to which the generations are representative of the underlying data distribution. Both perplexity numbers are obtained by training an RNN language model. The bottom half shows human evaluations of the Cross-Aligned AE and our best ARAE model. We randomly select 1000 sentences (500/500 positive/negative), obtain the corresponding transfers from both models, and ask crowdworkers to evaluate the sentiment (Positive/Neutral/Negative) and naturalness (1-5, 5 being most natural) of the transferred sentences. We create a separate task in which we show the original and the transferred sentences and ask them to rate the similarity based on sentence structure (1-5, 5 being most similar), explicitly requesting that the reader disregard sentiment when judging similarity. An illustrative computation of the Transfer and BLEU metrics is sketched below, after the table descriptions
  • Table4: Topic Transfer. Random samples from the Yahoo dataset. Note the first row is from ARAE trained on titles while the following ones are from replies
  • Table5: Semi-Supervised accuracy on the natural language inference (SNLI) test set, respectively using 22.2% (medium), 10.8% (small), 5.25% (tiny) of the supervised labels of the full SNLI training set (rest used for unlabeled AE training)
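As a rough illustration of how the first two automatic metrics from Table 3 can be computed, the sketch below trains a fastText classifier for transfer accuracy and uses NLTK's sentence-level BLEU for content preservation; the training-file name and label strings are hypothetical, and the paper's exact evaluation scripts may differ.

    # Sketch: transfer accuracy via a fastText sentiment classifier, and BLEU of the
    # transferred sentence against its source sentence (content preservation).
    import fasttext
    from nltk.translate.bleu_score import sentence_bleu

    # Classifier trained on labeled sentences in fastText format, e.g.
    # "__label__positive the food was great" (the training-file name is hypothetical).
    clf = fasttext.train_supervised(input="yelp_sentiment.train")

    def transfer_success(transferred: str, target: str) -> bool:
        """True if the transferred sentence is classified with the target label."""
        labels, _ = clf.predict(transferred)
        return labels[0] == f"__label__{target}"

    def self_bleu(original: str, transferred: str) -> float:
        """Sentence-level BLEU between the transferred text and the original."""
        return sentence_bleu([original.split()], transferred.split())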
Related work
  • While ideally autoencoders would learn latent spaces that compactly capture the useful features explaining the observed data, in practice they often learn a degenerate identity mapping where the latent code space is free of any structure, necessitating some regularization on the latent space. A popular approach is to regularize through an explicit prior on the code space and use a variational approximation to the posterior, leading to a family of models called variational autoencoders (VAE) (Kingma & Welling, 2014; Rezende et al., 2014). Unfortunately, VAEs for discrete text sequences can be challenging to train: for example, if the training procedure is not carefully tuned with techniques like word dropout and KL annealing (Bowman et al., 2016), the decoder simply becomes a language model and ignores the latent code (the annealed objective is recalled below). However, there have been some recent successes through employing convolutional decoders (Yang et al., 2017; Semeniuta et al., 2017), training the latent representation as a topic model (Dieng et al., 2017; Wang et al., 2018), using the von Mises–Fisher distribution (Guu et al., 2017), and combining VAE with iterative inference (Kim et al., 2018). There has also been work on making the prior more flexible through explicit parameterization (Chen et al., 2017; Tomczak & Welling, 2018). A notable technique is the adversarial autoencoder (AAE) (Makhzani et al., 2015), which attempts to imbue the model with a more flexible prior implicitly through adversarial training. Recent work on Wasserstein autoencoders (Tolstikhin et al., 2018) provides a theoretical foundation for the AAE and shows that the AAE minimizes the Wasserstein distance between the data/model distributions.
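For concreteness, the KL-annealed objective referenced above (from Bowman et al. (2016)) weights the KL term by a schedule β_t that is increased from 0 toward 1 during training; the exact schedule is a tuning choice:

    % Annealed evidence lower bound for a text VAE: the KL weight \beta_t starts near 0
    % so the decoder must use the latent code z, and is increased toward 1 over training.
    \mathcal{L}_t(\theta, \phi; x)
      = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
      - \beta_t \, \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)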
Funding
  • Yoon Kim was supported by a gift from Amazon AWS Machine Learning Research
Reference
  • Arjovsky, M. and Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. In Proceedings of ICLR, 2017.
  • Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. In Proceedings of ICML, 2017.
  • Bowman, S. R., Angeli, G., Potts, C., and Manning, C. D. A large annotated corpus for learning natural language inference. In Proceedings of EMNLP, 2015.
  • Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating Sentences from a Continuous Space. In Proceedings of CoNLL, 2016.
  • Che, T., Li, Y., Zhang, R., Hjelm, R. D., Li, W., Song, Y., and Bengio, Y. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983, 2017.
  • Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. Variational Lossy Autoencoder. In Proceedings of ICLR, 2017.
  • Cífka, O., Severyn, A., Alfonseca, E., and Filippova, K. Eval all, trust a few, do wrong to none: Comparing sentence generation models. arXiv:1804.07972, 2018.
  • Dai, A. M. and Le, Q. V. Semi-supervised sequence learning. In Proceedings of NIPS, 2015.
  • Denton, E. and Birodkar, V. Unsupervised learning of disentangled representations from video. In Proceedings of NIPS, 2017.
  • Dieng, A. B., Wang, C., Gao, J., and Paisley, J. TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency. In Proceedings of ICLR, 2017.
  • Donahue, J., Krähenbühl, P., and Darrell, T. Adversarial Feature Learning. In Proceedings of ICLR, 2017.
  • Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., and Courville, A. Adversarially Learned Inference. In Proceedings of ICLR, 2017.
  • Glynn, P. Likelihood Ratio Gradient Estimation: An Overview. In Proceedings of the Winter Simulation Conference, 1987.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Proceedings of NIPS, 2014.
  • Gozlan, N. and Léonard, C. Transport Inequalities. A Survey. arXiv:1003.3852, 2010.
  • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved Training of Wasserstein GANs. In Proceedings of NIPS, 2017.
  • Guu, K., Hashimoto, T. B., Oren, Y., and Liang, P. Generating Sentences by Editing Prototypes. arXiv:1709.08878, 2017.
  • Hill, F., Cho, K., and Korhonen, A. Learning distributed representations of sentences from unlabelled data. In Proceedings of NAACL, 2016.
  • Hjelm, R. D., Jacob, A. P., Che, T., Cho, K., and Bengio, Y. Boundary-Seeking Generative Adversarial Networks. In Proceedings of ICLR, 2018.
  • Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., and Xing, E. P. Controllable Text Generation. In Proceedings of ICML, 2017.
  • Hu, Z., Yang, Z., Salakhutdinov, R., and Xing, E. P. On Unifying Deep Generative Models. In Proceedings of ICLR, 2018.
  • Jang, E., Gu, S., and Poole, B. Categorical Reparameterization with Gumbel-Softmax. In Proceedings of ICLR, 2017.
  • Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. Bag of Tricks for Efficient Text Classification. In Proceedings of ACL, 2017.
  • Kim, Y., Wiseman, S., Miller, A. C., Sontag, D., and Rush, A. M. Semi-Amortized Variational Autoencoders. In Proceedings of ICML, 2018.
  • Kingma, D. P. and Welling, M. Auto-Encoding Variational Bayes. In Proceedings of ICLR, 2014.
  • Kusner, M. and Hernandez-Lobato, J. M. GANs for Sequences of Discrete Elements with the Gumbel-Softmax Distribution. arXiv:1611.04051, 2016.
  • Lample, G., Zeghidour, N., Usunier, N., Bordes, A., Denoyer, L., and Ranzato, M. Fader networks: Manipulating images by sliding attributes. In Proceedings of NIPS, 2017.
  • Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., and Jurafsky, D. Adversarial Learning for Neural Dialogue Generation. In Proceedings of EMNLP, 2017.
  • Li, J., Jia, R., He, H., and Liang, P. Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer. In Proceedings of NAACL, 2018.
  • Maddison, C. J., Mnih, A., and Teh, Y. W. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In Proceedings of ICLR, 2017.
  • Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial Autoencoders. arXiv:1511.05644, 2015.
  • Mikolov, T., Yih, W.-t., and Zweig, G. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL, 2013.
  • Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. In Proceedings of ICLR, 2018.
  • Mueller, J., Gifford, D., and Jaakkola, T. Sequence to Better Sequence: Continuous Revision of Combinatorial Structures. In Proceedings of ICML, 2017.
  • Prabhumoye, S., Tsvetkov, Y., Salakhutdinov, R., and Black, A. W. Style Transfer Through Back-Translation. In Proceedings of ACL, 2018.
  • Press, O., Bar, A., Bogin, B., Berant, J., and Wolf, L. Language Generation with Recurrent Generative Adversarial Networks without Pre-training. arXiv:1706.01399, 2017.
  • Radford, A., Metz, L., and Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In Proceedings of ICLR, 2016.
  • Rajeswar, S., Subramanian, S., Dutil, F., Pal, C., and Courville, A. Adversarial Generation of Natural Language. arXiv:1705.10929, 2017.
  • Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of ICML, 2014.
  • Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. In Proceedings of ICML, 2011.
  • Semeniuta, S., Severyn, A., and Barth, E. A Hybrid Convolutional Variational Autoencoder for Text Generation. In Proceedings of EMNLP, 2017.
  • Shen, T., Lei, T., Barzilay, R., and Jaakkola, T. Style Transfer from Non-Parallel Text by Cross-Alignment. In Proceedings of NIPS, 2017.
  • Theis, L., van den Oord, A., and Bethge, M. A note on the evaluation of generative models. In Proceedings of ICLR, 2016.
  • Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein Auto-Encoders. In Proceedings of ICLR, 2018.
  • Tomczak, J. M. and Welling, M. VAE with a VampPrior. In Proceedings of AISTATS, 2018.
  • Villani, C. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of ICML, 2008.
  • Wang, W., Gan, Z., Wang, W., Shen, D., Huang, J., Ping, W., Satheesh, S., and Carin, L. Topic Compositional Neural Language Model. In Proceedings of AISTATS, 2018.
  • Williams, R. J. Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8, 1992.
  • Yang, Z., Hu, Z., Salakhutdinov, R., and Berg-Kirkpatrick, T. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions. In Proceedings of ICML, 2017.
  • Yang, Z., Hu, Z., Dyer, C., Xing, E. P., and Berg-Kirkpatrick, T. Unsupervised Text Style Transfer using Language Models as Discriminators. arXiv:1805.11749, 2018.
  • Yu, L., Zhang, W., Wang, J., and Yu, Y. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Proceedings of AAAI, 2017.
  • Zhang, X., Zhao, J., and LeCun, Y. Character-level Convolutional Networks for Text Classification. In Proceedings of NIPS, 2015.