Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation

Arya D. McCarthy

ACL, pp. 8512–8525, 2020.


Abstract:

This paper proposes a simple and effective approach to address the problem of posterior collapse in conditional variational autoencoders (CVAEs). It thus improves performance of machine translation models that use noisy or monolingual data, as well as in conventional settings. Extending Transformer and conditional VAEs, our proposed latent …

Introduction
  • The conditional variational autoencoder (CVAE; Sohn et al., 2015) is a conditional generative model for structured prediction tasks like machine translation.
  • Variational inference for text generation often yields models that ignore their latent variables (Bowman et al., 2016), a phenomenon called posterior collapse.
  • The authors modify CVAE’s ELBO in two ways (§3): (1) they explicitly add a principled mutual information term back into the training objective, and (2) they use a factorized decoder (Chen et al., 2017), which predicts the target bag-of-words as an auxiliary decoding distribution to regularize the latent variables. A rough sketch of the resulting objective follows this list.
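As a rough sketch of the training objective described above (the notation is assumed here for illustration, and any weighting of the added terms is omitted; it is not copied from the paper):

```latex
% Standard CVAE ELBO for translating source x into target y through latent z:
\[
\log p_\theta(y \mid x) \;\ge\; \mathcal{L}_{\mathrm{ELBO}}
  = \mathbb{E}_{q_\phi(z \mid x, y)}\bigl[\log p_\theta(y \mid x, z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\bigr)
\]
% Modified training objective (to be maximized): add back the mutual information
% between z and the data index n (cf. the appendix notes at the end of this page),
% and regularize z with a bag-of-words prediction of the target, handled by
% separate parameters psi:
\[
\mathcal{L} = \mathcal{L}_{\mathrm{ELBO}} + I_q(z; n)
  + \underbrace{\mathbb{E}_{q_\phi(z \mid x, y)}\Bigl[\sum_{w \in \mathrm{BoW}(y)} \log p_\psi(w \mid z)\Bigr]}_{\mathcal{L}_{\mathrm{BoW}}}
\]
```

Table 1 (described below) suggests that the final model’s inference network conditions on the target alone, qφ(z | y), so the exact conditioning and weighting of terms may differ from this sketch.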
Highlights
  • The conditional variational autoencoder (CVAE; Sohn et al., 2015) is a conditional generative model for structured prediction tasks like machine translation.
  • We introduce a new loss function for conditional variational autoencoders that counteracts posterior collapse, motivated by our analysis of the CVAE’s evidence lower bound objective (ELBO).
  • In applying our method to neural machine translation (NMT; Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014), we find that we have measurably mitigated posterior collapse.
  • The results show that we have effectively addressed posterior collapse: latent variables are no longer ignored despite the presence of a powerful decoder.
  • Ablation study: How do the different ingredients of our proposed approach contribute to preventing posterior collapse and improving translation quality? We explore two variants of the proposed model: (1) modified ELBO only, which adds the mutual information term to the training objective but receives no gradients from the bag-of-words loss L_BoW, and (2) BoW only, which is equivalent to DCVAE combined with a BoW decoder.
  • We have presented a conditional generative model with latent variables whose distribution is learned with variational inference, and evaluated it in machine translation.
Results
  • A sufficiently high-capacity autoregressive decoder can model the conditional density directly, ignoring the latent variable and reducing inference to Equation 1.
  • The posterior collapse of the baseline model is apparent: both the KL divergence and the mutual information terms drop to 0 at the beginning of training, as a result of the ELBO’s design.
  • For a fair comparison, the authors extend VNMT and DCVAE with the same joint training algorithm; i.e., the newly added monolingual data is used to train their corresponding sequence encoder and inference network with the standard VAE ELBO.
  • The authors explore two variants of the proposed model: (1) modified ELBO only, which adds the mutual information term to the training objective but receives no gradients from the bag-of-words loss L_BoW, and (2) BoW only, which is equivalent to DCVAE combined with a BoW decoder.
  • Discrete latent variables have been applied to NMT (Kaiser et al., 2017; Gu et al., 2018; Shen et al., 2019), without variational inference or addressing posterior collapse.
  • Eikema and Aziz (2019) present a generative model relying on autoencoding; they condition the source text x on the latent variable z.
  • Unlike most prior work in text generation, the authors tackle posterior collapse without requiring an annealing schedule (Bowman et al., 2016; Sønderby et al., 2016; Kim et al., 2018), a weakened decoder (Gulrajani et al., 2017), or a restricted variational family (Razavi et al., 2019).
  • Unlike Ma et al. (2018), who employ bag-of-words as an NMT objective, the BoW decoder only sees the latent variable z, not the encoder states. An illustrative sketch of such a decoder follows this list.
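To make the bag-of-words decoder concrete, here is a minimal PyTorch-style sketch in which a single linear projection maps the latent variable to vocabulary logits; the class and parameter names are hypothetical and not taken from the authors’ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BagOfWordsDecoder(nn.Module):
    """Auxiliary decoder sketch: scores the target tokens from z alone,
    with no access to the encoder states (hypothetical names)."""

    def __init__(self, latent_dim: int, vocab_size: int, pad_idx: int = 0):
        super().__init__()
        self.proj = nn.Linear(latent_dim, vocab_size)  # z -> vocabulary logits
        self.pad_idx = pad_idx

    def forward(self, z: torch.Tensor, target_tokens: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim); target_tokens: (batch, tgt_len) integer ids.
        # One position-independent categorical distribution per sentence.
        log_probs = F.log_softmax(self.proj(z), dim=-1)        # (batch, vocab)
        token_log_probs = log_probs.gather(1, target_tokens)   # (batch, tgt_len)
        mask = (target_tokens != self.pad_idx).float()
        return -(token_log_probs * mask).sum(dim=1).mean()     # scalar L_BoW
```

In such a setup this term would be added to the modified ELBO; because only the separate `proj` parameters and the latent variable ever see the bag-of-words target, the pressure to encode target content falls on z rather than on the autoregressive decoder, matching the division of labor described above.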
Conclusion
  • Unlike Weng et al. (2017), the generative decoder has access to both the latent variable and the encoder states; bag-of-words prediction is handled by separate parameters.
  • The authors have presented a conditional generative model with latent variables whose distribution is learned with variational inference, and evaluated it in machine translation.
  • The authors’ model has outperformed previous variational NMT models in terms of translation quality, and is comparable to the non-latent Transformer on the standard WMT Ro↔En and De↔En datasets.
Tables
  • Table 1: Our model mitigates posterior collapse. The KL value refers to DKL(qφ(z | x, y) ‖ pθ(z | x)) for DCVAE and DKL(qφ(z | y) ‖ pθ(z | x)) for our model. A closed-form sketch of this diagnostic appears after this list.
  • Table 2: BLEU score on WMT benchmarks. Best result on each dataset is in bold. Our model provides minor gains (≤ 0.5 points) over the standard Transformer, not degrading like VNMT and DCVAE. Alongside improvements in semi-supervised or noisy settings, this suggests that there is no BLEU compromise in choosing this model.
  • Table 3: Translation performance (BLEU) when utilizing source-side monolingual data. Best result on each data condition (with and without monolingual data) is in bold.
  • Table 4: Ablation study on translation quality (BLEU). The information-infused loss function provides additional performance over the DCVAE with a bag-of-words decoder.
  • Table 5: Translation examples from the baseline Transformer, VNMT, and our model. Disfluent words or absences are in red, and slightly incorrect lexical choices are in blue. Romanian diacritics have been stripped.
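If the posterior and prior are diagonal Gaussians (a common choice in variational NMT, though the parameterization is not spelled out in this summary), the KL diagnostic reported in Table 1 can be monitored in closed form. The snippet below is an illustrative sketch, not the authors’ code.

```python
import torch

def diagonal_gaussian_kl(mu_q: torch.Tensor, logvar_q: torch.Tensor,
                         mu_p: torch.Tensor, logvar_p: torch.Tensor) -> torch.Tensor:
    """D_KL(N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p)))),
    summed over latent dimensions and averaged over the batch.
    A value that stays near zero throughout training is the signature of
    posterior collapse that Table 1 tracks."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()
```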
Related work
  • Unlike most prior work in (conditional) text generation, we tackle posterior collapse without requiring an annealing schedule (Bowman et al., 2016; Sønderby et al., 2016; Kim et al., 2018), a weakened decoder (Gulrajani et al., 2017), or a restricted variational family (Razavi et al., 2019).

    Unlike Ma et al. (2018), who also employ bag-of-words as an NMT objective, our BoW decoder only sees the latent variable z, not the encoder states. Conversely, unlike Weng et al. (2017), our generative decoder has access to both the latent variable and the encoder states; bag-of-words prediction is handled by separate parameters.

    VNMT (Zhang et al., 2016) applies CVAE with Gaussian priors to conditional text generation. VRNMT (Su et al., 2018) extends VNMT, modeling the translation process with a sequence of recurrent latent variables.

    We have presented a conditional generative model with latent variables whose distribution is learned with variational inference, then evaluated it in machine translation. Our approach does not require an annealing schedule or a hamstrung decoder to avoid posterior collapse. Instead, by providing a new analysis of the conditional VAE objective to improve it in a principled way and incorporating an auxiliary decoding objective, we measurably prevented posterior collapse.
References
  • Leon Bottou and Yann L. Cun. 2004. Large scale online learning. In Advances in Neural Information Processing Systems 16, pages 217–224. Curran Associates, Inc.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 10–21, Berlin, Germany. Association for Computational Linguistics.
  • Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer. 1992. An estimate of an upper bound for the entropy of English. Computational Linguistics, 18(1):31–40.
  • Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2017. Variational lossy autoencoder. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings. OpenReview.net.
  • Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semi-supervised learning for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1965–1974, Berlin, Germany. Association for Computational Linguistics.
  • Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley, New York, NY, USA.
  • Anna Currey, Antonio Valerio Miceli Barone, and Kenneth Heafield. 2017. Copied monolingual data improves low-resource neural machine translation. In Proceedings of the Second Conference on Machine Translation, pages 148–156, Copenhagen, Denmark. Association for Computational Linguistics.
  • Adji B. Dieng, Yoon Kim, Alexander M. Rush, and David M. Blei. 2019. Avoiding latent variable collapse with generative skip models. In Proceedings of Machine Learning Research, volume 89, pages 2397–2405. PMLR.
  • Bryan Eikema and Wilker Aziz. 2019. Auto-encoding variational neural machine translation. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 124–141, Florence, Italy. Association for Computational Linguistics.
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, and Richard Socher. 2018. Non-autoregressive neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taïga, Francesco Visin, David Vazquez, and Aaron C. Courville. 2017. PixelVAE: A latent variable model for natural images. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings. OpenReview.net.
  • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, and Marc’Aurelio Ranzato. 2019. The FLORES evaluation datasets for low-resource machine translation: Nepali–English and Sinhala–English. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6098–6111, Hong Kong, China. Association for Computational Linguistics.
  • Junxian He, Daniel Spokoyny, Graham Neubig, and Taylor Berg-Kirkpatrick. 2019. Lagging inference networks and posterior collapse in variational autoencoders. In 7th International Conference on Learning Representations, ICLR 2019.
  • Matthew D. Hoffman and Matthew J. Johnson. 2016. ELBO surgery: yet another way to carve up the variational evidence lower bound. In Workshop in Advances in Approximate Bayesian Inference, volume 1.
  • Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with Gumbel–Softmax. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings.
  • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, and Jakob Uszkoreit. 2017. One model to learn them all. CoRR, abs/1706.05137v1.
  • Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent convolutional neural networks for discourse compositionality. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pages 119–126, Sofia, Bulgaria. Association for Computational Linguistics.
  • Huda Khayrallah and Philipp Koehn. 2018. On the impact of various types of noise on neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 74–83, Melbourne, Australia. Association for Computational Linguistics.
  • Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, and Alexander Rush. 2018. Semi-amortized variational autoencoders. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2678–2687, Stockholmsmässan, Stockholm, Sweden. PMLR.
  • Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.
  • Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Conference Track Proceedings.
  • Adam Lopez and Matt Post. 2013. Beyond bitext: Five open problems in machine translation. In Twenty Years of Bitext.
  • Shuming Ma, Xu Sun, Yizhong Wang, and Junyang Lin. 2018. Bag-of-words as target for neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 332–338, Melbourne, Australia. Association for Computational Linguistics.
  • Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. 2017. The concrete distribution: A continuous relaxation of discrete random variables. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings.
  • Paul Michel and Graham Neubig. 2018. MTNT: A testbed for machine translation of noisy text. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 543–553, Brussels, Belgium. Association for Computational Linguistics.
  • Radford M. Neal and Geoffrey E. Hinton. 1998. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Michael I. Jordan, editor, Learning in Graphical Models, pages 355–368. Springer Netherlands, Dordrecht.
  • Myle Ott, Michael Auli, David Grangier, and Marc’Aurelio Ranzato. 2018a. Analyzing uncertainty in neural machine translation. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 3953–3962. PMLR.
  • Myle Ott, Sergey Edunov, David Grangier, and Michael Auli. 2018b. Scaling neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 1–9, Brussels, Belgium. Association for Computational Linguistics.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  • Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Brussels, Belgium. Association for Computational Linguistics.
  • Ali Razavi, Aaron van den Oord, Ben Poole, and Oriol Vinyals. 2019. Preventing posterior collapse with delta-VAEs. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net.
  • Danilo Jimenez Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 1530–1538. JMLR.org.
  • Philip Schulz, Wilker Aziz, and Trevor Cohn. 2018. A stochastic decoder for neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1243–1252, Melbourne, Australia. Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
  • Harshil Shah and David Barber. 2018. Generative neural machine translation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1346–1355. Curran Associates, Inc.
  • Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. 2019. Mixture models for diverse machine translation: Tricks of the trade. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 5719–5728. PMLR.
  • Jason R. Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch, and Adam Lopez. 2013. Dirt cheap web-scale parallel text from the Common Crawl. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1374–1383, Sofia, Bulgaria. Association for Computational Linguistics.
  • Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3483–3491. Curran Associates, Inc.
  • Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. 2016. Ladder variational autoencoders. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3738–3746. Curran Associates, Inc.
  • Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, and Biao Zhang. 2018. Variational recurrent neural machine translation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018, pages 5488–5495. AAAI Press.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc.
  • Christoph Tillmann and Hermann Ney. 2003. Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Computational Linguistics, 29(1):97–133.
  • Jakub M. Tomczak and Max Welling. 2018. VAE with a VampPrior. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9–11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, volume 84 of Proceedings of Machine Learning Research, pages 1214–1223. PMLR.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Rongxiang Weng, Shujian Huang, Zaixiang Zheng, Xinyu Dai, and Jiajun Chen. 2017. Neural machine translation with word predictions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 136–145, Copenhagen, Denmark. Association for Computational Linguistics.
  • Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2019. Exploiting monolingual data at scale for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4205–4215, Hong Kong, China. Association for Computational Linguistics.
  • Hainan Xu and Philipp Koehn. 2017. Zipporah: a fast and scalable data cleaning system for noisy web-crawled parallel corpora. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2945–2950, Copenhagen, Denmark. Association for Computational Linguistics.
  • Yilin Yang, Liang Huang, and Mingbo Ma. 2018a. Breaking the beam search curse: A study of (re)scoring methods and stopping criteria for neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3054–3059, Brussels, Belgium. Association for Computational Linguistics.
  • Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. 2018b. Breaking the softmax bottleneck: A high-rank RNN language model. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • Biao Zhang, Deyi Xiong, Jinsong Su, Qun Liu, Rongrong Ji, Hong Duan, and Min Zhang. 2016. Variational neural discourse relation recognizer. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 382–391, Austin, Texas. Association for Computational Linguistics.
  • Jiajun Zhang and Chengqing Zong. 2016. Exploiting source-side monolingual data in neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1535–1545, Austin, Texas. Association for Computational Linguistics.
  • Shengjia Zhao, Jiaming Song, and Stefano Ermon. 2019. InfoVAE: Balancing learning and inference in variational autoencoders. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 – February 1, 2019, pages 5885–5892. AAAI Press.
  • Tiancheng Zhao, Kyusong Lee, and Maxine Eskenazi. 2018. Unsupervised discrete sentence representation learning for interpretable neural dialog generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1098–1107, Melbourne, Australia. Association for Computational Linguistics.
Appendix
  • To prove the decomposition of the conditional VAE’s regularization term into a mutual information term and a KL divergence term, we introduce a random variable representing an index into the training data; it uniquely identifies the training pair (x(n), y(n)). This alteration is “entirely algebraic” (Hoffman and Johnson, 2016) while making our derivation both more compact and more interpretable. A sketch of the unconditional form of this decomposition appears after these notes.
  • We define the marginals p(z) and q(z) as the aggregated posterior (Tomczak and Welling, 2018) and aggregated approximate posterior (Hoffman and Johnson, 2016). (This allows the independence assumption above.) Moving forward will require just a bit of information theory: the definitions of entropy and mutual information. For these, we direct the reader to the text of Cover and Thomas (2006).
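For orientation, the unconditional version of this decomposition (Hoffman and Johnson, 2016) can be written as below, with n ∼ Uniform{1, …, N} the data index and q(z) the aggregated approximate posterior; the paper derives the analogous statement for the conditional case, so treat this as a sketch rather than the exact result.

```latex
\[
\frac{1}{N}\sum_{n=1}^{N} D_{\mathrm{KL}}\bigl(q_\phi(z \mid x^{(n)}) \,\|\, p(z)\bigr)
  = \underbrace{I_q(z; n)}_{\text{mutual information}}
  + \underbrace{D_{\mathrm{KL}}\bigl(q(z) \,\|\, p(z)\bigr)}_{\text{marginal KL}},
\qquad
q(z) = \frac{1}{N}\sum_{n=1}^{N} q_\phi(z \mid x^{(n)})
\]
```

Adding the mutual information term back into the objective therefore leaves only the marginal KL as a penalty, which is the motivation, described in the Introduction bullets above, for the modified ELBO.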