A Discrete Variational Recurrent Topic Model Without The Reparametrization Trick
Advances in Neural Information Processing Systems (NeurIPS 2020)
- With the successes of deep learning models, neural variational inference (NVI), often realized as variational autoencoders (VAEs), has emerged as an important tool for neural-based probabilistic modeling [19, 40, 37].
- Topic modeling, with its ability to effectively cluster words into thematically similar groups, has a rich history in NLP and semantics-oriented applications [4, 3, 50, 6, 46, 35, i.a.].
- Topic models can struggle, however, to capture shorter-range dependencies among the words in a document.
- Two common ways of learning an LDA model are Monte Carlo sampling techniques, which iteratively sample states for the latent random variables in the model, or variational/EM-based methods, which minimize the distributional distance between the posterior p(θ, z|w) and an approximation q(θ, z; γ, φ) to that posterior, controlled by learnable parameters γ and φ.
- The authors focus on variational inference, as it cleanly allows neural components to be used.
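The variational objective above can be made concrete with a minimal sketch. The toy numbers and the `elbo` helper below are illustrative assumptions, not the paper's model; the point is only that the evidence lower bound (ELBO) is maximized, with respect to q, exactly when q matches the true posterior, where the bound becomes tight.

```python
import numpy as np

# Toy setup (illustrative, not the paper's model): one discrete latent
# topic z in {0, 1, 2} with prior p(z), a word likelihood p(w|z) for a
# single observed word, and a variational distribution q(z).
p_z = np.array([0.5, 0.3, 0.2])          # prior over topics
p_w_given_z = np.array([0.7, 0.2, 0.1])  # likelihood of the observed word per topic

def elbo(q_z):
    """E_q[log p(w, z) - log q(z)] -- a lower bound on log p(w)."""
    return np.sum(q_z * (np.log(p_w_given_z * p_z) - np.log(q_z)))

# True posterior p(z|w) by Bayes' rule; at q = posterior the bound is tight.
joint = p_w_given_z * p_z
log_evidence = np.log(joint.sum())
post = joint / joint.sum()

uniform_q = np.full(3, 1 / 3)
assert elbo(uniform_q) <= log_evidence + 1e-12  # ELBO is a lower bound
assert abs(elbo(post) - log_evidence) < 1e-9    # tight at the true posterior
```

Minimizing the KL divergence between q and the posterior and maximizing the ELBO are the same optimization, which is what lets learnable (neural) parameters be plugged into q.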
- NVI is relatively straightforward when dealing with continuous random variables, but it necessitates more complicated approaches for discrete random variables.
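This contrast can be sketched in a few lines (a hedged illustration, not the paper's code): a Gaussian sample can be rewritten as a differentiable function of its parameters plus parameter-free noise, while a categorical sample cannot, which is why discrete latent variables typically call for score-function (REINFORCE) estimators, continuous relaxations, or, as in this paper, keeping the discrete variables and handling their expectations directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous case: the reparametrization trick. A sample z ~ N(mu, sigma^2)
# is rewritten as a deterministic, differentiable function of (mu, sigma)
# and parameter-free noise eps ~ N(0, 1), so gradients flow through it.
def gaussian_sample(mu, sigma, eps):
    return mu + sigma * eps

# Discrete case: drawing a categorical index involves a non-differentiable
# selection, so the same trick does not apply directly. A classical
# fallback is the score-function (REINFORCE) estimator:
#   grad E_q[f(z)] = E_q[f(z) * grad log q(z)]
def score_function_grad(logits, f, n_samples=200_000):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    zs = rng.choice(len(probs), size=n_samples, p=probs)
    onehots = np.eye(len(probs))[zs]     # grad log q(z) = onehot(z) - probs
    fs = np.array([f(z) for z in zs])[:, None]
    return (fs * (onehots - probs)).mean(axis=0)

logits = np.array([0.1, 0.5, -0.3])
fvals = np.array([0.0, 1.0, 2.0])             # f(z) per category
probs = np.exp(logits) / np.exp(logits).sum()
exact_grad = probs * (fvals - probs @ fvals)  # closed form for this toy f
est = score_function_grad(logits, lambda z: fvals[z])
assert np.allclose(est, exact_grad, atol=0.05)
```

The score-function estimator is unbiased but high-variance, which motivates looking for alternatives when the discrete variable's support is small enough to sum over.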
- Setup: Following Dieng et al., the authors find that, for basic language modeling and core topic modeling, identifying which words are or are not stopwords is a sufficient indicator of thematic relevance.
- The model used in this paper is fundamentally an associative-based language model
- While NVI does provide some degree of regularization, a significant component of the training criterion is still a cross-entropy loss.
- The text the model is trained on can influence the types of implicit biases that are transmitted to the learned syntactic component, the learned thematic component, and the tradeoff(s) between these two dynamics
- (BNC) (a) Test perplexity for different RNN cells and (b) test perplexity as reported in previous works, where T denotes the number of topics. Consistent with Wang et al., the authors report the maximum of three VRTM runs.
- (LDA vs. VRTM) (a) SwitchP for VRTM vs. LDA (VB) and (b) average document-level topic θ entropy, averaged across three runs, across a corpus and for the same number of topics.
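To unpack the entropy analysis mentioned above: document-level topic entropy measures how spread out a document's topic proportions θ are, with lower entropy meaning the document's mass concentrates on fewer topics, i.e. more thematic consistency. A minimal helper (illustrative, not the paper's code):

```python
import numpy as np

def topic_entropy(theta):
    """H(theta) = -sum_k theta_k log theta_k for topic proportions theta."""
    theta = np.asarray(theta, dtype=float)
    nz = theta[theta > 0]          # 0 * log 0 is treated as 0
    return float(-(nz * np.log(nz)).sum())

peaked = [0.9, 0.05, 0.03, 0.02]    # mostly one topic: low entropy
diffuse = [0.25, 0.25, 0.25, 0.25]  # spread evenly: maximal entropy
assert topic_entropy(peaked) < topic_entropy(diffuse)
assert abs(topic_entropy(diffuse) - np.log(4)) < 1e-9
```

The uniform distribution over K topics attains the maximum, log K, so comparisons are only meaningful for the same number of topics, as in the tables.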
- Datasets: The authors test the performance of the algorithm on the publicly available APNEWS, IMDB, and BNC datasets. Roughly, there are between 7.7k and 9.8k vocabulary words in each corpus, with between 15M and 20M training tokens each; Table A1 in the appendix details the statistics of these datasets.
- APNEWS contains 54k newswire articles, IMDB contains 100k movie reviews, and BNC contains 17k assorted texts, such as journals, book excerpts, and newswire.
- These are the same datasets, including the train, validation, and test splits, as used by prior work, where additional details can be found.
- Following previous work, to avoid overfitting on the BNC dataset, the authors grouped 10 documents in the training set into a new pseudo-document.
- The authors incorporated discrete variables into neural variational inference without analytically integrating them out or reparametrizing and running stochastic backpropagation on them.
- As a neural topic model, the approach maintains the discrete topic assignments, yielding a simple yet effective way to learn thematic vs. non-thematic word dynamics.
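The general idea of keeping a discrete topic assignment without sampling or reparametrizing it can be sketched as follows. Since an assignment z_t ranges over only K topics, the expectation under q(z_t) in a per-token objective can be computed as an explicit sum; the names and structure below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

K, V = 4, 10
rng = np.random.default_rng(1)
beta = rng.dirichlet(np.full(V, 5.0), size=K)  # per-topic word distributions
q_z = rng.dirichlet(np.ones(K))                # variational posterior over z_t

def expected_log_lik(word_id):
    # E_{q(z_t)}[log p(w_t | z_t, beta)] = sum_k q(z_t = k) log beta[k, w_t]
    return float(q_z @ np.log(beta[:, word_id]))

# The exact sum matches a large-sample Monte Carlo estimate, but needs no
# gradient estimator: gradients flow through q_z and beta directly.
zs = rng.choice(K, size=100_000, p=q_z)
mc = np.log(beta[zs, 3]).mean()
assert abs(expected_log_lik(3) - mc) < 0.05
```

Because the sum is exact and differentiable in the variational parameters, no reparametrization trick and no score-function estimator is needed for the discrete variable.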
- While NVI does provide some degree of regularization, a significant component of the training criterion is still a cross-entropy loss.
- This paper’s model does not examine adjusting this cross-entropy component.
- Note that the thematic vs non-thematic aspect of this work provides a potential avenue for examining this.
- While the authors treated l_t as a binary indicator, future work could involve a more nuanced, gradient view.
- Table 1: Test set perplexity (lower is better) of VRTM demonstrates the effectiveness of our approach at learning a topic-based language model. In 1a we demonstrate the stability of VRTM using different recurrent cells. In 1b, we demonstrate that our VRTM-LSTM model outperforms prior neural topic models. We do not use pretrained word embeddings.
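As context for the perplexity numbers, test-set perplexity is the exponentiated average per-token negative log-likelihood; lower values mean the model assigns higher probability to held-out text. A minimal, illustrative helper (not the paper's evaluation code):

```python
import math

def perplexity(token_log_probs):
    """exp of the average per-token negative log-likelihood (lower is better)."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A model that assigns every token probability 0.1 has perplexity 10.
assert abs(perplexity([math.log(0.1)] * 5) - 10.0) < 1e-9
```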
- Table 2: We provide both SwitchP [24] results and entropy analysis of the model. These results support the idea that if topic models capture semantic dependencies, then they should capture the topics well, explain the topic assignment for each word, and provide an overall level of thematic consistency across the document (lower θ entropy).
- Table 3: Nine random topics extracted from a 50-topic VRTM learned on the APNEWS corpus. See Table A2 in the Appendix for topics from IMDB and BNC.
- Table 4: Seven randomly generated sentences from a VRTM model learned on the three corpora.
- Table 5: A summary of the datasets used in our experiments. We use the same datasets and splits as in previous work [49].
- Table 6: Nine random topics extracted from a 50-topic VRTM learned on the APNEWS, IMDB, and BNC corpora.
- This material is based in part upon work supported by the National Science Foundation under Grant No. IIS-1940931.
- This material is also based on research that is in part supported by the Air Force Research Laboratory (AFRL) and DARPA, for the KAIROS program, under agreement number FA8750-19-2-1003.
- Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, and Kiran Vodrahalli. A compressed sensing view of unsupervised text embeddings, bag-of-n-grams, and lstms. In International Conference on Learning Representations, 2018.
- Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, and Sam Gershman. Nonparametric spherical topic modeling with word embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016.
- David M Blei and John D Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, pages 113–120. ACM, 2006.
- David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
- Shammur Absar Chowdhury and Roberto Zamparelli. RNN simulations of grammaticality judgments on long-distance dependencies. In Proceedings of the 27th International Conference on Computational Linguistics, 2018.
- Steven P Crain, Shuang-Hong Yang, Hongyuan Zha, and Yu Jiao. Dialect topic modeling for improved consumer medical search. In AMIA Annual Symposium Proceedings, volume 2010, page 132. American Medical Informatics Association, 2010.
- Rajarshi Das, Manzil Zaheer, and Chris Dyer. Gaussian LDA for topic models with word embeddings. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
- Adji B. Dieng, Chong Wang, Jianfeng Gao, and John W. Paisley. Topicrnn: A recurrent neural network with long-range semantic dependency. In 5th International Conference on Learning Representations, ICLR, 2017.
- Jacob Eisenstein, Amr Adel Hassan Ahmed, and Eric P. Xing. Sparse additive generative models of text. In ICML, 2011.
- Francis Ferraro. Unsupervised Induction of Frame-Based Linguistic Forms. PhD thesis, Johns Hopkins University, 2017.
- Olivier Ferret. How to thematically segment texts by using lexical cohesion? In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2, 1998.
- Mikhail Figurnov, Shakir Mohamed, and Andriy Mnih. Implicit reparameterization gradients. In Advances in Neural Information Processing Systems, pages 441–452, 2018.
- Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. Representation degeneration problem in training natural language generation models. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SkEYojRqtm.
- Pankaj Gupta, Yatin Chaudhary, Florian Buettner, and Hinrich Schütze. Texttovec: Deep contextualized neural autoregressive topic models of language with distributed compositional prior. arXiv preprint arXiv:1810.03947, 2018.
- Suchin Gururangan, Tam Dang, Dallas Card, and Noah A. Smith. Variational pretraining for semi-supervised text classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
- Eva Hajicová and Jirí Mírovský. Discourse coherence through the lens of an annotated text corpus: A case study. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association (ELRA). URL https://www.aclweb.org/anthology/L18-1259.
- Geoffrey E Hinton and Ruslan R Salakhutdinov. Replicated softmax: an undirected topic model. In Advances in neural information processing systems, pages 1607–1614, 2009.
- Weonyoung Joo, Wonsung Lee, Sungrae Park, and Il-Chul Moon. Dirichlet variational autoencoder, 2019. URL https://openreview.net/forum?id=rkgsvoA9K7.
- Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, and Marco Baroni. The emergence of number and syntax units in LSTM language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/N19-1002.
- Hugo Larochelle and Stanislas Lauly. A neural autoregressive topic model. In Advances in Neural Information Processing Systems, pages 2708–2716, 2012.
- Jey Han Lau, Timothy Baldwin, and Trevor Cohn. Topically driven neural language model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 355–365, 2017.
- Shuangyin Li, Yu Zhang, Rong Pan, Mingzhi Mao, and Yang Yang. Recurrent attentional topic model. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan L. Boyd-Graber, and Kevin D. Seppi. Automatic evaluation of local topic quality. In ACL, 2019.
- Chandler May, Francis Ferraro, Alan McCree, Jonathan Wintrode, Daniel Garcia-Romero, and Benjamin Van Durme. Topic identification and discovery on text and speech. In EMNLP, 2015.
- Paola Merlo and Suzanne Stevenson. Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics, 27(3):373–408, 2001.
- Yishu Miao, Lei Yu, and Phil Blunsom. Neural variational inference for text processing. In International conference on machine learning, pages 1727–1736, 2016.
- Tomas Mikolov and Geoffrey Zweig. Context dependent recurrent neural network language model. In 2012 IEEE Spoken Language Technology Workshop (SLT), pages 234–239. IEEE, 2012.
- Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, and Marc’Aurelio Ranzato. Learning longer memory in recurrent neural networks. arXiv preprint arXiv:1412.7753, 2014.
- David Mimno, Hanna M Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. Optimizing semantic coherence in topic models. In EMNLP, 2011.
- Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. In Proceedings of the 31st International Conference on International Conference on Machine Learning-Volume 32, pages II–1791. JMLR.org, 2014.
- Seyedahmad Mousavi, Mehdi Rezaee, Ramin Ayanzadeh, et al. A survey on compressive sensing: Classical results and recent advancements. arXiv preprint arXiv:1908.01014, 2019.
- Christian A Naesseth, Francisco JR Ruiz, Scott W Linderman, and David M Blei. Reparameterization gradients through acceptance-rejection sampling algorithms. arXiv preprint arXiv:1610.05683, 2016.
- Ramesh Nallapati, Igor Melnyk, Abhishek Kumar, and Bowen Zhou. Sengen: Sentence generating neural variational topic model. CoRR, abs/1708.00308, 2017. URL http://arxiv.org/abs/1708.00308.
- Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N Nguyen, David Lo, and Chengnian Sun. Duplicate bug report detection with a combination of information retrieval and topic modeling. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pages 70–79. ACM, 2012.
- Michael John Paul. Topic Modeling with Structured Priors for Text-Driven Science. PhD thesis, Johns Hopkins University, 2015.
- Rajesh Ranganath, Sean Gerrish, and David Blei. Black box variational inference. In Artificial Intelligence and Statistics, pages 814–822, 2014.
- Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J Mooney. Spherical topic models. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 903–910, 2010.
- Mehdi Rezaee, Francis Ferraro, et al. Event representation with sequential, semi-supervised discrete variables. arXiv preprint arXiv:2010.04361, 2020.
- Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, pages 1278–1286, 2014.
- Pannawit Samatthiyadikun and Atsuhiro Takasu. Supervised deep polylingual topic modeling for scholarly information recommendations. In ICPRAM, pages 196–201, 2018.
- Akash Srivastava and Charles A. Sutton. Autoencoding variational inference for topic models. In ICLR, 2017.
- Nitish Srivastava, Ruslan Salakhutdinov, and Geoffrey Hinton. Modeling documents with a deep boltzmann machine. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI’13, pages 616–624, Arlington, Virginia, United States, 2013. AUAI Press. URL http://dl.acm.org/citation.cfm?id=3023638.3023701.
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
- Shawn Tan and Khe Chai Sim. Learning utterance-level normalisation using variational autoencoders for robust automatic speech recognition. In 2016 IEEE Spoken Language Technology Workshop (SLT), pages 43–49. IEEE, 2016.
- Mirwaes Wahabzada, Anne-Katrin Mahlein, Christian Bauckhage, Ulrike Steiner, Erich-Christian Oerke, and Kristian Kersting. Plant phenotyping using probabilistic topic models: uncovering the hyperspectral language of plants. Scientific reports, 6:22482, 2016.
- Hanna M Wallach, David M Mimno, and Andrew McCallum. Rethinking lda: Why priors matter. In Advances in neural information processing systems, pages 1973–1981, 2009.
- Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, and Lawrence Carin. Topic compositional neural language model. In AISTATS, 2017.
- Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, and Lawrence Carin. Topic-guided variational auto-encoder for text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 166–177, 2019.
- Xiaogang Wang. Action recognition using topic models. In Visual Analysis of Humans, pages 311–332.
- Tsung-Hsien Wen and Minh-Thang Luong. Latent topic conversational models. arXiv preprint arXiv:1809.07070, 2018.
- Yinfei Yang, Forrest Bao, and Ani Nenkova. Detecting (un)important content for singledocument news summarization. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017.
- Hao Zhang, Bo Chen, Dandan Guo, and Mingyuan Zhou. Whai: Weibull hybrid autoencoding inference for deep topic modeling. In ICLR, 2018.
- Also, Lθ = 0, since it is the KL divergence between two equal distributions. Overall, L reduces to log p(wt; ht), indicating that the model is just maximizing the log-model evidence based on the RNN output.