Neural Topic Modeling with Cycle-consistent Adversarial Training

EMNLP 2020, pp. 9018–9030.


Abstract

Advances in deep generative models have attracted significant research interest in neural topic modeling. The recently proposed Adversarial-neural Topic Model (ATM) models topics with an adversarially trained generator network and employs a Dirichlet prior to capture the semantic patterns in latent topics. It is effective in discovering coherent topics …

Introduction
  • Topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), aim to discover underlying topics and semantic structures from text collections.
  • Due to its interpretability and effectiveness, LDA has been extended to many Natural Language Processing (NLP) tasks (Lin and He, 2009; McAuley and Leskovec, 2013; Zhou et al., 2017)
  • Most of these models employ mean-field variational inference or collapsed Gibbs sampling (Griffiths and Steyvers, 2004) for model inference because their posteriors are intractable.
  • However, the Gaussian prior commonly adopted in VAE-based neural topic models is less capable of capturing the multi-modality that is crucial for topic modeling (Wallach et al., 2009)
Highlights
  • Topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), aim to discover underlying topics and semantic structures from text collections
  • Srivastava and Sutton (2017) adopted the logistic normal prior rather than the Gaussian to mimic the simplex properties of topic distributions
  • A document labeled as ‘sports’ more likely belongs to topics such as ‘basketball’ or ‘football’ than to ‘economics’ or ‘politics’, but the Adversarial-neural Topic Model (ATM) can neither infer topic distributions for given documents nor utilize such labels. To address these limitations, we propose a novel neural topic modeling approach named Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT)
  • We report text classification results of supervised topic models: sLDA, Scholar, and sToMCAT
  • We have presented ToMCAT, a neural topic model with adversarial and cycle-consistent objectives, and its supervised extension, sToMCAT
  • The effectiveness of ToMCAT and sToMCAT is verified by experiments on topic modeling and text classification
Methods
  • Given a corpus D consisting of N documents {x_i}_{i=1}^N, the two main purposes of topic modeling are:
  • 1. Topic discovery: the authors consider topic discovery as finding a mapping from topic distributions to word distributions.
  • 2. Topic inference: infer the topic distribution z_j ∈ R^K of a document x_j ∈ R^V; this can be considered as finding a mapping from word distributions to topic distributions
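These two mappings are exactly what ToMCAT learns with a CycleGAN-style objective: a generator maps topic distributions (sampled from a Dirichlet prior) to word distributions, an encoder maps word distributions back to topic distributions, and the two directions are tied together by adversarial and cycle-consistency losses. The sketch below is only an illustration of that objective, not the authors' code; the network sizes, Dirichlet concentration, adversarial loss form, and cycle weight are all assumed for the example.

```python
import torch
import torch.nn as nn

K, V = 50, 2000  # number of topics and vocabulary size (illustrative values)

# Generator: topic distribution -> word distribution (topic discovery direction)
G = nn.Sequential(nn.Linear(K, 256), nn.LeakyReLU(0.2), nn.Linear(256, V), nn.Softmax(dim=-1))
# Encoder: word distribution -> topic distribution (topic inference direction)
E = nn.Sequential(nn.Linear(V, 256), nn.LeakyReLU(0.2), nn.Linear(256, K), nn.Softmax(dim=-1))
# Discriminators on the word-distribution and topic-distribution domains
D_x = nn.Sequential(nn.Linear(V, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
D_z = nn.Sequential(nn.Linear(K, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

def generator_objective(x_real):
    """Adversarial + cycle-consistency terms for one batch of normalized bag-of-words vectors."""
    batch = x_real.size(0)
    # "real" topic distributions are drawn from a Dirichlet prior (concentration assumed)
    z_real = torch.distributions.Dirichlet(torch.full((K,), 0.1)).sample((batch,))
    x_fake, z_fake = G(z_real), E(x_real)
    # adversarial terms: fool the two critics (the paper's exact GAN formulation is not reproduced)
    adv = -D_x(x_fake).mean() - D_z(z_fake).mean()
    # cycle consistency: z -> x -> z and x -> z -> x should reconstruct their inputs
    cyc = (E(x_fake) - z_real).abs().mean() + (G(z_fake) - x_real).abs().mean()
    return adv + 10.0 * cyc  # cycle weight is an assumed hyperparameter

x = torch.rand(8, V)
x = x / x.sum(-1, keepdim=True)   # toy normalized bag-of-words batch
loss = generator_objective(x)     # the critics D_x and D_z would be trained with the opposing objective
```

In practice the generator/encoder pair and the two discriminators are updated alternately, as in standard GAN training; sToMCAT additionally incorporates document labels, which this unsupervised sketch does not show.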
Conclusion
  • The authors have presented ToMCAT, a neural topic model with adversarial and cycle-consistent objectives, and its supervised extension, sToMCAT.
  • sToMCAT further incorporates document labels into topic modeling.
  • The effectiveness of ToMCAT and sToMCAT is verified by experiments on topic modeling and text classification.
  • The authors plan to extend the model to cope with external word or document semantics.
  • It would be interesting to explore alternative architectures other than CycleGAN under the formulation of topic modeling
Tables
  • Table1: Dataset statistics
  • Table2: Average topic coherence of 5 topic number settings (20, 30, 50, 75, 100) on 4 datasets. Bold values indicate the best performing models for each dataset/metric/supervision setting. The supervised Scholar outperforms its unsupervised version on 20 Newsgroups, but the unsupervised one achieves higher coherence scores on DBpedia, while sLDA fails to surpass its unsupervised counterpart LDA on both DBpedia and 20 Newsgroups. By contrast, improvements of sToMCAT over the unsupervised ToMCAT can be observed under all settings. The results show that the incorporation of supervised information seems to be more effective in our proposed model, probably owing to the gradient-based loss balancing mechanism. Overall, our model consistently outperforms sLDA and Scholar on all datasets and all topic coherence measures
  • Table3: Table 3
  • Table4: Classification accuracy of supervised topic models with different topic numbers (20, 30, 50, 75, 100). ‘Min/Avg/Max’ shows the minimum/average/maximum accuracy among different topic numbers. ‘∆’ shows the variance of the classification accuracy across different topic numbers
  • Table5: Full list of 50 topics on NYTimes discovered by ToMCAT
  • Table6: Full list of 50 topics on NYTimes discovered by LDA
Related work
  • Our work is related to neural topic modeling and unsupervised style transfer.

    2.1 Neural Topic Modeling

    Recent advances in deep generative models, such as VAEs (Kingma and Welling, 2013) and GANs (Goodfellow et al., 2014), have attracted much research interest in the NLP community.

    Based on the VAE, the Neural Variational Document Model (NVDM) (Miao et al., 2016) encodes documents with variational posteriors in the latent topic space. NVDM employs a Gaussian as the prior distribution of latent topics. Instead, Srivastava and Sutton (2017) proposed that the Dirichlet distribution is a more appropriate prior for multinomial topic distributions, and constructed a Laplace approximation of the Dirichlet to enable reparameterisation (Kingma and Welling, 2013). Furthermore, the word-level mixture is replaced with a weighted product of experts (Srivastava and Sutton, 2017). Later, a non-parametric neural topic model utilizing stick-breaking construction was presented in (Miao et al., 2017). There have been some attempts at incorporating supervised information into neural topic modeling. For example, Card et al. (2018) extended the Sparse Additive Generative Model (Eisenstein et al., 2011) within the neural framework and incorporated document metadata such as document labels into the modeling process.
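To make the reparameterisation trick mentioned above concrete, the Laplace approximation replaces the Dirichlet prior with a logistic-normal whose moments are computed in the softmax basis (MacKay, 1998; Srivastava and Sutton, 2017). The snippet below is a minimal sketch of that construction; the concentration value used in the example is an arbitrary assumption.

```python
import numpy as np

def dirichlet_laplace_approx(alpha):
    """Mean and diagonal variance of the softmax-basis Laplace approximation of Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    K = alpha.size
    mu = np.log(alpha) - np.log(alpha).mean()
    var = (1.0 / alpha) * (1.0 - 2.0 / K) + (1.0 / alpha).sum() / K**2
    return mu, var

# Reparameterised sample of a topic distribution: softmax(mu + sqrt(var) * eps), eps ~ N(0, I)
mu, var = dirichlet_laplace_approx(np.full(50, 0.02))
eps = np.random.randn(50)
theta = np.exp(mu + np.sqrt(var) * eps)
theta /= theta.sum()
```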
Funding
  • This work was funded in part by the National Key Research and Development Program of China (2016YFC1306704) and the National Natural Science Foundation of China (61772132)
Study subjects and analysis
datasets: 4
4.1 Experimental Setup. We evaluate the performance of the proposed models on four datasets: NYTimes (NYT), Grolier (GRL), the DBpedia ontology classification dataset (DBP) (Zhang et al., 2015), and 20 Newsgroups (20NG). For the NYTimes and Grolier datasets, we use the processed version of Wang et al. (2019a).

documents: 100,000
For the DBpedia dataset, we first sample 100,000 documents from the whole training set, and then perform preprocessing including tokenization, lemmatization, and removal of stopwords and low-frequency words. The same preprocessing is also applied to the 20 Newsgroups dataset.
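As a rough sketch, the preprocessing described above could be implemented as follows; the NLTK-based tooling and the minimum-frequency threshold are assumptions, since the paper does not specify its implementation.

```python
# Requires the NLTK data packages 'punkt', 'stopwords', and 'wordnet' to be downloaded.
from collections import Counter
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def preprocess(docs, min_freq=5):
    """Tokenize, lemmatize, and drop stopwords and low-frequency words from raw documents."""
    lemmatize = WordNetLemmatizer().lemmatize
    stop = set(stopwords.words("english"))
    tokenized = [[lemmatize(tok.lower()) for tok in word_tokenize(doc) if tok.isalpha()] for doc in docs]
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [[tok for tok in doc if tok not in stop and counts[tok] >= min_freq] for doc in tokenized]
```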

datasets: 4
4.2.3 Impact of Topic Numbers. To investigate how topic coherence scores vary with respect to different topic number settings, we show in Figure 2 the topic coherence measures on the four datasets for all models. Although some baselines achieve higher scores under specific experimental settings, the general conclusion is that our models perform best in both unsupervised and supervised topic modeling tasks.

On the DBpedia and 20 Newsgroups datasets, sToMCAT consistently outperforms ToMCAT, indicating that the additional supervision helps generate more coherent topics. We also notice that although the topic coherence measures of our models remain relatively stable across topic numbers, there are slight drops on DBpedia and 20 Newsgroups when the topic number becomes larger. This may result from the fact that these two datasets are less diverse than the others: there are only 14 and 20 categories in DBpedia and 20 Newsgroups, respectively. When the topic number is much larger than the ground-truth category number, discriminating different topics becomes more challenging.
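The coherence scores discussed here are co-occurrence based measures such as NPMI (Bouma, 2009; Röder et al., 2015); the C_A and C_P measures come from the same family of metrics. The sketch below shows one simple way to compute an NPMI-style coherence for a topic's top words from a tokenized reference corpus; it is an illustration rather than the paper's evaluation pipeline.

```python
import itertools
import math

def npmi_coherence(top_words, docs, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words; `docs` is a list of token lists."""
    sets = [set(d) for d in docs]
    n = len(sets)
    p = lambda *ws: sum(all(w in s for w in ws) for s in sets) / n  # document co-occurrence probability
    scores = []
    for wi, wj in itertools.combinations(top_words, 2):
        pij, pi, pj = p(wi, wj), p(wi), p(wj)
        pmi = math.log((pij + eps) / (pi * pj + eps))
        scores.append(pmi / -math.log(pij + eps))
    return sum(scores) / len(scores)

# example: score = npmi_coherence(["game", "team", "season"], tokenized_reference_docs)
```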

Figures
  • The framework of ToMCAT and sToMCAT. Circles are neural networks, squares are data representations, and arrows indicate the forward pass directions.
  • Topic coherence (C_A, C_P, NPMI) w.r.t. topic numbers on the 4 datasets (Figure 2). Dotted lines denote supervised topic models.

References
  • Nikolaos Aletras and Mark Stevenson. 2013. Evaluating topic coherence using distributional semantics. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers, pages 13–22, Potsdam, Germany. Association for Computational Linguistics.
  • Martin Arjovsky, Soumith Chintala, and Leon Bottou. 2017. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 214–223, Sydney, Australia. PMLR.
  • David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022.
  • G. Bouma. 2009. Normalized (pointwise) mutual information in collocation extraction. In From Form to Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009, pages 31–40, Tübingen.
  • Dallas Card, Chenhao Tan, and Noah A. Smith. 2018. Neural models for documents with metadata. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2031–2040, Melbourne, Australia. Association for Computational Linguistics.
  • Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. 2016. Adversarial feature learning. arXiv preprint arXiv:1605.09782.
  • Jacob Eisenstein, Amr Ahmed, and Eric P. Xing. 2011. Sparse additive generative models of text. In Proceedings of the 28th International Conference on Machine Learning, ICML '11, pages 1041–1048, Madison, WI, USA. Omnipress.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc.
  • Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1):5228–5235.
  • Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems 30, pages 5767–5777. Curran Associates, Inc.
  • Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 448–456, Lille, France. PMLR.
  • Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. 2017. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1857–1865, International Convention Centre, Sydney, Australia. PMLR.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  • C. Lee, Y. Wang, T. Hsu, K. Chen, H. Lee, and L. Lee. 2018. Scalable sentiment for sequence-to-sequence chatbot response with performance analysis. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6164–6168.
  • Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 375–384, New York, NY, USA. ACM.
  • Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML Workshop on Deep Learning for Audio, Speech and Language Processing. Citeseer.
  • David J. C. MacKay. 1998. Choice of basis for laplace approximation. Mach. Learn., 33(1):77–86.
  • Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM Press.
  • Jon D. McAuliffe and David M. Blei. 2008. Supervised topic models. In Advances in Neural Information Processing Systems 20, pages 121–128. Curran Associates, Inc.
  • Yishu Miao, Edward Grefenstette, and Phil Blunsom. 2017. Discovering discrete latent topics with neural variational inference. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 2410–2419, International Convention Centre, Sydney, Australia. PMLR.
  • Yishu Miao, Lei Yu, and Phil Blunsom. 2016. Neural variational inference for text processing. In Proceedings of The 33rd International Conference on Machine Learning, volume 48, pages 1727–1736, New York, New York, USA. PMLR.
  • David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 100–108, Los Angeles, California. Association for Computational Linguistics.
  • Michael Röder, Andreas Both, and Alexander Hinneburg. 2015. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pages 399–408, New York, NY, USA. ACM.
  • Akash Srivastava and Charles Sutton. 2017. Autoencoding variational inference for topic models. arXiv preprint arXiv:1703.01488.
  • Hanna M. Wallach, David M. Mimno, and Andrew McCallum. 2009. Rethinking LDA: Why priors matter. In Advances in Neural Information Processing Systems 22, pages 1973–1981. Curran Associates, Inc.
  • Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, and Haiyang Xu. 2020. Neural topic modeling with bidirectional adversarial training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 340–350, Online. Association for Computational Linguistics.
  • Rui Wang, Deyu Zhou, and Yulan He. 2019a. ATM: Adversarial-neural topic model. Information Processing & Management, 56(6):102098.
  • Rui Wang, Deyu Zhou, and Yulan He. 2019b. Open event extraction from online text using a generative adversarial network. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 282–291, Hong Kong, China. Association for Computational Linguistics.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems 28, pages 649–657. Curran Associates, Inc.
  • Deyu Zhou, Xuan Zhang, and Yulan He. 2017. Event extraction from twitter using non-parametric Bayesian mixture model with word embeddings. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 808–817, Valencia, Spain. Association for Computational Linguistics.
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE.
Author
Xuemeng Hu
Yuxuan Xiong