# Scalable Inference for Logistic-Normal Topic Models

NIPS 2013, pp. 2445-2453.

Abstract:

Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to la…


Introduction

- In Bayesian models, conjugate priors typically yield easier inference problems, but non-conjugate priors can be more expressive in capturing desired model properties.
- One elegant extension of LDA is the logistic-normal topic model [3], which uses a logistic-normal prior to effectively capture the correlation structures among topics.
- Along this line, many subsequent extensions have been developed, including dynamic topic models [4] that deal with time series via a dynamic linear system on the Gaussian variables and infinite CTMs [11] that can resolve the number of topics from data
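To make the logistic-normal prior concrete, the sketch below draws correlated Gaussian logits and maps them to the simplex with a softmax. The function name, the Cholesky parameterization, and the example numbers are illustrative assumptions, not notation from the paper; the point is only that off-diagonal covariance entries induce topic correlations that a Dirichlet prior cannot express.

```python
import math
import random

def sample_logistic_normal(mu, chol_lower, rng=None):
    """Draw topic proportions theta from a logistic-normal prior.

    eta ~ N(mu, Sigma) is sampled via a lower-triangular Cholesky factor L
    (Sigma = L L^T); off-diagonal entries of L correlate the topics.
    The softmax theta_k proportional to exp(eta_k) maps eta to the simplex.
    """
    rng = rng or random.Random()
    K = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    eta = [mu[i] + sum(chol_lower[i][j] * z[j] for j in range(i + 1))
           for i in range(K)]
    m = max(eta)                       # subtract max to stabilize the softmax
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [x / s for x in w]

# Example: topics 0 and 1 positively correlated, topic 2 independent.
mu = [0.0, 0.0, 0.0]
L = [[1.0, 0.0, 0.0],
     [0.9, 0.44, 0.0],
     [0.0, 0.0, 1.0]]
theta = sample_logistic_normal(mu, L, random.Random(42))
```

Because of the correlation encoded in `L`, documents drawn this way tend to give topics 0 and 1 jointly high (or jointly low) proportions.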

Highlights

- For the most popular latent Dirichlet allocation (LDA) [5], a Dirichlet distribution is used as the conjugate prior for multinomial mixing proportions
- To address the limitations listed above, we develop a scalable Gibbs sampling algorithm for logistic-normal topic models, without making any restricting assumptions on the posterior distribution
- We present a block-wise Gibbs sampling algorithm for logistic-normal topic models
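To see why data augmentation can restore conjugacy without mean-field assumptions, consider a single logit psi with a Gaussian prior N(m0, v0). After a Polya-Gamma variable omega is introduced, the logistic likelihood contribution becomes the Gaussian-form factor exp(kappa * psi - omega * psi^2 / 2), so the conditional of psi is again Gaussian in closed form. The one-dimensional sketch below uses illustrative names, not the paper's notation.

```python
def gaussian_conditional_given_pg(m0, v0, kappa, omega):
    """Posterior N(m1, v1) of a logit psi with prior N(m0, v0), after
    Polya-Gamma augmentation replaces the logistic likelihood with the
    Gaussian-form factor exp(kappa * psi - omega * psi**2 / 2)."""
    prec = 1.0 / v0 + omega           # precisions add under conjugacy
    v1 = 1.0 / prec
    m1 = v1 * (m0 / v0 + kappa)       # precision-weighted mean
    return m1, v1

# With no evidence (kappa = 0) and omega = 1, the mean stays at m0 = 0
# and the variance shrinks from 1.0 to 0.5.
m1, v1 = gaussian_conditional_given_pg(0.0, 1.0, 0.0, 1.0)
```

In the full model this scalar update becomes a multivariate Gaussian update for the whole logit vector, sampled block-wise; the closed form is what makes each Gibbs step exact rather than variational.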

Methods

- The authors present a qualitative and quantitative evaluation to demonstrate the efficacy and scalability of the Gibbs sampler for CTM.
- Experiments are conducted on a 40-node cluster, where each node is equipped with two 6-core CPUs (2.93GHz).
- The authors will use M to denote the number of machines and P to denote the number of CPU cores.
- The authors compare with the variational CTM [3] and the state-of-the-art LDA implementation, Yahoo! LDA.
- To ensure a fair comparison, for both vCTM and gCTM the authors select T such that the models converge sufficiently, as discussed in Section 5.3.

Conclusion

- The authors present a scalable Gibbs sampling algorithm for logistic-normal topic models.
- The authors' method builds on a novel data augmentation formulation and addresses the non-conjugacy without making strict mean-field assumptions.
- The authors' method scales well, suggesting it can extract large correlation structures from massive data.
- The authors are interested in developing scalable sampling algorithms of other logistic-normal topic models, e.g., infinite CTM and dynamic topic models.
- The fast sampler of Polya-Gamma distributions can also be used in relational and supervised topic models [6, 21].
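The Polya-Gamma draws that drive the augmentation can be approximated with the truncated sum-of-gammas series representation of Polson, Scott and Windle [13]. The sketch below is a naive O(K) approximation for illustration only, not the fast exact sampler the conclusion refers to; the function name and truncation level are assumptions.

```python
import math
import random

def sample_pg_approx(b, c, K=200, rng=None):
    """Approximate a draw from PG(b, c) by truncating the series
    PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1) i.i.d.  Larger K tightens the approximation."""
    rng = rng or random.Random()
    denom_shift = (c * c) / (4.0 * math.pi * math.pi)
    s = 0.0
    for k in range(1, K + 1):
        g_k = rng.gammavariate(b, 1.0)
        s += g_k / ((k - 0.5) ** 2 + denom_shift)
    return s / (2.0 * math.pi * math.pi)

# Sanity check against the known mean E[PG(b, c)] = b / (2c) * tanh(c / 2).
rng = random.Random(0)
draws = [sample_pg_approx(1.0, 2.0, rng=rng) for _ in range(4000)]
```

An exact O(1) rejection sampler (as in [13]) avoids both the truncation bias and the O(K) cost per draw, which matters when one auxiliary variable is sampled per token.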


- Table 1: Training time of vCTM and gCTM (M = 40) on various datasets.

Funding

- This work is supported by the National Basic Research Program (973 Program) of China (Nos. 2013CB329403, 2012CB316301), National Natural Science Foundation of China (Nos. 61322308, 61305066), Tsinghua University Initiative Scientific Research Program (No. 20121088071), and Tsinghua National Laboratory for Information Science and Technology, China.

Reference

- A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM), 2012.
- K. Bache and M. Lichman. UCI machine learning repository, 2013.
- D. Blei and J. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems (NIPS), 2006.
- D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning (ICML), 2006.
- D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
- N. Chen, J. Zhu, F. Xia, and B. Zhang. Generalized relational topic models with data augmentation. In International Joint Conference on Artificial Intelligence (IJCAI), 2013.
- M. Hoffman, D. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (NIPS), 2010.
- C. Holmes and L. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168, 2006.
- D. Mimno, H. Wallach, and A. McCallum. Gibbs sampling for logistic normal topic models with graph-based priors. In NIPS Workshop on Analyzing Graphs, 2008.
- D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. Journal of Machine Learning Research, 10:1801–1828, 2009.
- J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution for mixed-membership modeling. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- N. G. Polson and J. G. Scott. Default Bayesian analysis for multi-way tables: a data-augmentation approach. arXiv:1109.4180, 2011.
- N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Polya-Gamma latent variables. arXiv:1205.0310v2, 2013.
- C. P. Robert. Simulation of truncated normal variables. Statistics and Computing, 5:121–125, 1995.
- W. Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., 1964.
- A. Smola and S. Narayanamurthy. An architecture for parallel topic models. In Very Large Data Bases (VLDB), 2010.
- M. A. Tanner and W. H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528–540, 1987.
- D. van Dyk and X. Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.
- L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2009.
- A. Zhang, J. Zhu, and B. Zhang. Sparse online topic models. In International Conference on World Wide Web (WWW), 2013.
- J. Zhu, X. Zheng, and B. Zhang. Improved Bayesian supervised topic models with data augmentation. In Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
