# Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

ACL, pp. 187-195, 2013.

Abstract:

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the influence of the response variables against that of the document words; and 2) developing a simple Gibbs sampling algorithm, based on data augmentation, that makes no restricting assumptions on the posterior distributions.

Introduction

- As widely adopted in supervised latent Dirichlet allocation models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words.
- As noted by Halpern et al. (2012) and observed in the experiments, this model imbalance can result in a weak influence of the response variables on the topic representations and in unsatisfactory prediction performance.
- Another difficulty when dealing with categorical response variables is that the commonly used normal priors are no longer conjugate to the logistic likelihood, which leads to hard inference problems.
- Existing approaches rely on variational approximation techniques that normally make strict mean-field assumptions.
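
The non-conjugacy issue in the last two bullets is exactly what the paper's data augmentation targets: with auxiliary Polya-Gamma variables (Polson et al., 2012), the logistic likelihood becomes conditionally Gaussian, so Gibbs sampling needs no mean-field approximation. Below is a minimal, self-contained sketch of that trick on plain Bayesian logistic regression (not the authors' topic-model sampler); the truncated-series sampler for PG(b, c) is our own simplification and is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg(b, c, K=200):
    """Approximate draw from the Polya-Gamma PG(b, c) distribution via a
    truncated version of its infinite-sum representation (Polson et al.,
    2012); exact only as K -> infinity."""
    k = np.arange(1, K + 1)
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=300, prior_var=100.0):
    """Gibbs sampler for Bayesian logistic regression with PG augmentation.
    Conditionals: omega_i | beta ~ PG(1, x_i beta); beta | omega ~ Gaussian."""
    n, d = X.shape
    beta = np.zeros(d)
    kappa = y - 0.5                     # kappa_i = y_i - 1/2
    B_inv = np.eye(d) / prior_var       # Gaussian prior N(0, prior_var * I)
    samples = []
    for _ in range(n_iter):
        psi = X @ beta
        omega = np.array([sample_pg(1.0, c) for c in psi])
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B_inv)
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        samples.append(beta)
    return np.array(samples)

# Toy data: logistic model with intercept, true weights (-0.5, 2.0).
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 2.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
post = gibbs_logistic(X, y)[100:]       # drop burn-in
print(post.mean(axis=0))                # posterior mean should land near true_beta
```

Because each conditional is a standard distribution, no factorization of the posterior is assumed — the property the paper exploits for its topic-model sampler.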

Highlights

- As widely adopted in supervised latent Dirichlet allocation models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words.
- We present the generalized Bayesian logistic supervised topic models.
- We present empirical results and sensitivity analysis to demonstrate the efficiency and prediction performance of the generalized logistic supervised topic models on the 20Newsgroups (20NG) data set.
- We present two improvements to Bayesian logistic supervised topic models, namely a general formulation that introduces a regularization parameter to avoid model imbalance, and a highly efficient Gibbs sampling algorithm, derived via data augmentation, that imposes no restricting assumptions on the posterior distributions.
- The data augmentation technique can also be applied to other types of response variables, such as count data with a negative-binomial likelihood (Polson et al., 2012).
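
The balancing idea in the first improvement can be stated compactly. In the paper's regularized-Bayesian reading of the model, the response likelihood is raised to a power c so that a single response variable can counterweigh the many word tokens of a document; the formula below is our paraphrase of that formulation, with p0 denoting the prior and c the regularization parameter:

```latex
% Generalized posterior with balance parameter c:
% c = 1 recovers the standard logistic sLDA posterior;
% c > 1 strengthens the response y relative to the words W.
q(\eta, \Theta, Z \mid W, y) \;\propto\;
  p_0(\eta, \Theta, Z)\, p(W \mid Z, \Theta)\, p(y \mid Z, \eta)^{c}
```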

Methods

- The authors present empirical results and sensitivity analysis to demonstrate the efficiency and prediction performance of the generalized logistic supervised topic models on the 20Newsgroups (20NG) data set, which contains about 20,000 postings from 20 newsgroups.
- Following the same setting as in (Lacoste-Jullien et al., 2009; Zhu et al., 2012), the binary task is to distinguish postings of the newsgroup alt.atheism from those of the group talk.religion.misc.
- gSLDA is insensitive to α. [Figure: accuracy, training time, and test time compared for gSLDA, gSLDA+, vSLDA, vMedLDA, gMedLDA, and gLDA+SVM.]

Conclusion

- The authors present two improvements to Bayesian logistic supervised topic models, namely a general formulation that introduces a regularization parameter to avoid model imbalance, and a highly efficient Gibbs sampling algorithm, derived via data augmentation, that imposes no restricting assumptions on the posterior distributions.
- The algorithm can be parallelized.
- Empirical results for both binary and multi-class classification demonstrate significant improvements over the existing logistic supervised topic models.
- The data augmentation technique can also be applied to other types of response variables, such as count data with a negative-binomial likelihood (Polson et al., 2012).
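
For orientation, the sampler these conclusions refer to builds on the standard collapsed Gibbs update for LDA (Griffiths and Steyvers, 2004, in the references); the augmented logistic term enters as an extra per-topic factor in that same update. The sketch below shows only the classic unsupervised update and keeps the supervised hook as a hypothetical `sup_weight` argument (a stand-in, not the paper's exact factor).

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_sweep(docs, z, Cdk, Ckw, Ck, alpha, beta, V, sup_weight=None):
    """One sweep of collapsed Gibbs sampling for LDA (Griffiths & Steyvers, 2004).
    `sup_weight`, if given, is a hypothetical (D, K) array of per-document,
    per-topic log-factors standing in for the extra term the augmented
    logistic likelihood contributes in gSLDA."""
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k_old = z[d][n]                      # remove current assignment
            Cdk[d, k_old] -= 1; Ckw[k_old, w] -= 1; Ck[k_old] -= 1
            p = (Cdk[d] + alpha) * (Ckw[:, w] + beta) / (Ck + V * beta)
            if sup_weight is not None:
                p = p * np.exp(sup_weight[d])    # supervised per-topic factor
            p /= p.sum()
            k_new = rng.choice(len(p), p=p)      # resample topic, update counts
            z[d][n] = k_new
            Cdk[d, k_new] += 1; Ckw[k_new, w] += 1; Ck[k_new] += 1
    return z

# Toy corpus: 5 documents of 20 tokens over a 10-word vocabulary, 3 topics.
V, K, alpha, beta = 10, 3, 0.5, 0.1
docs = [list(rng.integers(0, V, size=20)) for _ in range(5)]
z = [list(rng.integers(0, K, size=len(doc))) for doc in docs]
Cdk = np.zeros((len(docs), K), dtype=int)
Ckw = np.zeros((K, V), dtype=int)
Ck = np.zeros(K, dtype=int)
for d, doc in enumerate(docs):
    for w, k in zip(doc, z[d]):
        Cdk[d, k] += 1; Ckw[k, w] += 1; Ck[k] += 1
for _ in range(10):
    z = gibbs_sweep(docs, z, Cdk, Ckw, Ck, alpha, beta, V)
```

Since documents touch disjoint rows of `Cdk`, sweeps over different documents can run in parallel with only the shared `Ckw`/`Ck` counts needing synchronization — the basis of the parallelization claim above.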


- Table 1: Split of training time over various steps.

Funding

- This work is supported by National Key Foundation R&D Projects (Nos. 2013CB329403 and 2012CB316301), Tsinghua Initiative Scientific Research Program No. 20121088071, Tsinghua National Laboratory for Information Science and Technology, and the 221 Basic Research Plan for Young Faculties at Tsinghua University.

References

- A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. 2012. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM).
- D.M. Blei and J.D. McAuliffe. 2010. Supervised topic models. arXiv:1003.0783v1.
- D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. JMLR, 3:993–1022.
- M. Chen, J. Ibrahim, and C. Yiannoutsos. 1999. Prior elicitation, variable selection and Bayesian computation for logistic regression models. Journal of the Royal Statistical Society, Series B, (61):223–242.
- P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. 2009. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning (ICML), pages 353–360.
- A. Globerson, T. Koo, X. Carreras, and M. Collins. 2007. Exponentiated gradient algorithms for log-linear structured prediction. In ICML, pages 305–312.
- J.E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI).
- T.L. Griffiths and M. Steyvers. 2004. Finding scientific topics. Proceedings of National Academy of Science (PNAS), pages 5228–5235.
- Y. Halpern, S. Horng, L. Nathanson, N. Shapiro, and D. Sontag. 2012. A comparison of dimensionality reduction techniques for unstructured clinical text. In ICML 2012 Workshop on Clinical Data Analysis.
- C. Holmes and L. Held. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168.
- Q. Jiang, J. Zhu, M. Sun, and E.P. Xing. 2012. Monte Carlo methods for maximum margin supervised topic models. In Advances in Neural Information Processing Systems (NIPS).
- T. Joachims. 1999. Making large-scale SVM learning practical. MIT press.
- S. Lacoste-Jullien, F. Sha, and M.I. Jordan. 2009. DiscLDA: Discriminative learning for dimensionality reduction and classification. Advances in Neural Information Processing Systems (NIPS), pages 897–904.
- Y. Lin. 2001. A note on margin-based loss functions in classification. Technical Report No. 1044. University of Wisconsin.
- D. McAllester. 2003. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21.
- M. Meyer and P. Laud. 2002. Predictive variable selection in generalized linear models. Journal of American Statistical Association, 97(459):859–871.
- D. Newman, A. Asuncion, P. Smyth, and M. Welling. 2009. Distributed algorithms for topic models. Journal of Machine Learning Research (JMLR), (10):1801–1828.
- N.G. Polson, J.G. Scott, and J. Windle. 2012. Bayesian inference for logistic models using Polya-Gamma latent variables. arXiv:1205.0310v1.
- R. Rifkin and A. Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), (5):101–141.
- L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri. 2004. Are loss functions all the same? Neural Computation, (16):1063–1076.
- A. Smola and S. Narayanamurthy. 2010. An architecture for parallel topic models. Very Large Data Base (VLDB), 3(1-2):703–710.
- M.A. Tanner and W.-H. Wong. 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association (JASA), 82(398):528–540.
- D. van Dyk and X. Meng. 2001. The art of data augmentation. Journal of Computational and Graphical Statistics (JCGS), 10(1):1–50.
- C. Wang, D.M. Blei, and F.F. Li. 2009. Simultaneous image classification and annotation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- J. Zhu, N. Chen, and E.P. Xing. 2011. Infinite latent SVM for classification and multi-task learning. In Advances in Neural Information Processing Systems (NIPS), pages 1620–1628.
- J. Zhu, A. Ahmed, and E.P. Xing. 2012. MedLDA: maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), (13):2237–2278.
- J. Zhu, N. Chen, H. Perkins, and B. Zhang. 2013a. Gibbs max-margin topic models with fast sampling algorithms. In International Conference on Machine Learning (ICML).
- J. Zhu, N. Chen, and E.P. Xing. 2013b. Bayesian inference with posterior regularization and applications to infinite latent SVMs. arXiv:1210.1766v2.
