Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

ACL, pp. 187-195, 2013.


Abstract:

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to …

Introduction
  • As widely adopted in supervised latent Dirichlet allocation models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words.
  • As noted by Halpern et al. (2012) and observed in the experiments, this model imbalance could result in a weak influence of response variables on the topic representations and unsatisfactory prediction performance.
  • Another difficulty that arises when dealing with categorical response variables is that the commonly used normal priors are no longer conjugate to the logistic likelihood, which leads to hard inference problems.
  • Existing approaches rely on variational approximation techniques that normally make strict mean-field assumptions (see the data-augmentation identity sketched below).
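  The data-augmentation route the paper takes rests on the Pólya-Gamma identity of Polson et al. (2012), cited below; we restate it here for context, with notation of our own. For b > 0,

      \frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}} \;=\; 2^{-b}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\lambda\psi^{2}/2}\, p(\lambda \mid b, 0)\, d\lambda, \qquad \kappa = a - b/2,

  where p(λ | b, 0) is the Pólya-Gamma PG(b, 0) density. Taking ψ_d = η^T z̄_d for document d, with a normal prior on the classifier weights η, conditioning on the auxiliary λ_d turns the logistic term into a Gaussian function of η, so both η and each λ_d have closed-form conditionals and no mean-field factorization is needed.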
Highlights
  • As widely adopted in supervised latent Dirichlet allocation models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words
  • We present the generalized Bayesian logistic supervised topic models
  • We present empirical results and sensitivity analysis to demonstrate the efficiency and prediction performance of the generalized logistic supervised topic models on the 20Newsgroups (20NG) data set
  • We present two improvements to Bayesian logistic supervised topic models: a general formulation that introduces a regularization parameter to avoid model imbalance, and a highly efficient Gibbs sampling algorithm, derived via data augmentation, that makes no restrictive assumptions on the posterior distributions
  • The data augmentation technique can be applied to deal with other types of response variables, such as count data with a negative-binomial likelihood (Polson et al., 2012)
Methods
  • The authors present empirical results and sensitivity analysis to demonstrate the efficiency and prediction performance of the generalized logistic supervised topic models on the 20Newsgroups (20NG) data set, which contains about 20,000 postings within 20 news groups.
  • Following the same setting as in (Lacoste-Julien et al., 2009; Zhu et al., 2012), the task is to distinguish postings of the newsgroup alt.atheism from those of the group talk.religion.misc (a data-loading sketch follows this list).
  • gSLDA is insensitive to α. [Figure: accuracy, training time, and test time compared for gSLDA, gSLDA+, vSLDA, vMedLDA, gMedLDA, and gLDA+SVM.]
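  For concreteness, the binary setup above can be reproduced with scikit-learn's 20Newsgroups loader; the loader and its default preprocessing are our choice for illustration, not necessarily the pipeline the authors used.

      from sklearn.datasets import fetch_20newsgroups

      # Binary task from the paper: alt.atheism vs. talk.religion.misc
      cats = ['alt.atheism', 'talk.religion.misc']
      train = fetch_20newsgroups(subset='train', categories=cats)
      test = fetch_20newsgroups(subset='test', categories=cats)
      print(len(train.data), 'training postings,', len(test.data), 'test postings')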
Conclusion
  • The authors present two improvements to Bayesian logistic supervised topic models: a general formulation that introduces a regularization parameter to avoid model imbalance, and a highly efficient Gibbs sampling algorithm, derived via data augmentation, that makes no restrictive assumptions on the posterior distributions (a minimal sketch of the augmented Gibbs step follows this list).
  • The algorithm can be parallelized
  • Empirical results for both binary and multi-class classification demonstrate significant improvements over the existing logistic supervised topic models.
  • The data augmentation technique can be applied to deal with other types of response variables, such as count data with a negative-binomial likelihood (Polson et al., 2012)
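  To make the augmentation concrete, below is a minimal sketch of one Gibbs sweep for the classifier part of such a model, assuming the topic assignments are held fixed so that each document d is summarized by a vector x_d of empirical topic proportions, and the logistic likelihood is raised to the power c (the regularization constant). The PG(b, z) draw is approximated by truncating its sum-of-gammas series; the names (sample_pg, gibbs_logistic_step) and the toy data are ours, not from the authors' code.

      import numpy as np

      def sample_pg(b, z, rng, trunc=200):
          # Approximate PG(b, z) draw via the truncated sum-of-gammas representation:
          #   PG(b, z) = (1 / (2*pi^2)) * sum_k g_k / ((k - 1/2)^2 + z^2 / (4*pi^2)),
          # with g_k ~ Gamma(b, 1); 'trunc' terms are usually plenty in practice.
          k = np.arange(1, trunc + 1)
          g = rng.gamma(shape=b, scale=1.0, size=trunc)
          return np.sum(g / ((k - 0.5) ** 2 + z ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

      def gibbs_logistic_step(X, y, eta, c, prior_prec, rng):
          # One Gibbs sweep for the classifier weights eta when the logistic likelihood
          # is raised to the power c:  p(y_d | x_d, eta)^c,  psi_d = eta^T x_d,  y_d in {0, 1}.
          psi = X @ eta
          # Step 1: draw the Polya-Gamma auxiliary variables  lambda_d ~ PG(c, psi_d).
          lam = np.array([sample_pg(c, p, rng) for p in psi])
          # Step 2: given lambda, eta has a closed-form Gaussian conditional with
          #   Sigma = (X^T diag(lambda) X + prior_prec * I)^{-1},  mu = Sigma X^T kappa,
          #   where kappa_d = c * (y_d - 1/2).
          kappa = c * (y - 0.5)
          Sigma = np.linalg.inv(X.T @ (lam[:, None] * X) + prior_prec * np.eye(X.shape[1]))
          mu = Sigma @ (X.T @ kappa)
          return rng.multivariate_normal(mu, Sigma)

      # Toy usage: rows of X play the role of per-document topic proportions.
      rng = np.random.default_rng(0)
      X = rng.dirichlet(np.ones(5), size=100)
      y = (X[:, 0] > 0.2).astype(float)
      eta = np.zeros(5)
      for _ in range(50):
          eta = gibbs_logistic_step(X, y, eta, c=4.0, prior_prec=1.0, rng=rng)

  Because the conditional of the weights given the augmented variables is an exact multivariate Gaussian, no mean-field factorization between the weights and the auxiliary variables is required, which is the point of the augmentation.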
Tables
  • Table 1: Split of training time over various steps
Funding
  • This work is supported by National Key Foundation R&D Projects (Nos. 2013CB329403, 2012CB316301), Tsinghua Initiative Scientific Research Program (No. 20121088071), Tsinghua National Laboratory for Information Science and Technology, and the 221 Basic Research Plan for Young Faculties at Tsinghua University.
References
  • A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. 2012. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM).
  • D.M. Blei and J.D. McAuliffe. 2010. Supervised topic models. arXiv:1003.0783v1.
  • D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. JMLR, 3:993–1022.
  • M. Chen, J. Ibrahim, and C. Yiannoutsos. 1999. Prior elicitation, variable selection and Bayesian computation for logistic regression models. Journal of the Royal Statistical Society, Ser. B, (61):223–242.
  • P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. 2009. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning (ICML), pages 353–360.
  • A. Globerson, T. Koo, X. Carreras, and M. Collins. 2007. Exponentiated gradient algorithms for log-linear structured prediction. In ICML, pages 305–312.
  • J.E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI).
  • T.L. Griffiths and M. Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences (PNAS), pages 5228–5235.
  • Y. Halpern, S. Horng, L. Nathanson, N. Shapiro, and D. Sontag. 2012. A comparison of dimensionality reduction techniques for unstructured clinical text. In ICML 2012 Workshop on Clinical Data Analysis.
  • C. Holmes and L. Held. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168.
  • Q. Jiang, J. Zhu, M. Sun, and E.P. Xing. 2012. Monte Carlo methods for maximum margin supervised topic models. In Advances in Neural Information Processing Systems (NIPS).
  • T. Joachims. 1999. Making large-scale SVM learning practical. MIT Press.
  • S. Lacoste-Julien, F. Sha, and M.I. Jordan. 2009. DiscLDA: Discriminative learning for dimensionality reduction and classification. In Advances in Neural Information Processing Systems (NIPS), pages 897–904.
  • Y. Lin. 2001. A note on margin-based loss functions in classification. Technical Report No. 1044, University of Wisconsin.
  • D. McAllester. 2003. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21.
  • M. Meyer and P. Laud. 2002. Predictive variable selection in generalized linear models. Journal of the American Statistical Association, 97(459):859–871.
  • D. Newman, A. Asuncion, P. Smyth, and M. Welling. 2009. Distributed algorithms for topic models. Journal of Machine Learning Research (JMLR), (10):1801–1828.
  • N.G. Polson, J.G. Scott, and J. Windle. 2012. Bayesian inference for logistic models using Polya-Gamma latent variables. arXiv:1205.0310v1.
  • R. Rifkin and A. Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), (5):101–141.
  • L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri. 2004. Are loss functions all the same? Neural Computation, (16):1063–1076.
  • A. Smola and S. Narayanamurthy. 2010. An architecture for parallel topic models. Very Large Data Bases (VLDB), 3(1-2):703–710.
  • M.A. Tanner and W.-H. Wong. 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association (JASA), 82(398):528–540.
  • D. van Dyk and X. Meng. 2001. The art of data augmentation. Journal of Computational and Graphical Statistics (JCGS), 10(1):1–50.
  • C. Wang, D.M. Blei, and F.F. Li. 2009. Simultaneous image classification and annotation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • J. Zhu, N. Chen, and E.P. Xing. 2011. Infinite latent SVM for classification and multi-task learning. In Advances in Neural Information Processing Systems (NIPS), pages 1620–1628.
  • J. Zhu, A. Ahmed, and E.P. Xing. 2012. MedLDA: Maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), (13):2237–2278.
  • J. Zhu, N. Chen, H. Perkins, and B. Zhang. 2013a. Gibbs max-margin topic models with fast sampling algorithms. In International Conference on Machine Learning (ICML).
  • J. Zhu, N. Chen, and E.P. Xing. 2013b. Bayesian inference with posterior regularization and applications to infinite latent SVMs. arXiv:1210.1766v2.