Scalable inference in max-margin topic models

In KDD, pages 964–972, 2013.

Abstract:

Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. By marrying with advanced learning techniques, the predictive strengths of STMs have been dramatically enhanced, such as max-margin supervised topic models...

Introduction
  • Topic models such as latent Dirichlet allocation (LDA) [5] have been successful in discovering the latent factors underlying observed data (LDA's generative process is sketched after this list).
  • The latent topic representations can be used for many downstream tasks, such as classification, clustering, or simply as a tool for structurally browsing the data.
  • To improve the predictive ability of topic models, there has been growing interest in supervised topic models (STMs) [4, 27], which jointly discover latent topic structures and learn predictive models
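For concreteness, LDA's generative process can be written as follows. This is the standard textbook formulation, using the usual notation (α, β for Dirichlet hyperparameters, θ_d for per-document topic proportions, Φ_k for per-topic word distributions) rather than this paper's exact symbols:

```latex
% Standard LDA generative process; \alpha, \beta are Dirichlet hyperparameters.
\begin{align*}
  \Phi_k   &\sim \mathrm{Dir}(\beta)  && k = 1,\dots,K \ \text{(topics)} \\
  \theta_d &\sim \mathrm{Dir}(\alpha) && d = 1,\dots,N \ \text{(documents)} \\
  z_{dn}   &\sim \mathrm{Mult}(\theta_d), \;
  w_{dn}    \sim \mathrm{Mult}(\Phi_{z_{dn}}) && n = 1,\dots,N_d \ \text{(words)}
\end{align*}
```

A supervised variant such as MedLDA [27] additionally couples the per-document topic assignments to a response variable through a max-margin predictive model.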
Highlights
  • Topic models such as latent Dirichlet allocation (LDA) [5] have been successful in discovering the latent factors underlying observed data
  • We present a highly scalable approach to building max-margin supervised topic models
  • To improve the predictive ability of topic models, there has been growing interest in supervised topic models (STMs) [4, 27], which jointly discover latent topic structures and learn predictive models
  • We have presented a highly scalable approach to building max-margin supervised topic models for large-scale multiclass and multi-label text categorization
  • Our Gibbs sampling algorithm builds on a novel formulation of multi-task Gibbs max-margin topic models, together with a data augmentation formulation (a sketch of the augmented sampling step follows this list)
  • Extensive results on large-scale data sets demonstrate that Gibbs max-margin topic models can significantly improve classification performance while requiring time comparable to that of unsupervised topic models
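To make the data augmentation idea concrete, below is a minimal sketch of the Gibbs step for a single binary classifier, following the scale-mixture representation of the hinge loss from Polson and Scott [17] as used in Gibbs MedLDA [28]: each document d gets an augmentation variable λ_d whose inverse is conditionally inverse-Gaussian, after which the classifier weights η have a closed-form Gaussian conditional. The variable names (`Zbar`, `c`, `ell`, `nu2`) and code structure are our own illustration, not the paper's implementation:

```python
import numpy as np

def gibbs_classifier_step(Zbar, y, eta, c, ell, nu2, rng):
    """One augmented Gibbs step for a binary max-margin classifier.

    Zbar : (N, K) mean topic assignments zbar_d per document
    y    : (N,)  labels in {-1, +1}
    eta  : (K,)  classifier weights from the previous iteration
    c    : regularization constant of the hinge loss
    ell  : margin parameter
    nu2  : prior variance of the weights, p(eta) = N(0, nu2 * I)
    """
    N, K = Zbar.shape

    # 1) Sample the augmentation variables: lambda_d^{-1} | rest is
    #    inverse-Gaussian with mean 1 / (c * |zeta_d|) and shape 1,
    #    where zeta_d = ell - y_d * eta^T zbar_d (Polson & Scott [17]).
    zeta = ell - y * (Zbar @ eta)
    lam_inv = rng.wald(1.0 / (c * np.abs(zeta) + 1e-10), 1.0)

    # 2) Sample the weights: eta | lambda, Z is Gaussian with
    #    precision  I/nu2 + c^2 * sum_d lam_inv_d zbar_d zbar_d^T  and
    #    mean       Sigma * c * sum_d y_d (1 + c*ell*lam_inv_d) zbar_d.
    prec = np.eye(K) / nu2 + c**2 * (Zbar.T * lam_inv) @ Zbar
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ (c * Zbar.T @ (y * (1.0 + c * ell * lam_inv)))
    return rng.multivariate_normal(mu, Sigma)
```

In the paper's multi-task setting this step would run once per binary task, alternating with a Gibbs step that resamples the topic assignments Z given the current η and λ.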
Methods
  • The authors run the experiments on a cluster with 20 nodes, where each node is equipped with two 6-core CPUs (2.93 GHz).
  • Data sets: the authors present experiments on several public text categorization data sets, whose statistics are shown in Table 2.
  • The train/test partitions and vocabularies are built following the multi-label setting in [27].
  • The Wiki data set is built from the large Wikipedia set used in the PASCAL LSHTC challenge 2012; each document has multiple labels.
  • The third data set is the Reuters Corpus Volume 1 (RCV1-v2) [13], another standard multi-label benchmark (703,863 training documents, 100,551 test documents, 288,062 terms, and 103 categories; see Table 2).
  • To test the scalability of the method, the authors partition the data into training and testing sets with a ratio of 7:1
Results
  • The authors use the F-measure, the harmonic mean of precision and recall, to evaluate the performance (the formula is given after this list).
  • The results show that Gibbs MedLDA dramatically improves the classification performance over the two-stage approach of LDA+SVM
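For completeness, the F-measure above is the standard F1 score; this is its textbook definition, not a formula reproduced from the paper:

```latex
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}
           {\mathrm{precision} + \mathrm{recall}},
\qquad
\mathrm{precision} = \frac{TP}{TP + FP}, \quad
\mathrm{recall} = \frac{TP}{TP + FN}.
```

For instance, precision 0.8 and recall 0.6 give F1 = (2 × 0.8 × 0.6) / 1.4 ≈ 0.686.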
Conclusion
  • The authors have presented a highly scalable approach to building max-margin supervised topic models for large-scale multiclass and multi-label text categorization.
  • The authors' Gibbs sampling algorithm builds on a novel formulation of multi-task Gibbs max-margin topic models as well as a data augmentation formulation.
  • Extensive results on large-scale data sets demonstrate that Gibbs max-margin topic models can significantly improve classification performance while requiring time comparable to that of unsupervised topic models.
  • The data augmentation techniques are general and can be applied to improve the inference accuracy of other topic models, and of latent variable models more broadly, such as relational topic models [8] for network analysis and matrix factorization [24] for collaborative filtering.
Tables
  • Table 1: The amount of time (seconds) taken by the step of sampling η and by network communication on the Wiki data set. K: the number of topics; M: the number of machines. Communication includes both reduce and broadcast time (a sketch of this communication pattern follows below)
  • Table 2: Statistics of the data sets. N: the number of documents; V: the number of terms; L: the number of categories
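Table 1's "reduce and broadcast" times arise because the classifier weights η are global while documents are distributed across M machines, so each iteration must aggregate sufficient statistics and then ship the new η back out. Below is a minimal sketch of that pattern, assuming an MPI-style setup (mpi4py); the statistic names and division of labor are our illustration, not the paper's implementation:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def sample_eta_distributed(Zbar_local, y_local, lam_inv_local, c, ell, nu2, rng):
    """Reduce local sufficient statistics, sample eta on rank 0, broadcast."""
    K = Zbar_local.shape[1]
    # Local contributions to the Gaussian conditional of eta
    # (same statistics as in the single-machine sketch above).
    prec_local = c**2 * (Zbar_local.T * lam_inv_local) @ Zbar_local
    lin_local = c * Zbar_local.T @ (y_local * (1.0 + c * ell * lam_inv_local))

    # Reduce: sum the statistics from all machines onto rank 0.
    prec = comm.reduce(prec_local, op=MPI.SUM, root=0)
    lin = comm.reduce(lin_local, op=MPI.SUM, root=0)

    eta = None
    if rank == 0:
        Sigma = np.linalg.inv(np.eye(K) / nu2 + prec)
        eta = rng.multivariate_normal(Sigma @ lin, Sigma)
    # Broadcast: every machine needs the new global weights.
    return comm.bcast(eta, root=0)
```

Because only the K-dimensional statistics and weights cross the network, communication cost grows with the number of topics K and machines M, which is what Table 1 measures.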
Funding
  • This work is supported by National Key Foundation R&D Projects (Nos. 2013CB329403 and 2012CB316301), Tsinghua Initiative Scientific Research Program No. 20121088071, and the 221 Basic Research Plan for Young Faculties at Tsinghua University
Study subjects and analysis
Multi-label data sets: 2
We now present the experiments of multi-task Gibbs MedLDA on the two multi-label data sets, where each task is a binary classifier that identifies whether a document belongs to a particular category. We use the F-measure, the harmonic mean of precision and recall, to evaluate the performance (a sketch of this one-binary-task-per-category evaluation follows below).
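As an illustration of the one-binary-classifier-per-category setup, here is a small sketch of multi-label prediction and an example-based F1 computation. The decision rule sign(η_k^T z̄_d) per category k and the example-based averaging are assumptions for illustration; the summary does not spell out the exact thresholding or averaging used:

```python
import numpy as np

def predict_labels(Zbar, Eta):
    """Per-category binary decisions. Eta is (L, K), Zbar is (N, K).

    Category k is predicted for document d iff eta_k^T zbar_d > 0
    (assumed decision rule, for illustration only).
    """
    return (Zbar @ Eta.T) > 0.0  # boolean (N, L) matrix

def example_f1(Y_true, Y_pred):
    """Example-based F1: per-document F1 of predicted vs. true label sets."""
    tp = np.logical_and(Y_true, Y_pred).sum(axis=1)
    p = tp / np.maximum(Y_pred.sum(axis=1), 1)  # per-document precision
    r = tp / np.maximum(Y_true.sum(axis=1), 1)  # per-document recall
    f1 = np.where(p + r > 0, 2 * p * r / np.maximum(p + r, 1e-12), 0.0)
    return f1.mean()
```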

References
  • [1] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM), 2012.
  • [2] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research (JMLR), 6:1817–1853, 2005.
  • [3] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. In Advances in Neural Information Processing Systems (NIPS), 2007.
  • [4] D. Blei and J. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems (NIPS), pages 121–128, 2007.
  • [5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research (JMLR), 3:993–1022, 2003.
  • [6] O. Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. Monograph series of the Institute of Mathematical Statistics, 2007.
  • [7] J. Chang and D. Blei. Relational topic models for document networks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
  • [8] N. Chen, J. Zhu, F. Xia, and B. Zhang. Generalized relational topic models with data augmentation. In International Joint Conference on Artificial Intelligence (IJCAI), 2013.
  • [9] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, 1986.
  • [10] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning (ICML), pages 353–360, 2009.
  • [11] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences (PNAS), pages 5228–5235, 2004.
  • [12] Q. Jiang, J. Zhu, M. Sun, and E. Xing. Monte Carlo methods for maximum margin supervised topic models. In Advances in Neural Information Processing Systems (NIPS), 2012.
  • [13] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research (JMLR), 5:361–397, 2004.
  • [14] D. McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21, 2003.
  • [15] J. Michael, W. Schucany, and R. Haas. Generating random variates using transformations with multiple roots. The American Statistician, 30(2):88–90, 1976.
  • [16] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. Journal of Machine Learning Research (JMLR), 10:1801–1828, 2009.
  • [17] N. Polson and S. Scott. Data augmentation for support vector machines. Bayesian Analysis, 6(1):1–24, 2011.
  • [18] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), 5:101–141, 2004.
  • [19] A. Smola and S. Narayanamurthy. An architecture for parallel topic models. Proceedings of the VLDB Endowment (PVLDB), 3(1–2):703–710, 2010.
  • [20] M. Tanner and W.-H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association (JASA), 82(398):528–540, 1987.
  • [21] G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, 2nd ed., pages 667–685, 2010.
  • [22] D. van Dyk and X. Meng. The art of data augmentation. Journal of Computational and Graphical Statistics (JCGS), 10(1):1–50, 2001.
  • [23] Y. Wang and G. Mori. Max-margin latent Dirichlet allocation for image classification and annotation. In British Machine Vision Conference (BMVC), 2011.
  • [24] M. Xu, J. Zhu, and B. Zhang. Fast max-margin matrix factorization with data augmentation. In International Conference on Machine Learning (ICML), 2013.
  • [25] S. Yang, J. Bian, and H. Zha. Hybrid generative/discriminative learning for automatic image annotation. In Uncertainty in Artificial Intelligence (UAI), 2010.
  • [26] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In ACM SIGKDD, pages 937–946, 2009.
  • [27] J. Zhu, A. Ahmed, and E. Xing. MedLDA: Maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), 13:2237–2278, 2012.
  • [28] J. Zhu, N. Chen, H. Perkins, and B. Zhang. Gibbs max-margin topic models with fast sampling algorithms. In International Conference on Machine Learning (ICML), 2013.
  • [29] J. Zhu, N. Chen, and E. Xing. Infinite latent SVM for classification and multi-task learning. In Advances in Neural Information Processing Systems (NIPS), pages 1620–1628, 2011.