# Scalable inference in max-margin topic models

KDD, pp. 964-972, 2013.

Abstract:

Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. By marrying with advanced learning techniques, the predictive strengths of STMs have been dramatically enhanced, such as max-margin supervised…

Introduction

- Topic models such as latent Dirichlet allocation (LDA) [5] have been successful in discovering the latent factors underlying observed data.
- The latent topic representations can be used for many subsequent tasks, such as classification, clustering or merely as a tool to structurally browse the data.
- To improve the predictive ability of topic models, researchers have been interested in learning supervised topic models (STMs) [4, 27], which jointly perform the two tasks of discovering latent topic structures and learning predictive models.
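As a concrete illustration of the unsupervised building block, a collapsed Gibbs sampler for LDA in the style of Griffiths and Steyvers [11] can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, toy hyperparameters, and default iteration count are our own:

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    Returns per-document topic proportions theta (D x K).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # per-topic totals
    # Random topic initialization for every token.
    z = [rng.integers(0, K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z = k | rest).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Posterior-mean topic proportions per document.
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
    return theta
```

The resulting per-document topic proportions are the latent representations that a two-stage LDA+SVM baseline would feed to a downstream classifier; supervised topic models instead learn them jointly with the predictive model.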

Highlights

- We present a highly scalable approach to building max-margin supervised topic models for large-scale multiclass and multi-label text categorization.
- Our Gibbs sampling algorithm builds on a novel formulation of multi-task Gibbs max-margin topic models, together with a data augmentation formulation.
- Extensive results on large-scale data sets demonstrate that Gibbs max-margin topic models significantly improve classification performance while requiring time comparable to unsupervised topic models.

Methods

- The authors run the experiments on a cluster with 20 nodes, where each node is equipped with two 6-core CPUs (2.93 GHz).

- The authors present experiments on several public text categorization data sets, whose statistics are shown in Table 2. For RCV1 (703,863 training documents; 100,551 test documents; 288,062 terms; 103 categories), they follow the multi-label setting of [27] to build the train/test partition and the vocabulary.
- The Wiki data set is built from the large Wikipedia set used in the PASCAL LSHC challenge 2012, and each document has multiple labels.
- The third data set is the Reuters Corpus Volume 1 (RCV1-v2) [13], another standard benchmark in which each document has multiple labels.
- To test the scalability of the method, the authors partitioned the data set into training and testing sets with a ratio of 7:1.
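The 7:1 partition can be sketched as a simple shuffled index split. This is illustrative only; the function name and seed are our own, not the authors' preprocessing code:

```python
import numpy as np

def split_7_to_1(n_docs, seed=0):
    """Shuffle document indices and split train:test = 7:1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_docs)
    n_test = n_docs // 8  # one part test out of eight parts total
    return idx[n_test:], idx[:n_test]
```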

Results

- The authors use the F-measure, the harmonic mean of precision and recall, to evaluate the performance.
- The authors can see that Gibbs MedLDA dramatically improves the classification performance over the two-stage approach of LDA+SVM.
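The F-measure used here is the standard F1 score. A self-contained computation for one binary task might look like the following (an illustrative sketch, not the authors' evaluation code):

```python
def f_measure(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall for one binary task.

    y_true, y_pred: sequences of 0/1 labels.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In the multi-label setting, one such score is computed per category-level binary task and then averaged across tasks.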

Conclusion

- The authors have presented a highly scalable approach to building max-margin supervised topic models for large-scale multiclass and multi-label text categorization.
- The authors' Gibbs sampling algorithm builds on a novel formulation of multi-task Gibbs max-margin topic models, together with a data augmentation formulation.
- Extensive results on large-scale data sets demonstrate that Gibbs max-margin topic models significantly improve classification performance while requiring time comparable to unsupervised topic models.
- The data augmentation techniques are general and can be applied to improve the inference accuracy of other topic models, or latent variable models in general, such as relational topic models [8] for network analysis and matrix factorization [24] for collaborative filtering.
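One concrete piece of the data augmentation machinery used in this line of work (cf. Polson and Scott's augmentation for support vector machines) is drawing inverse-Gaussian-distributed augmented variables; the classic sampler is the multiple-roots transformation of Michael, Schucany, and Haas. A minimal sketch, with our own function naming:

```python
import numpy as np

def sample_inverse_gaussian(mu, lam, rng):
    """Draw one sample from InverseGaussian(mu, lam) via the
    transformation-with-multiple-roots method of Michael et al."""
    nu = rng.normal() ** 2  # chi-squared(1) variate
    # Smaller root of the quadratic induced by the transformation.
    x = (mu
         + (mu ** 2 * nu) / (2.0 * lam)
         - (mu / (2.0 * lam)) * np.sqrt(4.0 * mu * lam * nu
                                        + mu ** 2 * nu ** 2))
    # Accept the smaller root with probability mu / (mu + x),
    # otherwise return the larger root mu^2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu ** 2 / x
```

Inside a Gibbs loop, such draws supply the augmented variables whose conditionals make the max-margin posterior amenable to simple, parallelizable sampling.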

Summary


- Table 1: The amount of time (seconds) taken by the step of sampling η and by network communication on the Wiki data set. K: the number of topics; M: the number of machines; communication includes both reduce and broadcast time.
- Table 2: Statistics of the data sets. N: the number of documents; V: the number of terms; L: the number of categories.

Funding

- This work is supported by National Key Foundation R&D Projects (No.s 2013CB329403, 2012CB316301), Tsinghua Initiative Scientific Research Program No.20121088071, and the 221 Basic Research Plan for Young Faculties at Tsinghua University

Study subjects and analysis

multi-label data sets: 2

We now present the experiments of multi-task Gibbs MedLDA on the two multi-label data sets, where each task is a binary classifier that identifies whether a document belongs to a particular category. We use the F-measure, the harmonic mean of precision and recall, to evaluate the performance.
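The one-binary-classifier-per-category setup can be sketched as a simple label transformation (illustrative; `to_binary_tasks` is our own name, and ±1 targets follow the usual SVM convention):

```python
def to_binary_tasks(labels, n_categories):
    """Convert multi-label annotations into one binary task per category.

    labels: per-document sets of category ids.
    Returns y with y[c][d] = +1 if document d carries category c, else -1.
    """
    return [[1 if c in doc_labels else -1 for doc_labels in labels]
            for c in range(n_categories)]
```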

Reference

- A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM), 2012.
- R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research (JMLR), (6):1817–1853, 2005.
- A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. In Advances in Neural Information Processing Systems (NIPS), 2007.
- D. Blei and J. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems (NIPS), pages 121–128, 2007.
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
- O. Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. Monograph series of the Institute of Mathematical Statistics, 2007.
- J. Chang and D. Blei. Relational topic models for document networks. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
- N. Chen, J. Zhu, F. Xia, and B. Zhang. Generalized relational topic models with data augmentation. In International Joint Conference on Artificial Intelligence (IJCAI), 2013.
- L. Devroye. Non-uniform random variate generation. Springer-Verlag, 1986.
- P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning (ICML), pages 353–360, 2009.
- T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of National Academy of Science (PNAS), pages 5228–5235, 2004.
- Q. Jiang, J. Zhu, M. Sun, and E. Xing. Monte Carlo methods for maximum margin supervised topic models. In Advances in Neural Information Processing Systems (NIPS), 2012.
- D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research (JMLR), 5:361–397, 2004.
- D. McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21, 2003.
- J. Michael, W. Schucany, and R. Haas. Generating random variates using transformations with multiple roots. The American Statistician, 30(2):88–90, 1976.
- D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. Journal of Machine Learning Research (JMLR), (10):1801–1828, 2009.
- N. Polson and S. Scott. Data augmentation for support vector machines. Bayesian Analysis, 6(1):1–24, 2011.
- R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), (5):101–141, 2004.
- A. Smola and S. Narayanamurthy. An architecture for parallel topic models. Very Large Data Base (VLDB), 3(1-2):703–710, 2010.
- M. Tanner and W.-H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association (JASA), 82(398):528–540, 1987.
- G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. Data Mining and Knowledge Discovery Handbook, 2nd ed., pages 667–685, 2010.
- D. van Dyk and X. Meng. The art of data augmentation. Journal of Computational and Graphical Statistics (JCGS), 10(1):1–50, 2001.
- Y. Wang and G. Mori. Max-margin latent Dirichlet allocation for image classification and annotation. In British Machine Vision Conference (BMVC), 2011.
- M. Xu, J. Zhu, and B. Zhang. Fast max-margin matrix factorization with data augmentation. In International Conference on Machine Learning (ICML), 2013.
- S. Yang, J. Bian, and H. Zha. Hybrid generative/discriminative learning for automatic image annotation. In Uncertainty in Artificial Intelligence (UAI), 2010.
- L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In ACM SIGKDD, pages 937–946, 2009.
- J. Zhu, A. Ahmed, and E. Xing. MedLDA: maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), (13):2237–2278, 2012.
- J. Zhu, N. Chen, H. Perkins, and B. Zhang. Gibbs max-margin topic models with fast sampling algorithms. In International Conference on Machine Learning (ICML), 2013.
- J. Zhu, N. Chen, and E. Xing. Infinite latent SVM for classification and multi-task learning. In Advances in Neural Information Processing Systems (NIPS), pages 1620–1628, 2011.
