Linear Time Samplers For Supervised Topic Models Using Compositional Proposals

KDD '15: The 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 2015

Abstract
Topic models are effective probabilistic tools for processing large collections of unstructured data. With the exponential growth of modern industrial data, and consequently with the ambition to explore much bigger models, there is a pressing need to significantly scale up topic modeling algorithms. This need has been addressed in many previous works, culminating in the recent fast Markov chain Monte Carlo (MCMC) sampling algorithms in [10, 22] for the unsupervised latent Dirichlet allocation (LDA) formulation.

In this work we extend these recent sampling advances for unsupervised LDA models to supervised tasks. We focus on the Gibbs MedLDA model [26], which is able to simultaneously discover latent structures and make accurate predictions. By combining a set of sampling techniques, we reduce the $O(K^3 + DK^2 + D\bar{N}K)$ complexity of [26] to $O(DK + D\bar{N})$ for $K$ topics and $D$ documents with average length $\bar{N}$. To the best of our knowledge, this is the first linear-time sampling algorithm for supervised topic models. Our algorithm requires minimal modifications to incorporate most loss functions in a variety of supervised tasks, and in our experiments we observe an order-of-magnitude speedup over the current state-of-the-art implementation while achieving similar prediction performance.

The open-source C++ implementation of the proposed algorithm is available at https://github.com/xunzheng/light_medlda.
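The abstract does not spell out which sampling techniques are combined, but the cited fast LDA samplers [10, 22] are built around Walker/Vose alias tables, which turn drawing from a K-dimensional discrete distribution into an O(1) operation after an O(K) build. The following is a minimal sketch of that building block in C++; the class and method names are illustrative and are not the interface of the light_medlda codebase.

```cpp
// Sketch of Vose's alias method: O(K) construction, O(1) sampling.
// Illustrative only; not taken from the light_medlda repository.
#include <cstddef>
#include <random>
#include <vector>

class AliasTable {
 public:
  // Build in O(K) from unnormalized topic weights.
  explicit AliasTable(const std::vector<double>& weights)
      : prob_(weights.size()), alias_(weights.size()) {
    const std::size_t K = weights.size();
    double sum = 0.0;
    for (double w : weights) sum += w;
    std::vector<double> scaled(K);
    std::vector<std::size_t> small, large;
    for (std::size_t k = 0; k < K; ++k) {
      scaled[k] = weights[k] * K / sum;          // mean-normalize to 1
      (scaled[k] < 1.0 ? small : large).push_back(k);
    }
    // Pair each under-full bucket with an over-full one.
    while (!small.empty() && !large.empty()) {
      std::size_t s = small.back(); small.pop_back();
      std::size_t l = large.back(); large.pop_back();
      prob_[s] = scaled[s];
      alias_[s] = l;
      scaled[l] = (scaled[l] + scaled[s]) - 1.0;  // give excess mass to s
      (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    for (std::size_t k : small) prob_[k] = 1.0;  // numerical leftovers
    for (std::size_t k : large) prob_[k] = 1.0;
  }

  // Draw one topic index in O(1): pick a bucket, then flip a biased coin.
  template <class RNG>
  std::size_t Sample(RNG& rng) const {
    std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::size_t k = bucket(rng);
    return coin(rng) < prob_[k] ? k : alias_[k];
  }

 private:
  std::vector<double> prob_;        // acceptance probability per bucket
  std::vector<std::size_t> alias_;  // fallback topic per bucket
};
```

In the cited samplers the alias table is rebuilt only occasionally, so draws come from a slightly stale proposal distribution and are corrected with a Metropolis-Hastings acceptance step; that correction is what keeps the amortized per-token cost constant and yields the overall $O(DK + D\bar{N})$ bound.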
Keywords
Inference, MCMC, Topic Models, Large Margin Classification, Regression, Scale Mixtures