Reducing The Sampling Complexity Of Topic Models

KDD(2014)

引用 256|浏览909
暂无评分
摘要
Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics k(d) in the document. For large document collections and in structured hierarchical models k(d) << k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as PDP [16, 4] and HDP [19].At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant time sampling via Walker's alias method.
更多
查看译文
关键词
Sampling,Scalability,Topic Models,Alias Method
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要