Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor Processes.

Int. J. Approx. Reasoning (2016)

Abstract
The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take a probability distribution as a parameter. These processes can be stacked to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for using these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real-world applications.

Pitman-Yor processes can be stacked hierarchically to form Bayesian models. A general modelling framework using hierarchical Pitman-Yor processes is proposed. An efficient inference algorithm is made possible by modularising the hierarchy. This framework is applied to Twitter for text and network modelling.
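To make the Pitman-Yor process concrete, the following is a minimal sketch of its standard Chinese restaurant representation, not the inference algorithm proposed in the article. The function name `pitman_yor_crp` and its parameters `discount`, `concentration` and `seed` are illustrative assumptions. Setting `discount` to 0 recovers the Dirichlet process.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative sketch only: `discount` (d) and `concentration` (c) are the
    usual Pitman-Yor parameters, with 0 <= d < 1 and c > -d.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n_customers):
        if not tables:
            # The first customer always opens a new table.
            tables.append(1)
            continue
        # Existing table k is chosen with weight (size_k - d);
        # a new table is chosen with weight (c + d * K), K = current table count.
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0, seed=1)
    print(len(sizes), "tables; largest has", max(sizes), "customers")
```

In a hierarchical model, each new table in a child restaurant would additionally send one customer to the parent restaurant, which is how the stacked processes share statistical strength. That step is omitted here for brevity.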
Keywords
Bayesian nonparametric methods, Markov chain Monte Carlo, Topic models, Hierarchical Pitman–Yor processes