Topic model with incremental vocabulary based on Belief Propagation.

Knowledge-Based Systems (2019)

Abstract
Most LDA algorithms share the same limiting assumption of a fixed vocabulary. When these algorithms process data streams in real time, words that are absent from the vocabulary are ignored. Unexpected words that appear in the streams cannot be processed, because the atoms of the Dirichlet distribution are fixed. To address these drawbacks, ivLDA is proposed, in which the topic–word distribution is drawn from a Dirichlet process, which has infinitely many atoms, rather than from a Dirichlet distribution. ivLDA maintains an incremental vocabulary that enables topic models to process data streams. In addition, two methods are presented for managing the word indices, namely ivLDA-Perp and ivLDA-PMI. ivLDA-Perp achieves high accuracy, while ivLDA-PMI identifies the most valuable words for representing a topic. Experiments indicate that ivLDA-Perp and ivLDA-PMI outperform infvoc-LDA and other state-of-the-art algorithms with a fixed vocabulary.
Keywords
Topic model, Belief Propagation, Stick-breaking process, Online algorithm
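
The two ideas the abstract relies on, a vocabulary that grows as unseen words arrive and a stick-breaking construction with an open-ended number of atoms, can be illustrated with a minimal Python sketch. This is not the authors' ivLDA implementation: the class `IncrementalVocabulary`, the function `stick_breaking_weights`, and the concentration parameter `alpha` are hypothetical names chosen for illustration, and the belief-propagation inference and the ivLDA-Perp / ivLDA-PMI index management are not reproduced here.

```python
import random


class IncrementalVocabulary:
    """Hypothetical helper: maps words to integer indices and grows on demand."""

    def __init__(self):
        self.index_of = {}  # word -> index
        self.words = []     # index -> word

    def add(self, word):
        # An unseen word receives the next free index instead of being dropped.
        if word not in self.index_of:
            self.index_of[word] = len(self.words)
            self.words.append(word)
        return self.index_of[word]


def stick_breaking_weights(num_atoms, alpha, rng):
    """Truncated stick-breaking construction (an illustrative assumption).

    beta_k ~ Beta(1, alpha), weight_k = beta_k * prod_{j<k} (1 - beta_j).
    Returns the first num_atoms weights and the mass left on the stick,
    which stays available for atoms (words) that have not appeared yet.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_atoms):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights, remaining


# Usage: index a small stream, then draw one weight per word seen so far.
vocab = IncrementalVocabulary()
stream = ["topic model stream", "stream of unseen words"]
for document in stream:
    ids = [vocab.add(word) for word in document.split()]

weights, leftover = stick_breaking_weights(len(vocab.words), 1.0, random.Random(0))
```

The leftover stick mass is what separates this construction from a fixed-size Dirichlet draw: some probability always remains for words that have not been observed yet.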