A novel topic model for documents by incorporating semantic relations between words

Soft Computing (2019)

Cited by 5 | Views 48
Abstract
Topic models have been widely used to infer latent topics in text documents. However, unsupervised topic models often produce incoherent topics, which confuse users in applications. Incorporating prior domain knowledge into topic models is an effective strategy for extracting coherent and meaningful topics. In this paper, we go one step further and explore how different forms of prior semantic relations between words can be encoded into models to improve the topic modeling process. We develop a novel topic model, called Mixed Word Correlation Knowledge-based Latent Dirichlet Allocation, to infer latent topics from text corpora. Specifically, the proposed model mines two forms of lexical semantic knowledge based on recent progress in word embedding, which represents the semantic information of words in a continuous vector space. To incorporate the generated prior knowledge, a Mixed Markov Random Field is constructed over the latent topic layer to regularize the topic assignment of each word during the topic sampling process. Experimental results on two public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
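The abstract describes mining two forms of lexical knowledge (must-links and cannot-links, per the keywords) from word embeddings. A minimal sketch of one plausible mining step is shown below: labeling word pairs as must-links or cannot-links by thresholding cosine similarity between their embedding vectors. The toy vectors and the threshold values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical toy word vectors; in practice these would come from
# trained word embeddings (e.g. word2vec or GloVe).
vectors = {
    "computer": np.array([0.9, 0.1, 0.0]),
    "laptop":   np.array([0.8, 0.2, 0.1]),
    "banana":   np.array([0.0, 0.9, 0.4]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def mine_links(vectors, must_thresh=0.9, cannot_thresh=0.3):
    """Label word pairs as must-links (highly similar) or cannot-links
    (highly dissimilar). Threshold values are assumptions for illustration."""
    must, cannot = [], []
    words = sorted(vectors)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            sim = cosine(vectors[w1], vectors[w2])
            if sim >= must_thresh:
                must.append((w1, w2))
            elif sim <= cannot_thresh:
                cannot.append((w1, w2))
    return must, cannot

must, cannot = mine_links(vectors)
# "computer"/"laptop" end up as a must-link; "banana" pairs as cannot-links.
```

In the paper, link knowledge of this kind is then encoded as a Mixed Markov Random Field over the latent topic layer, so that linked words influence each other's topic assignments during Gibbs sampling.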
Keywords
Topic model,Must-links,Cannot-links,Word embeddings,Gibbs sampling