ITWF: A framework to apply term weighting schemes in topic model

Neurocomputing (2019)

Abstract
Topic models such as Latent Dirichlet Allocation (LDA) and its variants are statistical models for discovering latent topics. However, as previous research has shown, some topics generated by LDA may be uninterpretable and semantically incoherent because they contain irrelevant words. To improve the semantic quality of automatically discovered topics, we explore the distributional characteristics of words across topics to identify topic-indiscriminate words, which are responsible for low-quality topics. The main contribution of this paper is a novel framework, the Iterative Term Weighting Framework (ITWF), which can effectively identify and filter out topic-indiscriminate words from the discovered topics. In particular, the proposed framework first applies an entropy-based term weighting scheme and then adopts a novel iterative method to identify topic-indiscriminate words. To the best of our knowledge, our research is among the very few successful works that aim to enhance both the semantic coherence and the interpretability of LDA-based topic modeling methods. The experimental results show that the proposed framework improves the effectiveness of LDA as well as its variants.
Keywords
Topic model, Latent Dirichlet Allocation, Term weighting scheme, Knowledge acquisition
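
The abstract mentions an entropy-based term weighting scheme for spotting topic-indiscriminate words but does not give the formula. The sketch below is only one plausible reading, not the paper's actual method: it assumes a topic-word probability matrix is already available from a fitted LDA model, normalizes each word's probabilities across topics, and scores the word by one minus its normalized entropy. The function names, the weight definition, and the threshold value are all hypothetical, and the iterative part of ITWF is not shown.

```python
import math


def topic_entropy_weights(topic_word):
    """Score each word by how unevenly it is spread across topics.

    topic_word: K rows (topics), each a list of V non-negative word
    probabilities. Returns V weights in [0, 1]; a weight near 0 means the
    word is spread almost uniformly over topics (topic-indiscriminate).
    NOTE: this weight is an illustrative assumption, not the paper's formula.
    """
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    weights = []
    for w in range(vocab_size):
        column = [topic_word[k][w] for k in range(num_topics)]
        total = sum(column)
        if total == 0 or num_topics < 2:
            weights.append(0.0)  # no evidence that the word separates topics
            continue
        entropy = 0.0
        for value in column:
            if value > 0:
                p = value / total  # share of the word's mass in this topic
                entropy -= p * math.log(p)
        # Divide by log(K), the maximum entropy, so the weight lies in [0, 1].
        weights.append(1.0 - entropy / math.log(num_topics))
    return weights


def indiscriminate_words(topic_word, vocab, threshold=0.1):
    """Return words whose weight falls below a hypothetical threshold."""
    weights = topic_entropy_weights(topic_word)
    return [word for word, weight in zip(vocab, weights) if weight < threshold]


# Toy example: two topics over a three-word vocabulary (made-up numbers).
vocab = ["the", "gene", "market"]
topic_word = [
    [0.5, 0.5, 0.0],  # topic 0
    [0.5, 0.0, 0.5],  # topic 1
]
weights = topic_entropy_weights(topic_word)
flagged = indiscriminate_words(topic_word, vocab)
```

In this toy input, "the" is equally likely under both topics and is flagged, while "gene" and "market" each belong to a single topic and receive the maximum weight. An iterative variant would presumably down-weight or remove the flagged words and refit the model, but the abstract does not describe that loop in enough detail to sketch it.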