A context-enhanced Dirichlet model for online clustering in short text streams.

Expert Syst. Appl. (2023)

Abstract
Online clustering of short text streams has become significant due to the popularity of news and social media platforms. The objective of online clustering is to maintain active topics (clusters) by automatically detecting new topics and forgetting outdated ones. Most existing approaches exploit static, high-dimensional semantic term representations of the text to enhance clustering quality. These approaches also use inference procedures that depend on a fixed batch size to reduce the number of clusters related to a given topic and bring it closer to the actual number of topics. This paper proposes a non-parametric Dirichlet model with episodic inference (EINDM) that clusters an evolving short text stream by introducing a window-based, low-dimensional semantic term representation, which captures the contextual relationships between words. In addition, an episodic inference procedure is introduced to reduce cluster sparsity in the model. Furthermore, a novel “word specificity” measure, based on neighborhood terms, is proposed to capture the evolving contexts of individual terms. Extensive empirical evaluation demonstrates that EINDM yields the best performance, in terms of NMI, homogeneity, and cluster purity, compared to recent state-of-the-art clustering models.
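The three evaluation measures named in the abstract (NMI, homogeneity, and cluster purity) can be illustrated with a minimal Python sketch. This is not the paper's evaluation code. The function names and the toy labels are hypothetical, and the abstract does not state which NMI normalization is used, so the arithmetic mean of the two entropies is assumed here.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _mutual_information(truth, pred):
    """Mutual information between ground-truth topics and predicted clusters."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    t_counts = Counter(truth)
    p_counts = Counter(pred)
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (t_counts[t] * p_counts[p]))
    return mi


def purity(truth, pred):
    """Fraction of documents that belong to the majority topic of their cluster."""
    joint = Counter(zip(pred, truth))
    best = {}
    for (p, _t), c in joint.items():
        best[p] = max(best.get(p, 0), c)
    return sum(best.values()) / len(truth)


def homogeneity(truth, pred):
    """1 - H(truth | pred) / H(truth), which equals I(truth; pred) / H(truth)."""
    h_t = _entropy(truth)
    if h_t == 0:
        return 1.0
    return _mutual_information(truth, pred) / h_t


def nmi(truth, pred):
    """Mutual information normalized by the mean entropy (an assumed convention)."""
    denom = (_entropy(truth) + _entropy(pred)) / 2
    if denom == 0:
        return 1.0
    return _mutual_information(truth, pred) / denom


# Hypothetical toy data: four short texts, two true topics.
truth = ["sports", "sports", "politics", "politics"]
perfect = [0, 0, 1, 1]   # clusters match the topics exactly
merged = [0, 0, 0, 0]    # everything collapsed into one cluster

print(purity(truth, perfect), homogeneity(truth, perfect), nmi(truth, perfect))
print(purity(truth, merged), homogeneity(truth, merged), nmi(truth, merged))
```

On the toy data, a clustering that matches the topics scores at or near 1 on all three measures, while collapsing everything into one cluster keeps purity at the majority-topic share but drives homogeneity and NMI to 0.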
Keywords
Text stream, Probabilistic model, Topic evolution, Micro-clusters