Clustering texts using feature similarity based AHC algorithm.
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS (2018)
Abstract
This article proposes a modified AHC (Agglomerative Hierarchical Clustering) algorithm that takes feature similarity into account and applies it to text clustering. The words used as features for encoding texts into numerical vectors are semantically related entities rather than independent ones, and combining word clustering with text clustering is expected to produce a synergy effect. In this research, we define a similarity metric between numerical vectors that considers the feature similarity, and we modify the AHC algorithm by adopting this metric as the approach to text clustering. In empirical validation, the proposed AHC algorithm performs better at clustering texts from news articles and opinions. The significance of this research lies in improving clustering performance by utilizing the feature similarities.
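The abstract does not give the formula for the proposed metric, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a soft-cosine style measure in which a hypothetical feature similarity matrix `S` (here filled with made-up values for three words) weights every pair of features, and it plugs that measure into a plain average-link AHC loop. All function names, the choice of average linkage, and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def bilinear(x, y, S):
    """Return sum_i sum_j S[i][j] * x[i] * y[j]."""
    return sum(S[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def feature_aware_similarity(x, y, S):
    """Soft-cosine style similarity (an assumed form, not the paper's).

    With S equal to the identity matrix this is the ordinary cosine.
    S should be symmetric positive semidefinite so the roots are real.
    """
    denom = sqrt(bilinear(x, x, S)) * sqrt(bilinear(y, y, S))
    return bilinear(x, y, S) / denom if denom > 0 else 0.0


def ahc(vectors, S, n_clusters):
    """Average-link AHC; returns clusters as sorted lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        # Average pairwise similarity between the members of two clusters.
        sims = [feature_aware_similarity(vectors[i], vectors[j], S)
                for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical features: "car", "automobile", "bank".
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]  # made-up values

# Toy texts encoded as word-count vectors over the three features.
texts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 2]]

print(feature_aware_similarity(texts[0], texts[1], IDENTITY))
print(feature_aware_similarity(texts[0], texts[1], S))
print(ahc(texts, S, 2))
```

In this toy setup the first two texts share no word, so the ordinary cosine treats them as unrelated, while the feature-aware measure credits them for using similar words. That is the kind of effect the abstract attributes to treating features as related rather than independent.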
Keywords
Feature value similarity, feature similarity, AHC algorithm, text clustering