A K-means Text Clustering Algorithm Based on Subject Feature Vector

JOURNAL OF WEB ENGINEERING (2021)

Abstract
K-means is one of the most popular clustering algorithms, but it is sensitive to the choice of initial points and to the number of clusters. In addition, computing each iterative class center as the mean of all points in a cluster is one of the factors that degrade clustering performance. In this paper, representative initial points are selected from a decision graph built from the local density and the distance of each point. We then propose an improved k-means text clustering algorithm in which the iterative class center is composed of a subject feature vector, which avoids the influence of noise. Experiments show that the initial points are selected successfully and that the clustering results improve by 3%, 5%, 2%, and 7%, respectively, over the traditional k-means clustering algorithm on four experimental corpora from Fudan and Sougou.
Keywords
k-means, initial points, decision graph, iterative class center, subject feature vector
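
The initial-point selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: local density is taken as the number of neighbors within a cutoff `d_c`, the distance of a point is taken as its distance to the nearest point of higher density, and candidates are ranked by the product of the two. The function name `select_initial_points` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def select_initial_points(X, k, d_c):
    """Hypothetical helper: rank points by density * distance and return the top k.

    Assumptions (not from the paper): cutoff-kernel density with radius d_c,
    Euclidean distance, and ranking by the product rho * delta.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Local density: number of other points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points from densest to sparsest; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest point that comes earlier in that order.
    # The densest point has no such neighbor, so it gets its largest distance.
    delta = np.zeros(n)
    delta[order[0]] = dist[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        delta[i] = dist[i, order[:pos]].min()

    # Points that are both dense and far from denser points score highest.
    gamma = rho * delta
    idx = np.argsort(-gamma)[:k]
    return idx, rho, delta
```

Plotting `rho` against `delta` gives the decision graph; points in its upper-right region are the candidates this sketch returns. The rows of `X` at the returned indices could then be passed to any k-means routine as its initial centers.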