Improved fast partitional clustering algorithm for text clustering

Journal of Intelligent & Fuzzy Systems (2020)

Abstract
Document clustering has become an important task for processing the large amount of textual information available on the Internet. k-means is the most widely used clustering algorithm, mainly due to its simplicity and effectiveness. However, k-means becomes slow for large, high-dimensional datasets such as document collections. Recently, the FPAC algorithm was proposed to mitigate this problem, but its improvement in speed came at the cost of lower clustering quality. For this reason, in this paper we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without substantially increasing the runtime.
Keywords
Document clustering, large collection, high dimensionality