Improved fast partitional clustering algorithm for text clustering
Journal of Intelligent & Fuzzy Systems (2020)
Abstract
Document clustering has become an important task for processing the large amount of textual information available on the Internet. k-means is the most widely used clustering algorithm, mainly because of its simplicity and effectiveness. However, k-means becomes slow on large, high-dimensional datasets such as document collections. The FPAC algorithm was recently proposed to mitigate this problem, but its speed improvement came at the cost of lower-quality clustering results. For this reason, in this paper we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without substantially increasing the runtime.
Keywords
Document clustering, large collection, high dimensionality