An Efficient Split-Merge Re-Start for the K-Means Algorithm

IEEE Transactions on Knowledge and Data Engineering (2022)

Abstract

The K-means algorithm is one of the most popular clustering methods. However, it is well known that its performance, in terms of both the quality of the obtained solution and the computational load, depends heavily on its initialization phase. For this reason, different initialization techniques have been developed over the years to enable fast convergence to competitive solutions. It is therefore common practice to re-start the K-means algorithm several times via one of these techniques and keep the solution with the lowest error. Unfortunately, such a solution is still likely to be a poor approximation of the optimal set of centroids. In this article, we introduce a cheap Split-Merge step that can be used to re-start the K-means algorithm after it reaches a fixed point. Under some settings, one can show that this approach reduces the error of the given fixed point without requiring any further iteration of the K-means algorithm. Moreover, experimental results show that this strategy generates approximations whose error is hard to reach for different multi-start methods, such as multi-start Forgy K-means, K-means++ and Hartigan K-means, while also computing fewer distances than those algorithms.
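
For orientation, the multi-start baseline mentioned in the abstract (re-start K-means several times and keep the lowest-error solution) can be sketched as follows. This is a minimal illustration, not the Split-Merge step proposed in the paper: it assumes Forgy initialization (seeds drawn from the data), plain Lloyd iterations, and the sum of squared distances as the error. The function names `sq_dist`, `kmeans_error`, `lloyd` and `multi_start_forgy` are hypothetical and chosen only for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_error(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, max_iter=100):
    """Iterate assignment and update steps until the assignment stops changing."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # fixed point: no point changed cluster
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids


def multi_start_forgy(points, k, restarts=10, seed=0):
    """Run K-means from several Forgy initializations; keep the lowest error."""
    rng = random.Random(seed)
    best_centroids, best_error = None, float("inf")
    for _ in range(restarts):
        init = rng.sample(points, k)  # Forgy: k distinct data points as seeds
        centroids = lloyd(points, list(init))
        error = kmeans_error(points, centroids)
        if error < best_error:
            best_centroids, best_error = centroids, error
    return best_centroids, best_error
```

Calling `multi_start_forgy(points, k)` on a list of coordinate tuples returns the best centroids found and their error. Every restart repeats all assignment-step distance computations from scratch, which is the cost the abstract contrasts with re-starting from an existing fixed point.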

Keywords

K-means, K-means++, Hartigan K-means, clustering
