Beyond k-Means++: Towards better cluster exploration with geometrical information

Pattern Recognition (2024)

Abstract
Although k-means and its variants are known for their remarkable efficiency, they depend strongly on prior knowledge of K and assume circle-like cluster patterns, so they tend to partition the input space rather than discover non-predetermined data patterns. We therefore propose beyond k-means++, which infers and utilizes explicit clusters by emphasizing local geometrical information for better cluster exploration. To avoid the dependence on K, we present a novel framework of iterative division and aggregation (IDA) over k-means++. It begins with any K >= 1, then increases K as clusters are divided and reduces K as clusters are aggregated. To overcome the circle-like pattern limitation, we introduce a reasonability checking strategy (RCS) for cluster division. Given local geometrical information, RCS supports arbitrary cluster shapes by rejecting edge patterns with distinguished convergence direction and merging adjacent clusters with pseudo-edge patterns. Furthermore, we design an edge shrinkage strategy (ESS). Taking edge patterns as the cluster prototype, it improves accuracy by effectively avoiding the loss of representability caused by irregular distributions. To compensate for the loss of efficiency, a near maximin and random sampling algorithm is suggested for large-scale, high-dimensional data. Experimental results confirm that beyond k-means++ handles arbitrary cluster shapes with remarkable accuracy.
Keywords
Cluster analysis, k-means++, Support vector data description, Edge pattern, Division and aggregation
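
For context, the baseline that the abstract builds on is k-means++, whose distinguishing step is its seeding rule: each new center is drawn with probability proportional to the squared distance from a point to its nearest already-chosen center. The sketch below illustrates only that standard seeding step in plain Python. It is not the authors' IDA, RCS, or ESS procedure, nor their near maximin sampling algorithm, and the names `squared_distance` and `kmeans_pp_seeds` are hypothetical helpers introduced here for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Choose k initial centers using standard k-means++ (D^2) sampling.

    Illustrative sketch of the baseline seeding only; it does not
    implement the division/aggregation framework described in the abstract.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Draw an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = points[idx]
        centers.append(new_center)
        # Refresh nearest-center distances against the newly added center.
        d2 = [min(old, squared_distance(p, new_center))
              for p, old in zip(points, d2)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0)]
    print(kmeans_pp_seeds(data, k=3, rng=random.Random(0)))
```

Note that K is a required input here. That fixed-K dependence is exactly what the abstract says the proposed framework removes by starting from any K >= 1 and adjusting it during division and aggregation.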