Efficient Computation of Multiple Density-Based Clustering Hierarchies.

Antonio Cavalcante Araujo Neto, Ricardo J. G. B. Campello, Mario A. Nascimento, Jorg Sander

IEEE Transactions on Knowledge and Data Engineering (2017)

Abstract

HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of the clusters in a dataset with respect to a parameter m_pts. While the performance of HDBSCAN* is robust with respect to m_pts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or a low value of m_pts may be more appropriate, and certain clusters may reveal themselves only at particular values of m_pts. To explore results for a range of m_pts values, one has to run HDBSCAN* independently for each value in the range, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of m_pts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN* about twice. In fact, this speedup tends to increase with the number of hierarchies to be computed.
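To make the cost of the independent-runs baseline concrete, here is a minimal Python sketch of what "one run per m_pts value" involves. It is not the authors' method and does not use the smaller graph the paper proposes. It assumes the usual HDBSCAN* formulation, which the abstract does not spell out: the core distance of a point is the distance to its m_pts-th nearest neighbor (counting the point itself), the mutual reachability distance between two points is the maximum of their two core distances and their Euclidean distance, and each hierarchy is derived from a minimum spanning tree of the complete mutual reachability graph. All function names are hypothetical.

```python
import math


def core_distances(points, m_pts):
    """Distance from each point to its m_pts-th nearest neighbor.

    Assumption: the point itself counts as its own first neighbor.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[m_pts - 1])
    return cores


def mutual_reachability(points, cores, i, j):
    """Max of both core distances and the Euclidean distance."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_edges(points, m_pts):
    """Prim's algorithm on the complete mutual reachability graph.

    Returns a list of (weight, i, j) edges; O(n^2) per m_pts value.
    """
    n = len(points)
    cores = core_distances(points, m_pts)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, cores, u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def naive_multi_mpts(points, m_pts_values):
    """Baseline: rebuild the spanning tree from scratch for every m_pts."""
    return {m: mst_edges(points, m) for m in m_pts_values}


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    for m, edges in naive_multi_mpts(data, [1, 2, 3]).items():
        print(m, sum(w for w, _, _ in edges))
```

Every value of m_pts repeats the full quadratic pass over all pairs of points, which is the redundancy the abstract describes. The approach proposed in the paper instead works on a much smaller graph shared across the whole range of m_pts values.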

Keywords

Computational efficiency, Clustering algorithms, Proposals, Data visualization, Organizations, Optics, Euclidean distance
