Efficient Computation of Multiple Density-Based Clustering Hierarchies.

IEEE Transactions on Knowledge and Data Engineering (2017)

Abstract
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter m_pts. While the performance of HDBSCAN* is robust w.r.t. m_pts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for m_pts may be more appropriate, and certain data clusters may reveal themselves at different values of m_pts. To explore results for a range of m_pts values, one has to run HDBSCAN* independently for each value in the range, which is computationally inefficient. In this paper, we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of m_pts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN* about twice. In fact, this speedup tends to increase with the number of hierarchies to be computed.
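To make the cost of the per-value baseline concrete, the following is a minimal sketch of the naive strategy the abstract describes as inefficient: rebuilding the hierarchy's underlying spanning tree from scratch for every value of the parameter. It is not the paper's proposed method. It assumes the standard HDBSCAN* definitions (core distance as the distance to the m_pts-th nearest neighbor, counting the point itself, and mutual reachability distance as the maximum of the two core distances and the pairwise distance), uses Euclidean distance, and stops at the minimum spanning tree rather than extracting clusters. The function names are hypothetical and chosen for illustration only.

```python
import math


def core_distances(points, m_pts):
    """Distance from each point to its m_pts-th nearest neighbor.

    Assumption: the point itself counts as its own first neighbor.
    """
    core = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        core.append(dists[m_pts - 1])
    return core


def mutual_reachability_mst(points, m_pts):
    """Prim's algorithm on the complete mutual reachability graph.

    Returns the MST as a list of (u, v, weight) edges. The graph has
    O(n^2) edges, which is what makes repeated runs expensive.
    """
    n = len(points)
    core = core_distances(points, m_pts)

    def mreach(i, j):
        return max(core[i], core[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = mreach(u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def hierarchies_for_range(points, m_pts_values):
    """Naive baseline: one independent MST computation per m_pts value."""
    return {m: mutual_reachability_mst(points, m) for m in m_pts_values}


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    for m, mst in hierarchies_for_range(pts, range(1, 4)).items():
        print(m, round(sum(w for _, _, w in mst), 3))
```

Every call to `mutual_reachability_mst` in this sketch scans the full complete graph again, so the total work grows linearly with the number of m_pts values. That repeated cost is what the paper targets by substituting a much smaller graph.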
Keywords
Computational efficiency, Clustering algorithms, Proposals, Data visualization, Organizations, Optics, Euclidean distance